academic_observatory_workflows.doi_workflow.queries

Attributes

MAX_QUERIES

Classes

`Table`
`SQLQuery`
`Aggregation`

Functions

`make_sql_queries`(→ List[List[SQLQuery]])
`fetch_ror_affiliations`(→ Dict)	Fetch the ROR affiliations for a given affiliation string.
`get_snapshot_date`(project_id, dataset_id, table_id, ...)
`traverse_ancestors`(index, child_ids)	Traverse all of the ancestors of a set of child ROR ids.
`ror_to_ror_hierarchy_index`(→ Dict)	Make an index of child to ancestor relationships.

Module Contents

academic_observatory_workflows.doi_workflow.queries.MAX_QUERIES = 100

class academic_observatory_workflows.doi_workflow.queries.Table

project_id: str

dataset_id: str

table_name: str = None

sharded: bool = False

snapshot_date: pendulum.DateTime = None

property table_id

Generates the BigQuery table_id for both sharded and non-sharded tables.

Returns:

BigQuery table_id.
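As a rough illustration of what the table_id property describes, the sketch below builds a fully qualified BigQuery table id for sharded and non-sharded tables. It is not the library's implementation: the class name `TableSketch` is hypothetical, the standard library `datetime.date` stands in for `pendulum.DateTime`, and the `YYYYMMDD` suffix is assumed from BigQuery's usual date-sharded table naming convention.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TableSketch:
    """Hypothetical stand-in for Table; field names mirror the documented attributes."""

    project_id: str
    dataset_id: str
    table_name: Optional[str] = None
    sharded: bool = False
    snapshot_date: Optional[date] = None

    @property
    def table_id(self) -> str:
        name = self.table_name
        if self.sharded:
            # Assumption: sharded tables carry a YYYYMMDD suffix taken from snapshot_date.
            if self.snapshot_date is None:
                raise ValueError("a sharded table needs a snapshot_date")
            name = f"{name}{self.snapshot_date.strftime('%Y%m%d')}"
        return f"{self.project_id}.{self.dataset_id}.{name}"
```

Under these assumptions, a non-sharded table resolves to `project.dataset.table_name`, and a sharded one to the same id with the snapshot date appended.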

class academic_observatory_workflows.doi_workflow.queries.SQLQuery

name: str

inputs: Dict = None

output_table: Table = None

output_clustering_fields: List = None

class academic_observatory_workflows.doi_workflow.queries.Aggregation

table_name: str

aggregation_field: str

group_by_time_field: str = 'published_year'

relate_to_institutions: bool = False

relate_to_countries: bool = False

relate_to_groups: bool = False

relate_to_members: bool = False

relate_to_journals: bool = False

relate_to_funders: bool = False

relate_to_publishers: bool = False

academic_observatory_workflows.doi_workflow.queries.make_sql_queries(input_project_id: str, output_project_id: str, dataset_id_crossref_events: str = 'crossref_events', dataset_id_crossref_metadata: str = 'crossref_metadata', dataset_id_crossref_fundref: str = 'crossref_fundref', dataset_id_ror: str = 'ror', dataset_id_orcid: str = 'orcid', dataset_id_open_citations: str = 'open_citations', dataset_id_unpaywall: str = 'unpaywall', dataset_id_scihub: str = 'scihub', dataset_id_openalex: str = 'openalex', dataset_id_pubmed: str = 'pubmed', dataset_id_settings: str = 'settings', dataset_id_observatory: str = 'observatory', dataset_id_observatory_intermediate: str = 'observatory_intermediate') → List[List[SQLQuery]]

academic_observatory_workflows.doi_workflow.queries.fetch_ror_affiliations(repository_institution: str, num_retries: int = 3) → Dict

Fetch the ROR affiliations for a given affiliation string.

Parameters:

repository_institution – the affiliation string to search with.
num_retries – the number of retries.

Returns:

the list of ROR affiliations.
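The retry behaviour implied by num_retries can be sketched as follows. This is an illustration under stated assumptions, not the function's actual code: the name `fetch_affiliations_sketch`, the `get_json` parameter, and the helper `http_get_json` are hypothetical, and the endpoint URL is assumed to be the public ROR API's affiliation search. The sketch returns the raw JSON response, which may differ from what the real function returns.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict

# Assumption: the ROR API affiliation-matching endpoint.
ROR_API_URL = "https://api.ror.org/organizations"


def http_get_json(url: str) -> Dict:
    """Fetch a URL and decode its JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_affiliations_sketch(
    repository_institution: str,
    num_retries: int = 3,
    get_json: Callable[[str], Dict] = http_get_json,
) -> Dict:
    """Query the affiliation endpoint, retrying up to num_retries times on I/O errors."""
    query = urllib.parse.urlencode({"affiliation": repository_institution})
    url = f"{ROR_API_URL}?{query}"
    last_error = None
    # One initial attempt plus num_retries further attempts.
    for _ in range(num_retries + 1):
        try:
            return get_json(url)
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            last_error = error
    raise last_error
```

Passing the HTTP call in as `get_json` is a design choice made for this sketch so that the retry loop can be exercised without network access.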

academic_observatory_workflows.doi_workflow.queries.get_snapshot_date(project_id: str, dataset_id: str, table_id: str, snapshot_date: pendulum.DateTime)

academic_observatory_workflows.doi_workflow.queries.traverse_ancestors(index: Dict, child_ids: Set)

Traverse all of the ancestors of a set of child ROR ids.

Parameters:

index – the index.
child_ids – the child ids.

Returns:

all of the ancestors of all child ids.
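One way to picture the traversal is the sketch below. It assumes, for illustration only, that index maps each ROR id to the set of its direct parent ids; the function name `traverse_ancestors_sketch` is hypothetical, and the real function may be structured differently (for example, recursively, or it may include the child ids in its result).

```python
from typing import Dict, Set


def traverse_ancestors_sketch(index: Dict[str, Set[str]], child_ids: Set[str]) -> Set[str]:
    """Collect every ancestor reachable from child_ids via direct-parent links."""
    ancestors: Set[str] = set()
    stack = list(child_ids)
    while stack:
        current = stack.pop()
        # Ids missing from the index are treated as having no parents.
        for parent in index.get(current, set()):
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)
    return ancestors
```

Tracking visited ancestors in a set keeps the walk finite even if the relationship data happens to contain a cycle.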

academic_observatory_workflows.doi_workflow.queries.ror_to_ror_hierarchy_index(ror: List[Dict]) → Dict

Make an index of child to ancestor relationships.

Parameters:

ror – the ROR dataset as a list of dicts.

Returns:

the index.
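A minimal sketch of building such an index is shown below. It assumes each ROR record is a dict with an "id" and a "relationships" list whose entries carry "type" and "id" keys, and that entries of type "Parent" mark direct parents. The function name `build_hierarchy_index_sketch` and the example ids are hypothetical; this is not the library's implementation.

```python
from typing import Dict, List, Set


def build_hierarchy_index_sketch(ror: List[Dict]) -> Dict[str, Set[str]]:
    """Map each ROR id to the set of all of its ancestors."""
    # Step 1: index the direct parents of every record.
    parents: Dict[str, Set[str]] = {}
    for record in ror:
        parents[record["id"]] = {
            rel["id"]
            for rel in record.get("relationships", [])
            if rel.get("type") == "Parent"
        }

    # Step 2: expand direct parents into full ancestor sets.
    index: Dict[str, Set[str]] = {}
    for ror_id, direct_parents in parents.items():
        ancestors: Set[str] = set()
        stack = list(direct_parents)
        while stack:
            current = stack.pop()
            if current in ancestors:
                continue
            ancestors.add(current)
            stack.extend(parents.get(current, set()))
        index[ror_id] = ancestors
    return index


# Usage with made-up ids: a department under a faculty under a university.
example = [
    {"id": "uni", "relationships": [{"type": "Child", "id": "faculty"}]},
    {"id": "faculty", "relationships": [{"type": "Parent", "id": "uni"}]},
    {"id": "dept", "relationships": [{"type": "Parent", "id": "faculty"}]},
]
example_index = build_hierarchy_index_sketch(example)
```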