academic_observatory_workflows.doi_workflow.queries

Attributes

MAX_QUERIES

Classes

Table

SQLQuery

Aggregation

Functions

make_sql_queries(→ List[List[SQLQuery]])

fetch_ror_affiliations(→ Dict)

Fetch the ROR affiliations for a given affiliation string.

get_snapshot_date(project_id, dataset_id, table_id, ...)

traverse_ancestors(index, child_ids)

Traverse all of the ancestors of a set of child ROR ids.

ror_to_ror_hierarchy_index(→ Dict)

Make an index of child to ancestor relationships.

Module Contents

academic_observatory_workflows.doi_workflow.queries.MAX_QUERIES = 100
class academic_observatory_workflows.doi_workflow.queries.Table
project_id: str
dataset_id: str
table_name: str = None
sharded: bool = False
snapshot_date: pendulum.DateTime = None
property table_id

Generates the BigQuery table_id for both sharded and non-sharded tables.

Returns:

BigQuery table_id.
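As an illustration only, the following sketch shows how a `table_id` property could cover both cases. `TableSketch` is a hypothetical stand-in, not the real class. It assumes that a sharded table appends the snapshot date as a `YYYYMMDD` suffix to the table name (the common BigQuery date-sharding convention) and uses the standard library `date` in place of `pendulum.DateTime`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TableSketch:
    """Hypothetical stand-in that mirrors the documented Table attributes."""

    project_id: str
    dataset_id: str
    table_name: Optional[str] = None
    sharded: bool = False
    snapshot_date: Optional[date] = None  # stdlib date instead of pendulum.DateTime

    @property
    def table_id(self) -> str:
        name = self.table_name
        if self.sharded:
            # Assumption: shards are named <table_name><YYYYMMDD>.
            if self.snapshot_date is None:
                raise ValueError("a sharded table needs a snapshot_date")
            name = f"{name}{self.snapshot_date.strftime('%Y%m%d')}"
        # Fully qualified id: project.dataset.table
        return f"{self.project_id}.{self.dataset_id}.{name}"


plain = TableSketch("my-project", "my_dataset", "doi")
shard = TableSketch("my-project", "my_dataset", "doi", sharded=True, snapshot_date=date(2023, 1, 1))
print(plain.table_id)
print(shard.table_id)
```

The project, dataset and table names above are placeholders chosen for the example.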

class academic_observatory_workflows.doi_workflow.queries.SQLQuery
name: str
inputs: Dict = None
output_table: Table = None
output_clustering_fields: List = None
class academic_observatory_workflows.doi_workflow.queries.Aggregation
table_name: str
aggregation_field: str
group_by_time_field: str = 'published_year'
relate_to_institutions: bool = False
relate_to_countries: bool = False
relate_to_groups: bool = False
relate_to_members: bool = False
relate_to_journals: bool = False
relate_to_funders: bool = False
relate_to_publishers: bool = False
academic_observatory_workflows.doi_workflow.queries.make_sql_queries(input_project_id: str, output_project_id: str, dataset_id_crossref_events: str = 'crossref_events', dataset_id_crossref_metadata: str = 'crossref_metadata', dataset_id_crossref_fundref: str = 'crossref_fundref', dataset_id_ror: str = 'ror', dataset_id_orcid: str = 'orcid', dataset_id_open_citations: str = 'open_citations', dataset_id_unpaywall: str = 'unpaywall', dataset_id_scihub: str = 'scihub', dataset_id_openalex: str = 'openalex', dataset_id_pubmed: str = 'pubmed', dataset_id_settings: str = 'settings', dataset_id_observatory: str = 'observatory', dataset_id_observatory_intermediate: str = 'observatory_intermediate') → List[List[SQLQuery]]
academic_observatory_workflows.doi_workflow.queries.fetch_ror_affiliations(repository_institution: str, num_retries: int = 3) → Dict

Fetch the ROR affiliations for a given affiliation string.

Parameters:
  • repository_institution – the affiliation string to search with.

  • num_retries – the number of retries.

Returns:

the list of ROR affiliations.
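A retrying lookup of this kind might look like the sketch below. It is not the module's implementation. The endpoint URL (the public ROR API `affiliation` query parameter), the exponential backoff, the injectable `fetch` callable and the reading of `num_retries` as "retries after the first attempt" are all assumptions made for illustration.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional


def http_get_json(url: str) -> Dict:
    """Default fetcher: GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_affiliations_sketch(
    affiliation: str,
    num_retries: int = 3,
    fetch: Callable[[str], Dict] = http_get_json,  # hypothetical hook, handy for testing
    backoff_seconds: float = 1.0,  # hypothetical parameter
) -> Dict:
    """Match an affiliation string against ROR, retrying on failure."""
    # Assumed endpoint; the real function may use another URL or HTTP client.
    query = urllib.parse.urlencode({"affiliation": affiliation})
    url = f"https://api.ror.org/organizations?{query}"

    last_error: Optional[Exception] = None
    # One initial attempt plus num_retries retries.
    for attempt in range(num_retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # deliberately broad for a sketch
            last_error = error
            if attempt < num_retries:
                # Wait longer after each failed attempt.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"Failed to fetch {url} after {num_retries} retries") from last_error
```

Passing a fake `fetch` callable makes it possible to exercise the retry path without touching the network.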

academic_observatory_workflows.doi_workflow.queries.get_snapshot_date(project_id: str, dataset_id: str, table_id: str, snapshot_date: pendulum.DateTime)
academic_observatory_workflows.doi_workflow.queries.traverse_ancestors(index: Dict, child_ids: Set)

Traverse all of the ancestors of a set of child ROR ids.

Parameters:
  • index – the index.

  • child_ids – the child ids.

Returns:

all of the ancestors of all child ids.
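The traversal can be pictured with the minimal sketch below. It assumes that the index maps each ROR id to the set of its direct parent ids and that the result contains only ancestors, not the starting ids themselves; the documented function may differ on both points. `collect_ancestors` and the ids used are hypothetical.

```python
from typing import Dict, Set


def collect_ancestors(parents_index: Dict[str, Set[str]], child_ids: Set[str]) -> Set[str]:
    """Return every id reachable by following parent links from child_ids."""
    ancestors: Set[str] = set()
    stack = list(child_ids)
    while stack:
        current = stack.pop()
        for parent in parents_index.get(current, set()):
            # Only visit each ancestor once, which also guards against cycles.
            if parent not in ancestors:
                ancestors.add(parent)
                stack.append(parent)
    return ancestors


# Placeholder ids: department -> faculty -> university
example_index = {
    "department": {"faculty"},
    "faculty": {"university"},
    "university": set(),
}
print(sorted(collect_ancestors(example_index, {"department"})))
```

An explicit stack is used instead of recursion so that deep hierarchies cannot hit Python's recursion limit.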

academic_observatory_workflows.doi_workflow.queries.ror_to_ror_hierarchy_index(ror: List[Dict]) → Dict

Make an index of child to ancestor relationships.

Parameters:

ror – the ROR dataset as a list of dicts.

Returns:

the index.
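One way to build such an index is sketched below, purely as an illustration. It assumes each ROR record is a dict with an `id` and a `relationships` list whose entries carry `type` and `id` keys, and that only relationships of type "Parent" (matched case-insensitively) define the hierarchy. `build_ancestor_index` and the sample records are hypothetical.

```python
from typing import Dict, List, Set


def build_ancestor_index(records: List[Dict]) -> Dict[str, Set[str]]:
    """Map every ROR id to the set of all of its ancestors."""
    # Step 1: record the direct parents of each organisation.
    parents: Dict[str, Set[str]] = {record["id"]: set() for record in records}
    for record in records:
        for relationship in record.get("relationships", []):
            if relationship["type"].lower() == "parent":
                parents[record["id"]].add(relationship["id"])

    # Step 2: expand direct parents into full ancestor sets.
    index: Dict[str, Set[str]] = {}
    for ror_id in parents:
        ancestors: Set[str] = set()
        stack = [ror_id]
        while stack:
            for parent in parents.get(stack.pop(), set()):
                if parent not in ancestors:
                    ancestors.add(parent)
                    stack.append(parent)
        index[ror_id] = ancestors
    return index


# Placeholder records, not real ROR data.
sample = [
    {"id": "university", "relationships": [{"type": "Child", "id": "faculty"}]},
    {"id": "faculty", "relationships": [{"type": "Parent", "id": "university"}]},
    {"id": "department", "relationships": [{"type": "Parent", "id": "faculty"}]},
]
print(build_ancestor_index(sample))
```

With an index like this, finding every institution that a child organisation rolls up to is a single dictionary lookup.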