academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope

Module Contents

Classes

CrossrefFundrefRelease

Functions

create_dag(→ airflow.DAG)

Construct the DAG for the Crossref Fundref telescope.

list_releases(→ List[dict])

List all available CrossrefFundref releases between the start and end dates.

new_funder_template()

Helper function for creating a new funder.

parse_fundref_registry_rdf(→ Tuple[List, dict])

Helper function that parses a Fundref registry RDF file and returns a Python list containing each funder.

add_funders_relationships(→ List)

Adds any children/parent relationships to funder instances in the funders list.

recursive_funders(→ Tuple[List, int])

Recursively goes through a funder/sub_funder dict. The funder properties can be looked up with the funders_by_key dictionary.

Attributes

RELEASES_URL

academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.RELEASES_URL = 'https://gitlab.com/api/v4/projects/crossref%2Fopen_funder_registry/releases'

class academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.CrossrefFundrefRelease(*, dag_id: str, run_id: str, snapshot_date: pendulum.DateTime, url: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace)

Bases: observatory.platform.workflows.workflow.SnapshotRelease

property download_file_path
property extract_file_path
property transform_file_path
property download_blob_name
property transform_blob_name
property download_uri
property transform_uri
to_dict() → Dict
static from_dict(dict_: Dict) → CrossrefFundrefRelease
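
The to_dict/from_dict pair suggests a serialisation round trip. The following is a minimal sketch of that pattern only, not the real class: ReleaseSketch is a hypothetical stand-in, it uses the standard library datetime instead of pendulum.DateTime, it omits cloud_workspace, and the dictionary keys and the ISO 8601 date encoding are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class ReleaseSketch:
    """Hypothetical stand-in for a release; not the real CrossrefFundrefRelease."""

    dag_id: str
    run_id: str
    snapshot_date: datetime
    url: str

    def to_dict(self) -> Dict:
        # Assumption: the date is stored as an ISO 8601 string so that the
        # resulting dict contains only JSON-friendly values.
        return {
            "dag_id": self.dag_id,
            "run_id": self.run_id,
            "snapshot_date": self.snapshot_date.isoformat(),
            "url": self.url,
        }

    @staticmethod
    def from_dict(dict_: Dict) -> "ReleaseSketch":
        # Reverse of to_dict: parse the ISO string back into a datetime.
        return ReleaseSketch(
            dag_id=dict_["dag_id"],
            run_id=dict_["run_id"],
            snapshot_date=datetime.fromisoformat(dict_["snapshot_date"]),
            url=dict_["url"],
        )


release = ReleaseSketch(
    dag_id="crossref_fundref",
    run_id="manual__example",  # placeholder run id
    snapshot_date=datetime(2014, 2, 23, tzinfo=timezone.utc),
    url="https://example.com/release.tar.gz",  # placeholder URL
)
restored = ReleaseSketch.from_dict(release.to_dict())
```

A round trip of this kind lets a release be passed between tasks as plain data and rebuilt on the other side.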
academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.create_dag(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'crossref_fundref', bq_table_name: str = 'crossref_fundref', api_dataset_id: str = 'crossref_fundref', schema_folder: str = project_path('crossref_fundref_telescope', 'schema'), dataset_description: str = 'The Crossref Funder Registry dataset: https://www.crossref.org/services/funder-registry/', table_description: str = 'The Crossref Funder Registry dataset: https://www.crossref.org/services/funder-registry/', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, start_date: pendulum.DateTime = pendulum.datetime(2014, 2, 23), schedule: str = '@weekly', catchup: bool = True, gitlab_pool_name: str = 'gitlab_pool', gitlab_pool_slots: int = 2, gitlab_pool_description: str = 'A pool to limit the connections to Gitlab', retries: int = 3) → airflow.DAG

Construct the DAG for the Crossref Fundref telescope.

Parameters:
  • dag_id – the id of the DAG.

  • cloud_workspace – the cloud workspace settings.

  • bq_dataset_id – the BigQuery dataset id.

  • bq_table_name – the BigQuery table name.

  • api_dataset_id – the Dataset ID to use when storing releases.

  • schema_folder – the SQL schema path.

  • dataset_description – description for the BigQuery dataset.

  • table_description – description for the BigQuery table.

  • observatory_api_conn_id – the Observatory API connection key.

  • start_date – the start date of the DAG.

  • schedule – the schedule interval of the DAG.

  • catchup – whether to catch up the DAG or not.

  • gitlab_pool_name – name of the Gitlab Pool.

  • gitlab_pool_slots – number of slots for the Gitlab Pool.

  • gitlab_pool_description – description for the Gitlab Pool.

  • retries – the number of times to retry a task.

academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.list_releases(start_date: pendulum.DateTime, end_date: pendulum.DateTime) → List[dict]

List all available CrossrefFundref releases between the start and end dates.

Parameters:
  • start_date – The start date of the period to look for releases

  • end_date – The end date of the period to look for releases

Returns:

list with dictionaries of release info (url and release date)
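
The date filtering described above can be sketched without any network access. This is an illustration under stated assumptions, not the module's code: filter_releases is a hypothetical helper, the payload field names (released_at, url) and the sample entries are placeholders rather than real registry data, and the half-open interval (start inclusive, end exclusive) is a choice made for the sketch.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Placeholder payload; the field names and values are illustrative only.
sample_payload = [
    {"released_at": "2014-03-01T00:00:00+00:00", "url": "https://example.com/v0.1.tar.gz"},
    {"released_at": "2014-06-01T00:00:00+00:00", "url": "https://example.com/v0.2.tar.gz"},
    {"released_at": "2015-01-01T00:00:00+00:00", "url": "https://example.com/v0.3.tar.gz"},
]


def filter_releases(payload: List[Dict], start_date: datetime, end_date: datetime) -> List[Dict]:
    """Keep the releases whose date falls in [start_date, end_date)."""
    releases = []
    for item in payload:
        release_date = datetime.fromisoformat(item["released_at"])
        # Assumption: start is inclusive and end is exclusive.
        if start_date <= release_date < end_date:
            releases.append({"url": item["url"], "date": release_date})
    return releases


selected = filter_releases(
    sample_payload,
    start_date=datetime(2014, 1, 1, tzinfo=timezone.utc),
    end_date=datetime(2014, 12, 31, tzinfo=timezone.utc),
)
```

Each returned dictionary carries a URL and a release date, which mirrors the shape described in the Returns entry.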

academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.new_funder_template()

Helper function for creating a new funder.

Returns:

a blank funder object.

academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.parse_fundref_registry_rdf(registry_file_path: str) → Tuple[List, dict]

Helper function that parses a Fundref registry RDF file and returns a Python list containing each funder.

Parameters:

registry_file_path – the filename of the registry.rdf file to be parsed.

Returns:

a list containing all the funders parsed from the input RDF, and a dictionary of funders with their id as key.
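
To illustrate the list-plus-lookup return shape, here is a minimal sketch that parses a tiny inline RDF/XML document with the standard library. It is not the module's parser: parse_concepts is a hypothetical helper, the sample document and its example.org identifiers are made up, and the assumption that each funder is a skos:Concept carrying skos:narrower and skos:broader links is a simplification of the real registry file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Made-up sample document; the real registry file is far richer.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:skos="{SKOS_NS}">
  <skos:Concept rdf:about="http://example.org/funder/1">
    <skos:narrower rdf:resource="http://example.org/funder/2"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/funder/2">
    <skos:broader rdf:resource="http://example.org/funder/1"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_concepts(xml_text: str) -> Tuple[List[dict], Dict[str, dict]]:
    """Return (funders, funders_by_key) for every skos:Concept element."""
    root = ET.fromstring(xml_text)
    funders: List[dict] = []
    funders_by_key: Dict[str, dict] = {}
    for concept in root.findall(f"{{{SKOS_NS}}}Concept"):
        funder = {
            "funder": concept.get(f"{{{RDF_NS}}}about"),
            # Collect the ids referenced by the narrower/broader child elements.
            "narrower": [el.get(f"{{{RDF_NS}}}resource") for el in concept.findall(f"{{{SKOS_NS}}}narrower")],
            "broader": [el.get(f"{{{RDF_NS}}}resource") for el in concept.findall(f"{{{SKOS_NS}}}broader")],
        }
        funders.append(funder)
        # The same dict object is stored under its id for constant-time lookup.
        funders_by_key[funder["funder"]] = funder
    return funders, funders_by_key


funders, funders_by_key = parse_concepts(SAMPLE_RDF)
```

Returning both structures keeps the original order in the list while letting later steps resolve a funder id without scanning it.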

academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.add_funders_relationships(funders: List, funders_by_key: Dict) → List

Adds any children/parent relationships to funder instances in the funders list.

Parameters:
  • funders – List of funders

  • funders_by_key – Dictionary of funders with their id as key.

Returns:

funders with added relationships.

academic_observatory_workflows.crossref_fundref_telescope.crossref_fundref_telescope.recursive_funders(funders_by_key: Dict, funder: Dict, depth: int, direction: str, sub_funders: List) → Tuple[List, int]

Recursively goes through a funder/sub_funder dict. The funder properties can be looked up with the funders_by_key dictionary, which stores the properties per funder id. Any children/parents for the funder are already given in the XML element with the ‘narrower’ and ‘broader’ tags. For each funder in the list, it will recursively add any children/parents for those funders in ‘narrower’/‘broader’ and their funder properties.

Parameters:
  • funders_by_key – dictionary with id as key and funders object as value

  • funder – dictionary of a given funder containing ‘narrower’ and ‘broader’ info

  • depth – keeping track of nested depth

  • direction – either ‘narrower’ or ‘broader’ to get ‘children’ or ‘parents’

  • sub_funders – list to keep track of which funder ids are parents

Returns:

list of children and current depth
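
The recursion described above can be sketched as follows. This is a simplified illustration, not the module's implementation: collect_related is a hypothetical helper, the funder dictionaries contain only the name, narrower, and broader keys, the output keys children and parents are assumptions, and the visited list is used here purely as a guard against cycles.

```python
from typing import Dict, List, Tuple


def collect_related(
    funders_by_key: Dict[str, dict], funder: dict, depth: int, direction: str, seen: List[str]
) -> Tuple[List[dict], int]:
    """Follow 'narrower' (children) or 'broader' (parents) links recursively."""
    key = "children" if direction == "narrower" else "parents"
    related = []
    for funder_id in funder[direction]:
        if funder_id in seen:
            continue  # already on this path; skip it to avoid infinite recursion
        other = funders_by_key[funder_id]  # look up the properties by id
        nested, _ = collect_related(funders_by_key, other, depth + 1, direction, seen + [funder_id])
        related.append({"funder": funder_id, "name": other["name"], key: nested})
    return related, depth


# Made-up three-level hierarchy: a is the parent of b, and b is the parent of c.
by_key = {
    "a": {"name": "A", "narrower": ["b"], "broader": []},
    "b": {"name": "B", "narrower": ["c"], "broader": ["a"]},
    "c": {"name": "C", "narrower": [], "broader": ["b"]},
}
children, depth = collect_related(by_key, by_key["a"], 0, "narrower", ["a"])
parents, _ = collect_related(by_key, by_key["c"], 0, "broader", ["c"])
```

Calling the helper once per direction for each funder yields nested children and parents, which is the kind of enrichment add_funders_relationships is documented to perform.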