academic_observatory_workflows.workflows.ror_telescope
Module Contents
Classes
The Research Organization Registry (ROR): https://ror.readme.io/
Functions
List all ROR Zenodo versions available between two dates.
Validate whether a lat and lng are valid.
Transform a ROR release.
- class academic_observatory_workflows.workflows.ror_telescope.RorRelease(*, dag_id: str, run_id: str, snapshot_date: pendulum.DateTime, url: str, checksum: str)[source]
Bases:
observatory.platform.workflows.workflow.SnapshotRelease
- class academic_observatory_workflows.workflows.ror_telescope.RorTelescope(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'ror', bq_table_name: str = 'ror', api_dataset_id: str = 'ror', schema_folder: str = os.path.join(default_schema_folder(), 'ror'), dataset_description: str = 'The Research Organization Registry (ROR) database: https://ror.org/', table_description: str = 'The Research Organization Registry (ROR) database: https://ror.org/', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, ror_conceptrecid: int = 6347574, start_date: pendulum.DateTime = pendulum.datetime(2021, 9, 1), schedule: str = '@weekly', catchup: bool = True)[source]
Bases:
observatory.platform.workflows.workflow.Workflow
The Research Organization Registry (ROR): https://ror.readme.io/
Saved to the BigQuery table: <project_id>.ror.rorYYYYMMDD
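The table naming pattern above can be illustrated with a minimal sketch. The helper `sharded_table_id` and the project ID `my-project` are hypothetical and not part of this module; the sketch only assumes that the snapshot date is appended to the table name as YYYYMMDD, as the pattern suggests.

```python
from datetime import date


def sharded_table_id(project_id: str, dataset_id: str, table_name: str, snapshot_date: date) -> str:
    """Build a date-sharded BigQuery table ID (hypothetical helper, not part of the module)."""
    # The snapshot date is appended directly to the table name as YYYYMMDD.
    return f"{project_id}.{dataset_id}.{table_name}{snapshot_date:%Y%m%d}"


# "my-project" is a placeholder project ID; "ror" matches the documented defaults
# for bq_dataset_id and bq_table_name.
table_id = sharded_table_id("my-project", "ror", "ror", date(2021, 9, 1))
print(table_id)  # my-project.ror.ror20210901
```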
- make_release(**kwargs) List[RorRelease][source]
Make release instances. The release is passed as an argument to the function (TelescopeFunction) that is called in ‘task_callable’.
- Parameters:
kwargs – the context passed from the PythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for a list of the keyword arguments that are passed to this argument.
- Returns:
a list of ROR release instances.
- list_releases(**kwargs)[source]
Lists all ROR records and publishes their url, snapshot_date and checksum as an XCom.
- download(releases: List[RorRelease], **kwargs)[source]
Task to download the ROR releases.
- upload_downloaded(releases: List[RorRelease], **kwargs)[source]
Upload the downloaded ROR data to Cloud Storage.
- extract(releases: List[RorRelease], **kwargs)[source]
Task to extract the ROR releases.
- transform(releases: List[RorRelease], **kwargs)[source]
Task to transform the ROR releases.
- upload_transformed(releases: List[RorRelease], **kwargs) None[source]
Upload the transformed data to Cloud Storage.
- bq_load(releases: List[RorRelease], **kwargs) None[source]
Load the data into BigQuery.
- add_new_dataset_releases(releases: List[RorRelease], **kwargs) None[source]
Adds release information to API.
- cleanup(releases: List[RorRelease], **kwargs) None[source]
Delete all files, folders and XComs associated with this release.
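The hand-off between list_releases (which publishes url, snapshot_date and checksum as an XCom) and make_release (which builds release instances) might look roughly like the sketch below. It is not the module's implementation: `ReleaseInfo`, `to_xcom_payload` and `from_xcom_payload` are hypothetical names, the URL and checksum are placeholders, and the sketch assumes that dates are serialised as ISO 8601 strings so that the payload is JSON-safe.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class ReleaseInfo:
    """Hypothetical stand-in for RorRelease, holding only the fields named in the docs."""

    url: str
    snapshot_date: datetime
    checksum: str


def to_xcom_payload(releases: List[ReleaseInfo]) -> List[dict]:
    """Convert releases into JSON-serialisable dicts (dates become ISO 8601 strings)."""
    return [
        {"url": r.url, "snapshot_date": r.snapshot_date.isoformat(), "checksum": r.checksum}
        for r in releases
    ]


def from_xcom_payload(payload: List[dict]) -> List[ReleaseInfo]:
    """Rebuild release objects from the dicts produced by to_xcom_payload."""
    return [
        ReleaseInfo(
            url=p["url"],
            snapshot_date=datetime.fromisoformat(p["snapshot_date"]),
            checksum=p["checksum"],
        )
        for p in payload
    ]


# Placeholder values for illustration only.
releases = [
    ReleaseInfo(
        url="https://example.org/ror-data.zip",
        snapshot_date=datetime(2021, 9, 23, tzinfo=timezone.utc),
        checksum="md5:0123456789abcdef",
    )
]

# Round-trip through JSON to mimic a payload being stored and read back.
payload = json.loads(json.dumps(to_xcom_payload(releases)))
restored = from_xcom_payload(payload)
```

Keeping the payload to plain strings means it survives JSON serialisation unchanged, and the downstream task can reconstruct typed objects from it.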
- academic_observatory_workflows.workflows.ror_telescope.list_ror_records(conceptrecid: int, start_date: pendulum.DateTime, end_date: pendulum.DateTime, page_size: int = 10, timeout: float = 30.0) List[dict][source]
List all ROR Zenodo versions available between two dates.
- Parameters:
conceptrecid – the Zenodo conceptrecid for ROR.
start_date – the start date of the period to search.
end_date – the end date of the period to search.
page_size – the page size for the query.
timeout – the number of seconds to wait until timing out.
- Returns:
the list of ROR Zenodo records with required variables stored as a dictionary.
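As an illustration of the paging and date filtering that the parameters above imply, here is a minimal sketch that runs against an in-memory stand-in instead of the Zenodo API. Everything in it is an assumption made for the example: the `fetch_page` callable, the `created` field name, the inclusive date bounds and the rule that an empty page ends the listing. It does not reproduce the behaviour of list_ror_records.

```python
from datetime import datetime, timezone
from typing import Callable, List


def list_records_between(
    fetch_page: Callable[[int, int], List[dict]],
    start_date: datetime,
    end_date: datetime,
    page_size: int = 10,
) -> List[dict]:
    """Collect records whose 'created' date lies within [start_date, end_date].

    fetch_page(page, page_size) is a hypothetical callable that returns one page
    of records; an empty page signals the end of the results.
    """
    records = []
    page = 1
    while True:
        hits = fetch_page(page, page_size)
        if not hits:
            break
        for hit in hits:
            created = datetime.fromisoformat(hit["created"])
            # Inclusive bounds are an assumption of this sketch.
            if start_date <= created <= end_date:
                records.append(hit)
        page += 1
    return records


# In-memory stand-in for a remote API, for illustration only.
_FAKE_RECORDS = [
    {"id": 1, "created": "2021-08-15T00:00:00+00:00"},
    {"id": 2, "created": "2021-09-23T00:00:00+00:00"},
    {"id": 3, "created": "2021-10-20T00:00:00+00:00"},
]


def fake_fetch_page(page: int, page_size: int) -> List[dict]:
    """Return the requested slice of the fake records (pages are 1-indexed)."""
    start = (page - 1) * page_size
    return _FAKE_RECORDS[start : start + page_size]


found = list_records_between(
    fake_fetch_page,
    start_date=datetime(2021, 9, 1, tzinfo=timezone.utc),
    end_date=datetime(2021, 10, 1, tzinfo=timezone.utc),
    page_size=2,
)
```

Passing the page fetcher in as a callable keeps the paging logic testable without network access; a real fetcher would also need to apply the timeout.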