academic_observatory_workflows.ror_telescope.tasks
Functions
fetch_releases – Lists all ROR records and publishes their url, snapshot_date and checksum as an XCom.
download – Task to download the ROR releases.
transform – Task to transform the ROR releases.
bq_load – Load the data into BigQuery.
add_dataset_releases – Adds release information to API.
cleanup_workflow – Delete all files, folders and XComs associated with this release.
list_ror_records – List all ROR Zenodo versions available between two dates.
Validate whether a lat and lng are valid.
Transform a ROR release.
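The latitude and longitude validation listed above is not documented in detail on this page. As a minimal sketch of what such a check usually involves, the hypothetical helper below (its name and its handling of missing values are assumptions, not the module's implementation) accepts a pair only when both values are present and fall within the standard geographic ranges.

```python
from typing import Optional


def lat_lng_in_range(lat: Optional[float], lng: Optional[float]) -> bool:
    """Hypothetical helper, not the module's own function.

    Returns True only when both coordinates are present and within the
    standard ranges: latitude in [-90, 90], longitude in [-180, 180].
    """
    if lat is None or lng is None:
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0


# Example calls with illustrative values.
valid = lat_lng_in_range(-36.85, 174.76)
invalid = lat_lng_in_range(91.0, 0.0)
```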
Module Contents
- academic_observatory_workflows.ror_telescope.tasks.fetch_releases(dag_id: str, run_id: str, cloud_workspace: observatory_platform.airflow.workflow.CloudWorkspace, data_interval_start: pendulum.DateTime, data_interval_end: pendulum.DateTime, ror_conceptrecid: int) → List[dict]
Lists all ROR records and publishes their url, snapshot_date and checksum as an XCom.
- Parameters:
dag_id – The ID of the DAG
run_id – The ID of this DAG run
cloud_workspace – The CloudWorkspace object
data_interval_start – The start of the data interval for this DAG run
data_interval_end – The end of the data interval for this DAG run
ror_conceptrecid – The Zenodo conceptrecid for the ROR dataset
- Returns:
The list of releases
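Since each release is published as an XCom carrying a url, snapshot_date and checksum, it can help to picture the returned list as plain, JSON-serialisable dictionaries. The sketch below is an illustration only: the helper name, the example URL and checksum, and the choice to store the date as an ISO 8601 string are assumptions, not details confirmed by this page.

```python
import json
from datetime import datetime, timezone


def make_release_record(url: str, snapshot_date: datetime, checksum: str) -> dict:
    """Hypothetical helper: build one release entry using the three
    fields named in the fetch_releases description."""
    return {
        "url": url,
        # Assumption: dates are stored as ISO 8601 strings so the record
        # survives JSON serialisation.
        "snapshot_date": snapshot_date.isoformat(),
        "checksum": checksum,
    }


# Placeholder values, not real ROR release data.
records = [
    make_release_record(
        "https://example.org/ror-data.zip",
        datetime(2023, 1, 1, tzinfo=timezone.utc),
        "md5:0123abcd",
    )
]

# Round-trip through JSON to confirm the records are serialisable.
payload = json.dumps(records)
restored = json.loads(payload)
```

Keeping every value a string makes the list safe to pass between tasks regardless of the serialisation backend in use.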
- academic_observatory_workflows.ror_telescope.tasks.download(release: dict)
Task to download the ROR releases.
- academic_observatory_workflows.ror_telescope.tasks.transform(release: dict)
Task to transform the ROR releases.
- academic_observatory_workflows.ror_telescope.tasks.bq_load(release: dict, bq_dataset_id: str, dataset_description: str, bq_table_name: str, table_description: str, schema_folder: str) → None
Load the data into BigQuery.
- Parameters:
bq_dataset_id – The BigQuery dataset ID to load the data into
dataset_description – The description to give the BigQuery dataset
bq_table_name – The name of the table to load the data into
table_description – The description to give the BigQuery table
schema_folder – The folder containing the schema
- academic_observatory_workflows.ror_telescope.tasks.add_dataset_releases(release: dict, api_bq_dataset_id: str) → None
Adds release information to API.
- Parameters:
api_bq_dataset_id – The dataset containing the API table
- academic_observatory_workflows.ror_telescope.tasks.cleanup_workflow(release: dict) → None
Delete all files, folders and XComs associated with this release.
- academic_observatory_workflows.ror_telescope.tasks.list_ror_records(conceptrecid: int, start_date: pendulum.DateTime, end_date: pendulum.DateTime, page_size: int = 10, timeout: float = 30.0) → List[dict]
List all ROR Zenodo versions available between two dates.
- Parameters:
conceptrecid – The Zenodo conceptrecid for ROR
start_date – Start date of the period to search
end_date – End date of the period to search
page_size – The page size for the query
timeout – The number of seconds to wait before timing out
- Returns:
The list of ROR Zenodo records, each stored as a dictionary of the required variables.
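The combination of a date window and a page_size suggests a paginated query whose results are filtered by date. The following is a rough sketch of that pattern, not the function's actual code: fetch_page is a hypothetical stand-in for the Zenodo request, the "created" field name is an assumption, and treating start_date as inclusive and end_date as exclusive is also an assumption. The timeout parameter is not modelled because no network call is made.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def list_versions_between(
    fetch_page: Callable[[int, int], List[Dict]],
    start_date: datetime,
    end_date: datetime,
    page_size: int = 10,
) -> List[Dict]:
    """Collect records whose creation time lies in [start_date, end_date).

    fetch_page(page, size) is a hypothetical callable returning one page
    of records; a page shorter than `size` marks the end of the results.
    """
    selected: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        for record in batch:
            created = datetime.fromisoformat(record["created"])
            if start_date <= created < end_date:
                selected.append(record)
        if len(batch) < page_size:
            break
        page += 1
    return selected


# Fake in-memory data standing in for the remote service.
FAKE_RECORDS = [
    {"id": i, "created": f"2023-0{i}-01T00:00:00+00:00"} for i in range(1, 6)
]


def fake_fetch_page(page: int, size: int) -> List[Dict]:
    offset = (page - 1) * size
    return FAKE_RECORDS[offset : offset + size]


found = list_versions_between(
    fake_fetch_page,
    datetime(2023, 2, 1, tzinfo=timezone.utc),
    datetime(2023, 5, 1, tzinfo=timezone.utc),
    page_size=2,
)
```

Passing the page fetcher in as a callable keeps the date filtering testable without network access.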