academic_observatory_workflows.ror_telescope.tasks

Functions

fetch_releases(→ List[dict])

Lists all ROR records and publishes their url, snapshot_date and checksum as an XCom.

download(release)

Task to download the ROR releases.

transform(release)

Task to transform the ROR releases.

bq_load(→ None)

Load the data into BigQuery.

add_dataset_releases(→ None)

Adds release information to API.

cleanup_workflow(→ None)

Delete all files, folders and XComs associated with this release.

list_ror_records(→ List[dict])

List all ROR Zenodo versions available between two dates.

is_lat_lng_valid(→ bool)

Check whether a latitude and longitude pair is valid.

transform_ror(→ List[Dict])

Transform a ROR release.

Module Contents

academic_observatory_workflows.ror_telescope.tasks.fetch_releases(dag_id: str, run_id: str, cloud_workspace: observatory_platform.airflow.workflow.CloudWorkspace, data_interval_start: pendulum.DateTime, data_interval_end: pendulum.DateTime, ror_conceptrecid: int) → List[dict]

Lists all ROR records and publishes their url, snapshot_date and checksum as an XCom.

Parameters:
  • dag_id – The ID of the dag

  • run_id – The ID of this dagrun

  • cloud_workspace – The CloudWorkspace object

  • data_interval_start – The start of the data interval for this dagrun

  • data_interval_end – The end of the data interval for this dagrun

  • ror_conceptrecid – the Zenodo conceptrecid for the ROR dataset.

Returns:

The list of releases

academic_observatory_workflows.ror_telescope.tasks.download(release: dict)

Task to download the ROR releases.

academic_observatory_workflows.ror_telescope.tasks.transform(release: dict)

Task to transform the ROR releases.

academic_observatory_workflows.ror_telescope.tasks.bq_load(release: dict, bq_dataset_id: str, dataset_description: str, bq_table_name: str, table_description: str, schema_folder: str) → None

Load the data into BigQuery.

Parameters:
  • bq_dataset_id – The bigquery dataset ID to load the data to

  • dataset_description – The description to give the bigquery dataset

  • bq_table_name – The table name to load into

  • table_description – The description to give the bigquery table

  • schema_folder – the folder containing the schema

academic_observatory_workflows.ror_telescope.tasks.add_dataset_releases(release: dict, api_bq_dataset_id: str) → None

Adds release information to API.

Parameters:

api_bq_dataset_id – The dataset containing the API table

academic_observatory_workflows.ror_telescope.tasks.cleanup_workflow(release: dict) → None

Delete all files, folders and XComs associated with this release.

academic_observatory_workflows.ror_telescope.tasks.list_ror_records(conceptrecid: int, start_date: pendulum.DateTime, end_date: pendulum.DateTime, page_size: int = 10, timeout: float = 30.0) → List[dict]

List all ROR Zenodo versions available between two dates.

Parameters:
  • conceptrecid – the Zenodo conceptrecid for ROR.

  • start_date – Start date of period to look into

  • end_date – End date of period to look into

  • page_size – the page size for the query.

  • timeout – the number of seconds to wait until timing out.

Returns:

the list of ROR Zenodo records, each stored as a dictionary of the required fields.
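
The date filtering described above can be illustrated without any network access. The following is a minimal sketch, not the library's implementation: the helper name `select_records_between` is hypothetical, the record layout (a `created` timestamp and a `files` list carrying `checksum` and `links.self`) is an assumption about the shape of Zenodo record JSON, and the half-open interval `[start, end)` is likewise an assumption. It uses the standard library's `datetime` rather than `pendulum` so that it stays self-contained.

```python
from datetime import datetime, timezone
from typing import Dict, List


def select_records_between(records: List[Dict], start: datetime, end: datetime) -> List[Dict]:
    """Keep records created in [start, end) and reduce each to url, snapshot_date and checksum."""
    selected = []
    for record in records:
        # Assumption: each record carries an ISO 8601 "created" timestamp.
        created = datetime.fromisoformat(record["created"])
        if created.tzinfo is None:
            # Treat naive timestamps as UTC so comparisons with aware bounds work.
            created = created.replace(tzinfo=timezone.utc)
        if not (start <= created < end):
            continue
        # Assumption: the first file attached to the record is the release archive.
        file_info = record["files"][0]
        selected.append(
            {
                "snapshot_date": created.date().isoformat(),
                "url": file_info["links"]["self"],
                "checksum": file_info["checksum"],
            }
        )
    # Oldest release first.
    return sorted(selected, key=lambda r: r["snapshot_date"])


# Made-up sample data for illustration only.
sample_records = [
    {
        "created": "2023-01-10T00:00:00+00:00",
        "files": [{"checksum": "md5:aaa", "links": {"self": "https://example.org/v1.zip"}}],
    },
    {
        "created": "2023-03-05T12:30:00+00:00",
        "files": [{"checksum": "md5:bbb", "links": {"self": "https://example.org/v2.zip"}}],
    },
]
start = datetime(2023, 3, 1, tzinfo=timezone.utc)
end = datetime(2023, 4, 1, tzinfo=timezone.utc)
releases = select_records_between(sample_records, start, end)
```

Here only the second sample record falls inside the March interval, so `releases` holds a single dictionary with the three fields that `fetch_releases` is documented to publish.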

academic_observatory_workflows.ror_telescope.tasks.is_lat_lng_valid(lat: Any, lng: Any) → bool

Check whether a latitude and longitude pair is valid.

Parameters:
  • lat – the latitude.

  • lng – the longitude.

Returns:

whether the lat/long combination is valid
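
The documentation does not state which checks are applied. As a rough sketch, assuming that "valid" means both values can be converted to floats, with latitude in [-90, 90] and longitude in [-180, 180], a check of this kind might look like the following. The name `is_lat_lng_valid_sketch` is hypothetical and chosen to avoid confusion with the real function.

```python
from typing import Any


def is_lat_lng_valid_sketch(lat: Any, lng: Any) -> bool:
    """Return True if lat and lng convert to floats within the usual geographic ranges."""
    try:
        lat_f, lng_f = float(lat), float(lng)
    except (TypeError, ValueError):
        # None, non-numeric strings and other unconvertible values are invalid.
        return False
    # NaN and infinities fail these range comparisons, so they are rejected too.
    return -90.0 <= lat_f <= 90.0 and -180.0 <= lng_f <= 180.0
```

Accepting `Any` mirrors the documented signature: coordinates taken from raw JSON may arrive as numbers, strings or nulls, so the conversion is guarded rather than assumed.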

academic_observatory_workflows.ror_telescope.tasks.transform_ror(ror: List[Dict]) → List[Dict]

Transform a ROR release.

Parameters:

ror – the ROR records.

Returns:

the transformed records.
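
The source does not describe what the transformation does. Given that the module also exposes `is_lat_lng_valid`, one plausible step is to blank out coordinates that fail validation. The sketch below illustrates only that idea. The helper names `null_invalid_coordinates` and `_in_range`, the `addresses` list with `lat` and `lng` keys, and the choice to replace invalid values with `None` are all assumptions, not the documented behaviour.

```python
from typing import Any, Dict, List


def _in_range(value: Any, low: float, high: float) -> bool:
    """Hypothetical helper: True if value converts to a float within [low, high]."""
    try:
        return low <= float(value) <= high
    except (TypeError, ValueError):
        return False


def null_invalid_coordinates(ror: List[Dict]) -> List[Dict]:
    """Return copies of the records with out-of-range lat/lng pairs set to None."""
    transformed = []
    for record in ror:
        new_record = dict(record)  # shallow copy so the input is not mutated
        if "addresses" in record:
            new_addresses = []
            for address in record["addresses"]:
                new_address = dict(address)
                lat_ok = _in_range(address.get("lat"), -90.0, 90.0)
                lng_ok = _in_range(address.get("lng"), -180.0, 180.0)
                if not (lat_ok and lng_ok):
                    # Assumption: an invalid pair is nulled rather than dropped.
                    new_address["lat"] = None
                    new_address["lng"] = None
                new_addresses.append(new_address)
            new_record["addresses"] = new_addresses
        transformed.append(new_record)
    return transformed
```

Copying each record and address keeps the input list untouched, which makes a step like this easier to test in isolation.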