academic_observatory_workflows.ror_telescope.ror_telescope

Module Contents

Classes

RorRelease

Functions

create_dag(*, dag_id, cloud_workspace, ...)

Construct the Airflow DAG for the ROR telescope.

list_ror_records(→ List[dict])

List all ROR Zenodo versions available between two dates.

is_lat_lng_valid(→ bool)

Check whether a latitude and longitude pair is valid.

transform_ror(→ List[Dict])

Transform a ROR release.

class academic_observatory_workflows.ror_telescope.ror_telescope.RorRelease(*, dag_id: str, run_id: str, snapshot_date: pendulum.DateTime, url: str, checksum: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace)

Bases: observatory.platform.workflows.workflow.SnapshotRelease

property download_blob_name
property transform_blob_name
property download_uri
property transform_uri
to_dict() → Dict
static from_dict(dict_: Dict) → RorRelease

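
The to_dict() and from_dict() pair suggests a serialization round trip for a release. The following is a minimal sketch of that pattern, not the library's implementation: ReleaseSketch is a hypothetical stand-in class, it uses the standard library datetime instead of pendulum.DateTime, it omits cloud_workspace, and the dictionary keys and example values are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class ReleaseSketch:
    """Hypothetical stand-in for a release; field names mirror the constructor arguments."""

    dag_id: str
    run_id: str
    snapshot_date: datetime
    url: str
    checksum: str

    def to_dict(self) -> Dict:
        # Store the date as an ISO 8601 string so the dict is JSON-serializable.
        return {
            "dag_id": self.dag_id,
            "run_id": self.run_id,
            "snapshot_date": self.snapshot_date.isoformat(),
            "url": self.url,
            "checksum": self.checksum,
        }

    @staticmethod
    def from_dict(dict_: Dict) -> "ReleaseSketch":
        # Parse the ISO 8601 string back into a datetime.
        return ReleaseSketch(
            dag_id=dict_["dag_id"],
            run_id=dict_["run_id"],
            snapshot_date=datetime.fromisoformat(dict_["snapshot_date"]),
            url=dict_["url"],
            checksum=dict_["checksum"],
        )


# Example values are illustrative only.
release = ReleaseSketch(
    dag_id="ror",
    run_id="scheduled__2023-01-01T00:00:00+00:00",
    snapshot_date=datetime(2023, 1, 1, tzinfo=timezone.utc),
    url="https://example.org/ror-data.zip",
    checksum="md5:0123456789abcdef",
)
restored = ReleaseSketch.from_dict(release.to_dict())
```

Serializing the date as a string keeps the dictionary safe to pass between tasks as JSON; from_dict() reverses the conversion.
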
academic_observatory_workflows.ror_telescope.ror_telescope.create_dag(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'ror', bq_table_name: str = 'ror', api_dataset_id: str = 'ror', schema_folder: str = project_path('ror_telescope', 'schema'), dataset_description: str = 'The Research Organization Registry (ROR) database: https://ror.org/', table_description: str = 'The Research Organization Registry (ROR) database: https://ror.org/', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, ror_conceptrecid: int = 6347574, start_date: pendulum.DateTime = pendulum.datetime(2021, 9, 1), schedule: str = '@weekly', catchup: bool = True, max_active_runs: int = 1, retries: int = 3) → airflow.DAG

Construct the Airflow DAG for the ROR telescope.

Parameters:
  • dag_id – the id of the DAG.

  • cloud_workspace – the cloud workspace settings.

  • bq_dataset_id – the BigQuery dataset id.

  • bq_table_name – the BigQuery table name.

  • api_dataset_id – the Dataset ID to use when storing releases.

  • schema_folder – the SQL schema path.

  • dataset_description – description for the BigQuery dataset.

  • table_description – description for the BigQuery table.

  • observatory_api_conn_id – the Observatory API connection key.

  • ror_conceptrecid – the Zenodo conceptrecid for the ROR dataset.

  • start_date – the start date of the DAG.

  • schedule – the schedule interval of the DAG.

  • catchup – whether to catch up on missed DAG runs.

  • max_active_runs – the maximum number of DAG runs that can be run at once.

  • retries – the number of times to retry a task.

academic_observatory_workflows.ror_telescope.ror_telescope.list_ror_records(conceptrecid: int, start_date: pendulum.DateTime, end_date: pendulum.DateTime, page_size: int = 10, timeout: float = 30.0) → List[dict]

List all ROR Zenodo versions available between two dates.

Parameters:
  • conceptrecid – the Zenodo conceptrecid for ROR.

  • start_date – the start date of the period to search.

  • end_date – the end date of the period to search.

  • page_size – the page size for the query.

  • timeout – the number of seconds to wait until timing out.

Returns:

a list of ROR Zenodo records, each stored as a dictionary of the required fields.
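
The paging and date filtering described above can be sketched as follows. This is an illustration under stated assumptions rather than the function's actual implementation: list_records_between and fake_fetch_page are hypothetical names, the page fetcher is injected so the example runs without network access, each record is assumed to carry an ISO 8601 "created" timestamp, and the date range is treated as half-open (start inclusive, end exclusive).

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def list_records_between(
    fetch_page: Callable[[int, int], List[Dict]],
    start_date: datetime,
    end_date: datetime,
    page_size: int = 10,
) -> List[Dict]:
    """Collect records whose 'created' timestamp lies in [start_date, end_date)."""
    results: List[Dict] = []
    page = 1
    while True:
        hits = fetch_page(page, page_size)
        for hit in hits:
            created = datetime.fromisoformat(hit["created"])
            if start_date <= created < end_date:
                results.append(hit)
        # A short (or empty) page means there are no further pages.
        if len(hits) < page_size:
            break
        page += 1
    return results


# Hypothetical records, newest first, standing in for a remote response.
FAKE_RECORDS = [
    {"id": 3, "created": "2023-03-01T00:00:00+00:00"},
    {"id": 2, "created": "2023-02-01T00:00:00+00:00"},
    {"id": 1, "created": "2023-01-01T00:00:00+00:00"},
]


def fake_fetch_page(page: int, size: int) -> List[Dict]:
    """Return one page of FAKE_RECORDS (pages are 1-indexed)."""
    start = (page - 1) * size
    return FAKE_RECORDS[start : start + size]


selected = list_records_between(
    fake_fetch_page,
    start_date=datetime(2023, 1, 15, tzinfo=timezone.utc),
    end_date=datetime(2023, 3, 1, tzinfo=timezone.utc),
    page_size=2,
)
```

In a real caller, fetch_page would wrap an HTTP request that passes the page number, page size, and a timeout to the remote API.
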

academic_observatory_workflows.ror_telescope.ror_telescope.is_lat_lng_valid(lat: Any, lng: Any) → bool

Check whether a latitude and longitude pair is valid.

Parameters:
  • lat – the latitude.

  • lng – the longitude.

Returns:

True if both the latitude and the longitude are valid, otherwise False.
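
As an illustration of this kind of check, here is a minimal sketch. It is not the module's code: is_lat_lng_valid_sketch is a hypothetical name, and the rules it applies (values must convert to float, latitude within [-90, 90], longitude within [-180, 180]) are assumptions based on the usual geographic coordinate ranges.

```python
from typing import Any


def is_lat_lng_valid_sketch(lat: Any, lng: Any) -> bool:
    """Return True when both values parse as floats within geographic bounds."""
    try:
        lat_f = float(lat)
        lng_f = float(lng)
    except (TypeError, ValueError):
        # None, non-numeric strings, and other unconvertible values are invalid.
        return False
    # NaN and infinite values fail these comparisons, so they are rejected too.
    return -90.0 <= lat_f <= 90.0 and -180.0 <= lng_f <= 180.0
```

Accepting Any and converting inside a try block lets the check tolerate the mixed types (numbers, strings, missing values) that often appear in raw records.
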

academic_observatory_workflows.ror_telescope.ror_telescope.transform_ror(ror: List[Dict]) → List[Dict]

Transform a ROR release.

Parameters:

ror – the ROR records.

Returns:

the transformed records.