academic_observatory_workflows.workflows.ror_telescope

Module Contents

Classes

RorRelease

RorTelescope

The Research Organization Registry (ROR): https://ror.readme.io/

Functions

list_ror_records(→ List[dict])

List all ROR Zenodo versions available between two dates.

is_lat_lng_valid(→ bool)

Validate whether a lat and lng are valid.

transform_ror(→ List[Dict])

Transform a ROR release.

class academic_observatory_workflows.workflows.ror_telescope.RorRelease(*, dag_id: str, run_id: str, snapshot_date: pendulum.DateTime, url: str, checksum: str)

Bases: observatory.platform.workflows.workflow.SnapshotRelease

class academic_observatory_workflows.workflows.ror_telescope.RorTelescope(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'ror', bq_table_name: str = 'ror', api_dataset_id: str = 'ror', schema_folder: str = os.path.join(default_schema_folder(), 'ror'), dataset_description: str = 'The Research Organization Registry (ROR) database: https://ror.org/', table_description: str = 'The Research Organization Registry (ROR) database: https://ror.org/', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, ror_conceptrecid: int = 6347574, start_date: pendulum.DateTime = pendulum.datetime(2021, 9, 1), schedule: str = '@weekly', catchup: bool = True)

Bases: observatory.platform.workflows.workflow.Workflow

The Research Organization Registry (ROR): https://ror.readme.io/

Saved to the BigQuery table: <project_id>.ror.rorYYYYMMDD
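As a minimal sketch of the naming pattern above, the helper below builds the date-sharded table ID. The function name `ror_table_id` and the project ID `my-project` are hypothetical; the `ror` defaults mirror the `bq_dataset_id` and `bq_table_name` constructor arguments, and the YYYYMMDD suffix is assumed to come from the release's snapshot date.

```python
from datetime import date


def ror_table_id(
    project_id: str,
    snapshot_date: date,
    dataset_id: str = "ror",
    table_name: str = "ror",
) -> str:
    """Return '<project_id>.<dataset_id>.<table_name>YYYYMMDD' for a snapshot date."""
    # Hypothetical helper: the shard suffix is assumed to be the snapshot date.
    suffix = snapshot_date.strftime("%Y%m%d")
    return f"{project_id}.{dataset_id}.{table_name}{suffix}"


print(ror_table_id("my-project", date(2021, 9, 23)))
# my-project.ror.ror20210923
```

Because `datetime` (and therefore `pendulum.DateTime`) subclasses `date`, the same helper accepts either type.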

make_release(**kwargs) → List[RorRelease]

Make release instances. The releases are passed as an argument to the function (TelescopeFunction) that is called in ‘task_callable’.

Parameters:

kwargs – the context passed from the PythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for a list of the keyword arguments that are passed to this argument.

Returns:

a list of ROR release instances.
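A rough sketch of what make_release might do with the records that list_releases publishes: turn each dict into a release object. `ReleaseStub` and `make_releases` are hypothetical stand-ins (the real `RorRelease` takes `dag_id`, `run_id`, `snapshot_date`, `url` and `checksum` keyword arguments, per its signature), and storing `snapshot_date` as an ISO 8601 string is an assumption. Pulling the records out of XCom via the Airflow task instance is omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ReleaseStub:
    """Hypothetical stand-in for RorRelease with its documented constructor fields."""

    dag_id: str
    run_id: str
    snapshot_date: datetime
    url: str
    checksum: str


def make_releases(dag_id: str, run_id: str, records: List[Dict]) -> List[ReleaseStub]:
    """Build one release per record published by list_releases.

    Assumes each record has 'url', 'checksum' and an ISO 8601 'snapshot_date' string.
    """
    return [
        ReleaseStub(
            dag_id=dag_id,
            run_id=run_id,
            snapshot_date=datetime.fromisoformat(record["snapshot_date"]),
            url=record["url"],
            checksum=record["checksum"],
        )
        for record in records
    ]
```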

list_releases(**kwargs)

Lists all ROR records and publishes their url, snapshot_date and checksum as an XCom.
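Converting dates to strings keeps an XCom payload plainly JSON-serializable, which is a safe choice for XCom storage. The sketch below, built around the hypothetical helper `to_xcom_payload`, keeps only the three published fields and formats `snapshot_date` as an ISO 8601 string; the shape of the input records is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def to_xcom_payload(records: List[Dict]) -> List[Dict]:
    """Keep the fields list_releases publishes, with snapshot_date as an ISO 8601 string."""
    # Assumption: each input record holds 'url', 'checksum' and a datetime 'snapshot_date'.
    return [
        {
            "url": record["url"],
            "snapshot_date": record["snapshot_date"].isoformat(),
            "checksum": record["checksum"],
        }
        for record in records
    ]


payload = to_xcom_payload(
    [
        {
            "url": "https://example.org/ror.zip",
            "snapshot_date": datetime(2021, 9, 23, tzinfo=timezone.utc),
            "checksum": "md5:abc",
            "size": 1024,  # extra fields are dropped
        }
    ]
)
print(json.dumps(payload))
```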

download(releases: List[RorRelease], **kwargs)

Task to download the ROR releases.
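Since list_releases publishes a checksum alongside each URL, a downloaded file can presumably be checked against it. A minimal sketch, assuming the checksum is an MD5 digest optionally prefixed with `md5:` (the form Zenodo uses for its file checksums); `verify_md5` is a hypothetical helper, and the HTTP download itself is omitted.

```python
import hashlib


def verify_md5(path: str, checksum: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches an MD5 checksum.

    Assumption: `checksum` is a hex digest, optionally prefixed with 'md5:'.
    """
    expected = checksum.split(":", 1)[1] if checksum.startswith("md5:") else checksum
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large dump is never held in memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

Checking the digest before extraction catches truncated or corrupted downloads early.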

upload_downloaded(releases: List[RorRelease], **kwargs)

Upload the downloaded ROR data to Cloud Storage.

extract(releases: List[RorRelease], **kwargs)

Task to extract the ROR releases.
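Assuming the downloaded ROR dump is a zip archive containing JSON data (an assumption; check the files attached to the Zenodo record), extraction can be sketched with the standard-library `zipfile` module. `extract_json_files` is a hypothetical helper.

```python
import os
import zipfile
from typing import List


def extract_json_files(zip_path: str, out_dir: str) -> List[str]:
    """Extract an archive into out_dir and return the paths of the JSON files it held."""
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        # Assumption: the ROR data itself is stored in the .json members.
        names = [name for name in zf.namelist() if name.endswith(".json")]
    return sorted(os.path.join(out_dir, name) for name in names)
```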

transform(releases: List[RorRelease], **kwargs)

Task to transform the ROR releases.
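One plausible shape for the transform task, sketched under assumptions: read the extracted JSON array, pass it through a transform function such as `transform_ror`, and write newline-delimited JSON, a format BigQuery can load. `transform_file` is hypothetical, and the input being a single JSON array is assumed.

```python
import json
from typing import Callable, Dict, List


def transform_file(
    in_path: str,
    out_path: str,
    transform_fn: Callable[[List[Dict]], List[Dict]],
) -> int:
    """Transform a JSON array file into newline-delimited JSON; return the record count."""
    with open(in_path, encoding="utf-8") as f:
        records = json.load(f)  # assumption: the file holds one JSON array
    transformed = transform_fn(records)
    with open(out_path, "w", encoding="utf-8") as f:
        for record in transformed:
            # One JSON object per line, as newline-delimited JSON requires.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(transformed)
```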

upload_transformed(releases: List[RorRelease], **kwargs) → None

Upload the transformed data to Cloud Storage.

bq_load(releases: List[RorRelease], **kwargs) → None

Load the data into BigQuery.

add_new_dataset_releases(releases: List[RorRelease], **kwargs) → None

Add release information to the Observatory API.

cleanup(releases: List[RorRelease], **kwargs) → None

Delete all files, folders and XComs associated with this release.

academic_observatory_workflows.workflows.ror_telescope.list_ror_records(conceptrecid: int, start_date: pendulum.DateTime, end_date: pendulum.DateTime, page_size: int = 10, timeout: float = 30.0) → List[dict]

List all ROR Zenodo versions available between two dates.

Parameters:
  • conceptrecid – the Zenodo conceptrecid for ROR.

  • start_date – the start date of the period to search.

  • end_date – the end date of the period to search.

  • page_size – the page size for the query.

  • timeout – the number of seconds to wait until timing out.

Returns:

the list of ROR Zenodo records with required variables stored as a dictionary.
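The date filtering and paging that list_ror_records describes can be sketched independently of the HTTP layer. Here `fetch_page` is a hypothetical callable that returns one page of versions, newest first (with Zenodo this would be a request to its records search API for the given `conceptrecid`, not shown), and each record is assumed to carry a timezone-aware `created` datetime. The inclusive date bounds and the early stop are assumptions, not the module's confirmed behavior.

```python
from datetime import datetime
from typing import Callable, Dict, List


def filter_records_between(
    fetch_page: Callable[[int, int], List[Dict]],
    start_date: datetime,
    end_date: datetime,
    page_size: int = 10,
) -> List[Dict]:
    """Page through versions and keep those created within [start_date, end_date]."""
    results: List[Dict] = []
    page = 1  # assumption: pages are 1-indexed
    while True:
        records = fetch_page(page, page_size)
        if not records:
            break  # an empty page means there are no more results
        for record in records:
            if start_date <= record["created"] <= end_date:
                results.append(record)
        # Records are sorted newest first, so once a page ends before
        # start_date no later page can contain matches.
        if records[-1]["created"] < start_date:
            break
        page += 1
    return results
```

Injecting `fetch_page` keeps the paging logic testable without network access; `page_size` and `timeout` would apply to the real request.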

academic_observatory_workflows.workflows.ror_telescope.is_lat_lng_valid(lat: Any, lng: Any) → bool

Validate whether a lat and lng are valid.

Parameters:
  • lat – the latitude.

  • lng – the longitude.

Returns:

whether the lat and lng are valid.

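A minimal sketch of such a check, assuming "valid" means both values convert to numbers with latitude in [-90, 90] and longitude in [-180, 180]. The exact rules `is_lat_lng_valid` applies are not documented here, so `lat_lng_in_range` is a hypothetical stand-in.

```python
from typing import Any


def lat_lng_in_range(lat: Any, lng: Any) -> bool:
    """Return True if lat and lng convert to floats within the usual coordinate ranges."""
    try:
        lat_f = float(lat)
        lng_f = float(lng)
    except (TypeError, ValueError):
        # None, non-numeric strings and similar values are treated as invalid.
        return False
    # NaN and infinite values fail these comparisons, so they are rejected too.
    return -90.0 <= lat_f <= 90.0 and -180.0 <= lng_f <= 180.0
```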
academic_observatory_workflows.workflows.ror_telescope.transform_ror(ror: List[Dict]) → List[Dict]

Transform a ROR release.

Parameters:

ror – the ROR records.

Returns:

the transformed records.
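The documentation does not say what transform_ror changes. Given that the module also exposes is_lat_lng_valid, one plausible step is clearing out-of-range coordinates so they do not reach BigQuery. The sketch below is a hypothetical illustration of that idea only: `clean_coordinates` and `coords_ok` are invented names, and the `addresses`/`lat`/`lng` record layout (as in the ROR v1 schema) is an assumption.

```python
import copy
from typing import Any, Dict, List


def coords_ok(lat: Any, lng: Any) -> bool:
    """Hypothetical range check standing in for is_lat_lng_valid."""
    try:
        return -90.0 <= float(lat) <= 90.0 and -180.0 <= float(lng) <= 180.0
    except (TypeError, ValueError):
        return False


def clean_coordinates(ror: List[Dict]) -> List[Dict]:
    """Return copies of ROR records with invalid address coordinates set to None."""
    cleaned = []
    for record in ror:
        record = copy.deepcopy(record)  # leave the caller's data untouched
        for address in record.get("addresses") or []:
            if not coords_ok(address.get("lat"), address.get("lng")):
                address["lat"] = None
                address["lng"] = None
        cleaned.append(record)
    return cleaned
```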