academic_observatory_workflows.workflows.geonames_telescope
Module Contents
Classes
- GeonamesRelease
- GeonamesTelescope: A Telescope that harvests the GeoNames geographical database: https://www.geonames.org/
Functions
- fetch_snapshot_date(): Fetch the Geonames release date.
Attributes
- academic_observatory_workflows.workflows.geonames_telescope.DOWNLOAD_URL = 'https://download.geonames.org/export/dump/allCountries.zip'
- class academic_observatory_workflows.workflows.geonames_telescope.GeonamesRelease(*, dag_id: str, run_id: str, snapshot_date: pendulum.DateTime)
Bases:
observatory.platform.workflows.workflow.SnapshotRelease
- academic_observatory_workflows.workflows.geonames_telescope.fetch_snapshot_date() -> pendulum.DateTime
Fetch the Geonames release date.
- Returns:
the release date.
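This page does not say how the release date is determined. One plausible approach, shown here only as a sketch, is to read the HTTP Last-Modified header returned for DOWNLOAD_URL and parse it into a date. The helper name snapshot_date_from_last_modified is hypothetical, and the standard library datetime type stands in for pendulum.DateTime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def snapshot_date_from_last_modified(header_value: str) -> datetime:
    """Parse an HTTP Last-Modified header value into a UTC datetime.

    Hypothetical helper: the real fetch_snapshot_date may work differently.
    """
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # Headers without a usable offset are treated as UTC (an assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# Example header value in the standard HTTP date format.
snapshot = snapshot_date_from_last_modified("Wed, 02 Sep 2020 01:23:45 GMT")
print(snapshot.date().isoformat())
```

In a real task the header value would come from an HTTP HEAD request against DOWNLOAD_URL; that network call is left out so the sketch stays self-contained.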
- class academic_observatory_workflows.workflows.geonames_telescope.GeonamesTelescope(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'geonames', bq_table_name: str = 'geonames', api_dataset_id: str = 'geonames', schema_folder: str = os.path.join(default_schema_folder(), 'geonames'), dataset_description: str = 'The GeoNames geographical database: https://www.geonames.org/', table_description: str = 'The GeoNames geographical database: https://www.geonames.org/', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, start_date: pendulum.DateTime = pendulum.datetime(2020, 9, 1), schedule: str = '@monthly')
Bases:
observatory.platform.workflows.workflow.Workflow
A Telescope that harvests the GeoNames geographical database: https://www.geonames.org/
Saved to the BigQuery table: <project_id>.geonames.geonamesYYYYMMDD
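The table name pattern above appends the snapshot date as a YYYYMMDD suffix. A minimal sketch of how such an identifier could be built is shown below; the function name sharded_table_id and the project ID are illustrative assumptions, not part of this module.

```python
from datetime import date


def sharded_table_id(project_id: str, dataset_id: str, table_name: str, snapshot_date: date) -> str:
    """Build a date-sharded table ID of the form <project>.<dataset>.<table>YYYYMMDD.

    Hypothetical helper for illustration only.
    """
    return f"{project_id}.{dataset_id}.{table_name}{snapshot_date.strftime('%Y%m%d')}"


# "my-project" is a placeholder project ID.
print(sharded_table_id("my-project", "geonames", "geonames", date(2020, 9, 1)))
```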
- fetch_snapshot_date(**kwargs)
Gets the Geonames release for a given month and publishes the snapshot_date as an XCom.
- make_release(**kwargs) -> GeonamesRelease
Creates a new GeonamesRelease instance.
- Parameters:
kwargs – the context passed from the BranchPythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for a list of the keyword arguments that are passed to this argument.
- Returns:
GeonamesRelease
- download(release: GeonamesRelease, **kwargs)
Downloads the GeoNames dump file containing country data. The file is in zip format and is extracted after downloading, saving the unzipped content.
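The unzip step described above can be sketched with the standard library. This is an illustration under assumptions, not the module's own code: the helper name extract_zip and the paths are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_zip(zip_path: Path, out_dir: Path) -> List[str]:
    """Extract every member of a zip archive into out_dir and return the member names.

    Hypothetical helper for illustration only.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```

A call such as extract_zip(Path("allCountries.zip"), Path("extracted")) would unpack the archive into the extracted folder; both paths here are placeholders.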
- upload_downloaded(release: GeonamesRelease, **kwargs)
Upload the Geonames data to Cloud Storage.
- extract(release: GeonamesRelease, **kwargs)
Task to extract the Geonames release for a given month.
- transform(release: GeonamesRelease, **kwargs)
Transforms the release by storing the file content in gzipped CSV format.
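As a rough sketch of such a transform, the snippet below rewrites a text file as gzipped CSV, assuming the extracted file is tab-delimited UTF-8 text, as GeoNames describes its dumps. The function name tsv_to_gzipped_csv is hypothetical, and the real task may handle columns or quoting differently.

```python
import csv
import gzip
from pathlib import Path


def tsv_to_gzipped_csv(txt_path: Path, csv_gz_path: Path) -> int:
    """Rewrite a tab-delimited text file as gzipped CSV and return the row count.

    Hypothetical helper for illustration only.
    """
    rows = 0
    with open(txt_path, newline="", encoding="utf-8") as src, gzip.open(
        csv_gz_path, "wt", newline="", encoding="utf-8"
    ) as dst:
        # QUOTE_NONE keeps any quote characters in the input as literal text.
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst)  # Quotes fields containing commas as needed.
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

Streaming row by row keeps memory use flat, which matters for a dump the size of allCountries.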
- upload_transformed(release: GeonamesRelease, **kwargs) -> None
Upload the transformed data to Cloud Storage.
- bq_load(release: GeonamesRelease, **kwargs) -> None
Load the data into BigQuery.
- add_new_dataset_releases(release: GeonamesRelease, **kwargs) -> None
Adds release information to API.
- cleanup(release: GeonamesRelease, **kwargs) -> None
Delete all files, folders and XComs associated with this release.