academic_observatory_workflows.workflows.geonames_telescope

Module Contents

Classes

GeonamesRelease

GeonamesTelescope

A Telescope that harvests the GeoNames geographical database: https://www.geonames.org/

Functions

fetch_snapshot_date(→ pendulum.DateTime)

Fetch the Geonames release date.

Attributes

DOWNLOAD_URL

academic_observatory_workflows.workflows.geonames_telescope.DOWNLOAD_URL = 'https://download.geonames.org/export/dump/allCountries.zip'
class academic_observatory_workflows.workflows.geonames_telescope.GeonamesRelease(*, dag_id: str, run_id: str, snapshot_date: pendulum.DateTime)

Bases: observatory.platform.workflows.workflow.SnapshotRelease

academic_observatory_workflows.workflows.geonames_telescope.fetch_snapshot_date() → pendulum.DateTime

Fetch the Geonames release date.

Returns:

the release date.
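The reference above does not say how the release date is obtained. One plausible approach, shown here purely as an assumption, is to read the HTTP Last-Modified header returned for DOWNLOAD_URL and parse it into a date. The helper name parse_last_modified is hypothetical, and the sketch returns a standard-library datetime rather than a pendulum.DateTime to stay dependency-free; the network request itself is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(header_value: str) -> datetime:
    """Parse an HTTP Last-Modified header value into a UTC datetime.

    Hypothetical helper: the real fetch_snapshot_date may work differently.
    """
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # Headers without a usable zone are treated as UTC (an assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

In a workflow, the header value would come from a HEAD request against DOWNLOAD_URL; that step is omitted so the example runs offline.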

class academic_observatory_workflows.workflows.geonames_telescope.GeonamesTelescope(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'geonames', bq_table_name: str = 'geonames', api_dataset_id: str = 'geonames', schema_folder: str = os.path.join(default_schema_folder(), 'geonames'), dataset_description: str = 'The GeoNames geographical database: https://www.geonames.org/', table_description: str = 'The GeoNames geographical database: https://www.geonames.org/', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, start_date: pendulum.DateTime = pendulum.datetime(2020, 9, 1), schedule: str = '@monthly')

Bases: observatory.platform.workflows.workflow.Workflow

A Telescope that harvests the GeoNames geographical database: https://www.geonames.org/

Saved to the BigQuery table: <project_id>.geonames.geonamesYYYYMMDD
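The table naming pattern above can be illustrated with a small sketch. The function snapshot_table_id is a hypothetical helper, not part of the module; it only shows how the default bq_dataset_id and bq_table_name values would combine with a snapshot date into the YYYYMMDD-suffixed table identifier.

```python
from datetime import date


def snapshot_table_id(
    project_id: str,
    snapshot_date: date,
    dataset_id: str = "geonames",
    table_name: str = "geonames",
) -> str:
    """Build a date-suffixed table ID of the form <project>.<dataset>.<table>YYYYMMDD."""
    suffix = snapshot_date.strftime("%Y%m%d")
    return f"{project_id}.{dataset_id}.{table_name}{suffix}"
```

For a project called my-project (a placeholder) and the default start date of 2020-09-01, this yields my-project.geonames.geonames20200901.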

fetch_snapshot_date(**kwargs)

Gets the Geonames release for a given month and publishes the snapshot_date as an XCom.

make_release(**kwargs) → GeonamesRelease

Creates a new GeonamesRelease instance.

Parameters:

kwargs – the context passed from the BranchPythonOperator. See https://airflow.apache.org/docs/stable/macros-ref.html for a list of the keyword arguments that are passed to this argument.

Returns:

GeonamesRelease

download(release: GeonamesRelease, **kwargs)

Downloads the GeoNames dump file containing country data. The file is a zip archive; it is extracted after downloading and the unzipped content is saved.
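The extraction step described above could look roughly like the following. This is a minimal sketch using only the standard library; unzip_dump and its arguments are hypothetical names, and the actual method may store files in release-specific folders that are not shown here.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_dump(zip_path: Path, output_dir: Path) -> List[Path]:
    """Extract every member of a zip archive into output_dir and return the extracted paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(output_dir)
        return [output_dir / name for name in archive.namelist()]
```

The zip archive itself would first be fetched from DOWNLOAD_URL; that request is omitted so the sketch stays runnable without network access.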

upload_downloaded(release: GeonamesRelease, **kwargs)

Upload the Geonames data to Cloud Storage.

extract(release: GeonamesRelease, **kwargs)

Task to extract the GeoNames release for a given month.

transform(release: GeonamesRelease, **kwargs)

Transforms the release by storing the file content in gzipped CSV format.
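As an illustration of the gzip step, the sketch below compresses an extracted file byte for byte. It is an assumption that the transform amounts to compression only; any column or delimiter handling the real method performs is not documented here and is not reproduced. The name gzip_file is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(input_path: Path, output_path: Path) -> None:
    """Write a gzip-compressed copy of input_path to output_path, streaming in chunks."""
    with open(input_path, "rb") as src, gzip.open(output_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Streaming with shutil.copyfileobj avoids loading the whole dump into memory, which matters for a file as large as the full GeoNames export.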

upload_transformed(release: GeonamesRelease, **kwargs) → None

Upload the transformed data to Cloud Storage.

bq_load(release: GeonamesRelease, **kwargs) → None

Load the data into BigQuery.

add_new_dataset_releases(release: GeonamesRelease, **kwargs) → None

Adds release information to the API.

cleanup(release: GeonamesRelease, **kwargs) → None

Delete all files, folders and XComs associated with this release.