academic_observatory_workflows.unpaywall_telescope.telescope
Classes
Functions
The Unpaywall Data Feed Telescope.
Module Contents
- class academic_observatory_workflows.unpaywall_telescope.telescope.DagParams(dag_id: str, cloud_workspace: observatory_platform.airflow.workflow.CloudWorkspace, bq_dataset_id: str = 'unpaywall', bq_table_name: str = 'unpaywall', api_bq_dataset_id: str = 'dataset_api', schema_folder: str = project_path('unpaywall_telescope', 'schema'), dataset_description: str = 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed', table_description: str = 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed', primary_key: str = 'doi', unpaywall_base_url: str = 'https://api.unpaywall.org', snapshot_expiry_days: int = 7, http_header: str = None, unpaywall_conn_id: str = 'unpaywall', start_date: pendulum.DateTime = pendulum.datetime(2021, 7, 2), schedule: str = '@daily', max_active_runs: int = 1, retries: int = 3, test_run: bool = False, gke_volume_size: str = '1000Gi', gke_namespace: str = 'coki-astro', gke_volume_name: str = 'unpaywall', **kwargs)
- Parameters:
dag_id – the id of the DAG.
cloud_workspace – the cloud workspace settings.
bq_dataset_id – the BigQuery dataset id.
bq_table_name – the BigQuery table name.
api_bq_dataset_id – the API dataset id.
schema_folder – the folder containing the BigQuery schema files.
dataset_description – a description for the BigQuery dataset.
table_description – a description for the table.
primary_key – the primary key to use for merging changefiles.
unpaywall_base_url – the Unpaywall API base URL.
snapshot_expiry_days – the number of days to keep snapshots.
http_header – the HTTP header to use when making requests to Unpaywall.
unpaywall_conn_id – the Airflow connection ID for Unpaywall.
start_date – the start date of the DAG.
schedule – the schedule interval of the DAG.
max_active_runs – the maximum number of DAG runs that can be active at once.
retries – the number of times to retry a task.
gke_namespace – the GKE cluster namespace to use.
gke_volume_name – the name of the persistent volume to create.
gke_volume_size – the amount of storage to request for the persistent volume, as a Kubernetes quantity string (e.g. '1000Gi' for 1000 GiB).
kwargs – additional keyword arguments used to build a GkeParams object.
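The `primary_key` parameter (default `'doi'`) names the field used when merging changefiles into the main table. As a rough, in-memory illustration only, the sketch below treats a changefile as an upsert keyed on that field: a changefile record replaces any existing record with the same key, and records with new keys are appended. This is an assumption about the merge semantics, not the telescope's actual implementation, which operates on BigQuery tables; the function `merge_changefile` and the sample records are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def merge_changefile(
    table: List[Record], changefile: Iterable[Record], primary_key: str = "doi"
) -> List[Record]:
    """Hypothetical upsert: changefile rows replace table rows that share a primary key."""
    # Index existing rows by primary key; dicts preserve insertion order.
    merged: Dict[Any, Record] = {row[primary_key]: row for row in table}
    for row in changefile:
        # An existing key keeps its position but takes the new row; a new key is appended.
        merged[row[primary_key]] = row
    return list(merged.values())


if __name__ == "__main__":
    table = [
        {"doi": "10.1/a", "is_oa": False},
        {"doi": "10.1/b", "is_oa": True},
    ]
    changefile = [
        {"doi": "10.1/a", "is_oa": True},   # updated record
        {"doi": "10.1/c", "is_oa": False},  # new record
    ]
    for row in merge_changefile(table, changefile):
        print(row)
```

Keying a dict on the primary key keeps the sketch linear in the number of rows; a warehouse-scale merge would typically express the same idea as a `MERGE` or delete-then-insert on the key column.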
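The default `gke_volume_size` of `'1000Gi'` uses a binary (power-of-two) suffix, so it requests 1000 GiB rather than 1000 GB. The hypothetical helper below, which is not part of the telescope, converts such a string to bytes to make the size concrete. It deliberately supports only integer values with the `Ki`, `Mi`, `Gi`, `Ti` and `Pi` suffixes, a subset of the quantity formats Kubernetes accepts.

```python
import re

# Binary (power-of-two) suffixes used in Kubernetes resource quantities.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50}


def quantity_to_bytes(quantity: str) -> int:
    """Convert an integer quantity with a binary suffix (e.g. '1000Gi') to bytes.

    Hypothetical helper: decimal suffixes, fractions and exponents are rejected.
    """
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|Pi)", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    value, suffix = match.groups()
    return int(value) * _BINARY_SUFFIXES[suffix]


if __name__ == "__main__":
    size = quantity_to_bytes("1000Gi")  # the default value of gke_volume_size
    print(size, f"{size / 10**12:.2f} TB")  # prints: 1073741824000 1.07 TB
```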