academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope
Module Contents
Classes
Functions
create_dag – The Unpaywall Data Feed Telescope.
snapshot_url – Snapshot URL
get_snapshot_file_name – Get the Unpaywall snapshot filename.
changefiles_url – Data Feed URL
Parses a release date from a file name.
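The last summary entry parses a release date from a file name, but the file name format is not shown here. The sketch below is one way to do it with the standard library, assuming the name embeds an ISO date (`YYYY-MM-DD`) optionally followed by `T` and a six-digit time. The function name `parse_release_date`, the pattern, and the example file names are illustrative assumptions, not the module's own.

```python
import re
from datetime import datetime

# Assumed shape: an ISO date, optionally followed by "T" and HHMMSS.
# This pattern is an illustration, not taken from the module.
_DATE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})(?:T(\d{6}))?")


def parse_release_date(file_name: str) -> datetime:
    """Hypothetical helper: extract the first date found in a file name."""
    match = _DATE_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no date found in file name: {file_name!r}")
    date_part, time_part = match.groups()
    if time_part is None:
        # Date only: midnight is used as the time.
        return datetime.strptime(date_part, "%Y-%m-%d")
    return datetime.strptime(f"{date_part}T{time_part}", "%Y-%m-%dT%H%M%S")
```

The module itself works with `pendulum.DateTime`; plain `datetime` is used here only to keep the sketch dependency-free.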
Attributes
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.SNAPSHOT_URL = 'https://api.unpaywall.org/feed/snapshot'
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.CHANGEFILES_URL = 'https://api.unpaywall.org/feed/changefiles'
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.CHANGEFILES_DOWNLOAD_URL = 'https://api.unpaywall.org/daily-feed/changefile'
- class academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.Changefile(filename: str, changefile_date: pendulum.DateTime, changefile_release: observatory.platform.workflows.workflow.ChangefileRelease = None)
- static from_dict(dict_: Dict) -> Changefile
- class academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.UnpaywallRelease(*, dag_id: str, run_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str, bq_table_name: str, is_first_run: bool, snapshot_date: pendulum.DateTime, changefiles: List[Changefile], prev_end_date: pendulum.DateTime)
Bases: observatory.platform.workflows.workflow.Release
- static from_dict(dict_: dict) -> UnpaywallRelease
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.create_dag(*, dag_id: str, cloud_workspace: observatory.platform.observatory_config.CloudWorkspace, bq_dataset_id: str = 'unpaywall', bq_table_name: str = 'unpaywall', api_dataset_id: str = 'unpaywall', schema_folder: str = project_path('unpaywall_telescope', 'schema'), dataset_description: str = 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed', table_description: str = 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed', primary_key: str = 'doi', snapshot_expiry_days: int = 7, http_header: str = None, unpaywall_conn_id: str = 'unpaywall', observatory_api_conn_id: str = AirflowConns.OBSERVATORY_API, start_date: pendulum.DateTime = pendulum.datetime(2021, 7, 2), schedule: str = '@daily', max_active_runs: int = 1, retries: int = 3) -> airflow.DAG
Create the DAG for the Unpaywall Data Feed Telescope.
- Parameters:
dag_id – the id of the DAG.
cloud_workspace – the cloud workspace settings.
bq_dataset_id – the BigQuery dataset id.
bq_table_name – the BigQuery table name.
api_dataset_id – the API dataset id.
schema_folder – the schema folder.
dataset_description – a description for the BigQuery dataset.
table_description – a description for the table.
primary_key – the primary key to use for merging changefiles.
snapshot_expiry_days – the number of days to keep snapshots.
http_header – the HTTP header to use when making requests to Unpaywall.
unpaywall_conn_id – the Unpaywall connection key.
observatory_api_conn_id – the Observatory API connection key.
start_date – the start date of the DAG.
schedule – the schedule interval of the DAG.
max_active_runs – the maximum number of DAG runs that can be run at once.
retries – the number of times to retry a task.
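Because every parameter of `create_dag` is keyword-only and most have defaults, a caller typically supplies `dag_id` and `cloud_workspace` and overrides only what differs. The sketch below illustrates that pattern without importing the package. The helper `build_create_dag_kwargs` is hypothetical, the defaults are copied from the signature above, and a placeholder object stands in for a real `CloudWorkspace`.

```python
from typing import Any, Dict

# Defaults copied from the create_dag signature documented above.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "bq_dataset_id": "unpaywall",
    "bq_table_name": "unpaywall",
    "api_dataset_id": "unpaywall",
    "primary_key": "doi",
    "snapshot_expiry_days": 7,
    "unpaywall_conn_id": "unpaywall",
    "schedule": "@daily",
    "max_active_runs": 1,
    "retries": 3,
}


def build_create_dag_kwargs(dag_id: str, cloud_workspace: Any, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge the documented defaults with caller overrides."""
    return {
        "dag_id": dag_id,
        "cloud_workspace": cloud_workspace,
        **DOCUMENTED_DEFAULTS,
        **overrides,  # later keys win, so overrides replace defaults
    }


# A placeholder stands in for a real CloudWorkspace instance here.
kwargs = build_create_dag_kwargs(
    "unpaywall", cloud_workspace=object(), snapshot_expiry_days=14, retries=5
)
# With the package installed, the DAG would then be created with:
#     dag = create_dag(**kwargs)
```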
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.snapshot_url(api_key: str) -> str
Snapshot URL for the given API key.
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.get_snapshot_file_name(api_key: str) -> str
Get the Unpaywall snapshot filename.
- Returns:
The snapshot file name.
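This entry does not say how the file name is obtained. One plausible approach, assumed here and not confirmed by the source, is to take the last path segment of the URL that the snapshot request resolves to. The helper name `file_name_from_url` and the example URL are hypothetical.

```python
from posixpath import basename
from urllib.parse import urlsplit


def file_name_from_url(url: str) -> str:
    """Hypothetical helper: return the last path segment of a URL."""
    # urlsplit separates the path from the query string and fragment,
    # so signed-URL parameters do not leak into the file name.
    name = basename(urlsplit(url).path)
    if not name:
        raise ValueError(f"URL has no file name component: {url!r}")
    return name
```

Parsing the URL rather than splitting on `/` directly keeps any query parameters out of the result.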
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.changefiles_url(api_key: str) -> str
Data Feed changefiles URL for the given API key.
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.changefile_download_url(filename: str, api_key: str)
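The three URL helpers take an API key, and one also takes a file name, but their bodies are not shown. The sketch below shows how such URLs could be assembled from the documented base URL constants, assuming the key is sent as an `api_key` query parameter and the file name is appended as a path segment. Both are assumptions, the `_sketch` function names are illustrative, and the real functions may add further parameters.

```python
from urllib.parse import quote, urlencode

# Base URLs as listed under Attributes.
SNAPSHOT_URL = "https://api.unpaywall.org/feed/snapshot"
CHANGEFILES_URL = "https://api.unpaywall.org/feed/changefiles"
CHANGEFILES_DOWNLOAD_URL = "https://api.unpaywall.org/daily-feed/changefile"


def _with_api_key(base_url: str, api_key: str) -> str:
    # Assumption: the key travels as an "api_key" query parameter.
    return f"{base_url}?{urlencode({'api_key': api_key})}"


def snapshot_url_sketch(api_key: str) -> str:
    return _with_api_key(SNAPSHOT_URL, api_key)


def changefiles_url_sketch(api_key: str) -> str:
    return _with_api_key(CHANGEFILES_URL, api_key)


def changefile_download_url_sketch(filename: str, api_key: str) -> str:
    # Assumption: the file name is appended as a path segment.
    # quote() percent-encodes characters that are unsafe in a path.
    return _with_api_key(f"{CHANGEFILES_DOWNLOAD_URL}/{quote(filename)}", api_key)
```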
- academic_observatory_workflows.unpaywall_telescope.unpaywall_telescope.get_unpaywall_changefiles(api_key: str) -> List[Changefile]
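`get_unpaywall_changefiles` returns a list of `Changefile` objects, each carrying a file name and a date. The sketch below shows how a JSON listing could be turned into such a list. It assumes the payload has a top-level `list` array whose entries carry `filename` and `date` (`YYYY-MM-DD`) fields; that layout, the stand-in class `ChangefileSketch`, and the function `parse_changefile_listing` are assumptions for illustration, not the module's implementation.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class ChangefileSketch:
    """Simplified stand-in for Changefile (filename and date only)."""

    filename: str
    changefile_date: datetime


def parse_changefile_listing(payload: str) -> List[ChangefileSketch]:
    """Hypothetical helper: parse a JSON listing into changefile records."""
    # Assumed layout: {"list": [{"filename": ..., "date": "YYYY-MM-DD"}, ...]}
    entries = json.loads(payload).get("list", [])
    changefiles = [
        ChangefileSketch(
            filename=entry["filename"],
            changefile_date=datetime.strptime(entry["date"], "%Y-%m-%d"),
        )
        for entry in entries
    ]
    # Oldest first, so changes can be applied in chronological order.
    return sorted(changefiles, key=lambda c: c.changefile_date)
```

Sorting by date is a design choice of this sketch: when changefiles are merged on a primary key, applying them oldest first lets the most recent change win.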