academic_observatory_workflows.unpaywall_telescope.telescope
============================================================

.. py:module:: academic_observatory_workflows.unpaywall_telescope.telescope


Classes
-------

.. autoapisummary::

   academic_observatory_workflows.unpaywall_telescope.telescope.DagParams


Functions
---------

.. autoapisummary::

   academic_observatory_workflows.unpaywall_telescope.telescope.create_dag


Module Contents
---------------

.. py:class:: DagParams(dag_id: str, cloud_workspace: observatory_platform.airflow.workflow.CloudWorkspace, bq_dataset_id: str = 'unpaywall', bq_table_name: str = 'unpaywall', api_bq_dataset_id: str = 'dataset_api', schema_folder: str = project_path('unpaywall_telescope', 'schema'), dataset_description: str = 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed', table_description: str = 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed', primary_key: str = 'doi', unpaywall_base_url: str = 'https://api.unpaywall.org', snapshot_expiry_days: int = 7, http_header: str = None, unpaywall_conn_id: str = 'unpaywall', start_date: pendulum.DateTime = pendulum.datetime(2021, 7, 2), schedule: str = '@daily', max_active_runs: int = 1, retries: int = 3, test_run: bool = False, gke_volume_size: str = '1000Gi', gke_namespace: str = 'coki-astro', gke_volume_name: str = 'unpaywall', **kwargs)

   :param dag_id: the id of the DAG.
   :param cloud_workspace: the cloud workspace settings.
   :param bq_dataset_id: the BigQuery dataset id.
   :param bq_table_name: the BigQuery table name.
   :param api_bq_dataset_id: the API dataset id.
   :param schema_folder: the schema folder.
   :param dataset_description: a description for the BigQuery dataset.
   :param table_description: a description for the table.
   :param primary_key: the primary key to use for merging changefiles.
   :param unpaywall_base_url: the Unpaywall API base URL.
   :param snapshot_expiry_days: the number of days to keep snapshots.
   :param http_header: the HTTP header to use when making requests to Unpaywall.
   :param unpaywall_conn_id: the Unpaywall connection key.
   :param start_date: the start date of the DAG.
   :param schedule: the schedule interval of the DAG.
   :param max_active_runs: the maximum number of DAG runs that can be run at once.
   :param retries: the number of times to retry a task.
   :param test_run: whether this is a test run.
   :param gke_namespace: the cluster namespace to use.
   :param gke_volume_name: the name of the persistent volume to create.
   :param gke_volume_size: the amount of storage to request for the persistent volume, as a Kubernetes quantity string (e.g. ``'1000Gi'``).
   :param kwargs: keyword arguments used to build a GkeParams object.


   .. py:attribute:: dag_id


   .. py:attribute:: cloud_workspace


   .. py:attribute:: bq_dataset_id
      :value: 'unpaywall'


   .. py:attribute:: bq_table_name
      :value: 'unpaywall'


   .. py:attribute:: api_bq_dataset_id
      :value: 'dataset_api'


   .. py:attribute:: schema_folder


   .. py:attribute:: schema_file_path


   .. py:attribute:: dataset_description
      :value: 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed'


   .. py:attribute:: table_description
      :value: 'Unpaywall Data Feed: https://unpaywall.org/products/data-feed'


   .. py:attribute:: primary_key
      :value: 'doi'


   .. py:attribute:: unpaywall_base_url
      :value: 'https://api.unpaywall.org'


   .. py:attribute:: snapshot_expiry_days
      :value: 7


   .. py:attribute:: http_header
      :value: None


   .. py:attribute:: unpaywall_conn_id
      :value: 'unpaywall'


   .. py:attribute:: start_date


   .. py:attribute:: schedule
      :value: '@daily'


   .. py:attribute:: max_active_runs
      :value: 1


   .. py:attribute:: retries
      :value: 3


   .. py:attribute:: test_run
      :value: False


   .. py:attribute:: gke_volume_size
      :value: '1000Gi'


   .. py:attribute:: gke_namespace
      :value: 'coki-astro'


   .. py:attribute:: gke_volume_name
      :value: 'unpaywall'


   .. py:attribute:: gke_params


.. py:function:: create_dag(dag_params: DagParams) -> airflow.DAG

   Create the Unpaywall Data Feed Telescope DAG from ``dag_params``.
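
The ``primary_key`` parameter of ``DagParams`` (default ``'doi'``) names the key used when merging Unpaywall changefiles. The following is a minimal, in-memory sketch of upsert semantics keyed on that field, under two assumptions: a changefile row replaces any existing row with the same key, and changefiles are applied oldest first. ``merge_changefiles`` and the record fields shown are hypothetical and for illustration only; they are not part of this module, and the telescope's actual merge may be implemented differently (for example, inside BigQuery).

.. code-block:: python

   from typing import Dict, Iterable, List


   def merge_changefiles(
       snapshot: Iterable[dict],
       changefiles: Iterable[Iterable[dict]],
       primary_key: str = "doi",
   ) -> List[dict]:
       """Hypothetical helper: upsert changefile rows into a snapshot.

       Assumes last-write-wins semantics and that ``changefiles`` is ordered
       oldest first, so a row in a later changefile replaces any earlier row
       with the same ``primary_key`` value.
       """
       # Index the existing snapshot rows by their primary key.
       merged: Dict[str, dict] = {row[primary_key]: row for row in snapshot}
       for changefile in changefiles:
           for row in changefile:
               # Insert new keys and overwrite existing ones.
               merged[row[primary_key]] = row
       # Return rows sorted by key so the result is deterministic.
       return [merged[key] for key in sorted(merged)]


   # Illustrative records only; real Unpaywall rows carry many more fields.
   snapshot = [
       {"doi": "10.1000/a", "is_oa": False},
       {"doi": "10.1000/b", "is_oa": False},
   ]
   changefiles = [
       [{"doi": "10.1000/b", "is_oa": True}],  # updates an existing DOI
       [{"doi": "10.1000/c", "is_oa": True}],  # adds a new DOI
   ]
   result = merge_changefiles(snapshot, changefiles)

Keying a dictionary on the primary key gives last-write-wins behaviour, so the order in which changefiles are applied determines which version of a record survives.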