academic_observatory_workflows.scopus_telescope.telescope
Classes
Functions
Scopus telescope.
Module Contents
- class academic_observatory_workflows.scopus_telescope.telescope.DagParams(*, dag_id: str, cloud_workspace: observatory_platform.airflow.workflow.CloudWorkspace, institution_ids: List[str], scopus_conn_ids: List[str], view: str = 'STANDARD', earliest_date: pendulum.DateTime = pendulum.datetime(1800, 1, 1), bq_dataset_id: str = 'scopus', bq_table_name: str = 'scopus', api_bq_dataset_id: str = 'dataset_api', schema_folder: str = project_path('scopus_telescope', 'schema'), dataset_description: str = 'The Scopus citation database: https://www.scopus.com', table_description: str = 'The Scopus citation database: https://www.scopus.com', start_date: pendulum.DateTime = pendulum.datetime(2018, 5, 14), schedule: str = '@monthly', max_active_runs: int = 1, retries: int = 3)
- Parameters:
dag_id – the id of the DAG.
cloud_workspace – the cloud workspace settings.
institution_ids – list of institution IDs to use for the Scopus search query.
scopus_conn_ids – list of Scopus Airflow Connection IDs.
view – the Scopus Search API view type, either 'STANDARD' or 'COMPLETE'. See https://dev.elsevier.com/sc_search_views.html
earliest_date – earliest date to query for results.
bq_dataset_id – the BigQuery dataset id.
bq_table_name – the BigQuery table name.
api_bq_dataset_id – the BigQuery dataset ID to use when storing releases.
schema_folder – the SQL schema path.
dataset_description – description for the BigQuery dataset.
table_description – description for the BigQuery table.
observatory_api_conn_id – the Observatory API connection key.
start_date – the start date of the DAG.
schedule – the schedule interval of the DAG.
max_active_runs – the maximum number of DAG runs that can be active at once.
retries – the number of times to retry a task.