Scopus

“Scopus uniquely combines a comprehensive, curated abstract and citation database with enriched data and linked scholarly content.

Quickly find relevant and trusted research, identify experts, and access reliable data, metrics, and analytical tools for confident research strategy decisions – all from one database and one subscription.”

“SCOPUS is an Elsevier bibliometrics database containing abstracts, citations, of journals, books, and conference proceedings.” – SCOPUS website.

The telescope will look for connections conforming to the Airflow connection ID naming convention (below) and generate a subdag to handle the entire ETL pipeline for each institution.

Observatory Platform API

The telescope relies on the Observatory Platform API to create DAGs. One DAG is created in Airflow for every Scopus telescope returned by the API, i.e., one for each organisation.

The following fields need to be set in the extra field of the telescope:

  • airflow_connections which is a list of Airflow connection ID names with API keys set in the password field.

  • institution_ids which is a list of strings containing the institution IDs to search in SCOPUS, e.g., ['60031226'] for Curtin University.

  • earliest_date which is the earliest datetime to query.

  • view which is the SCOPUS view to request, i.e., "STANDARD" or "COMPLETE".
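As a rough illustration, the extra field for one organisation might look like the Python dictionary below. Only the four key names and the Curtin University institution ID come from the list above. The connection ID, the date value and its ISO 8601 format, and the helper `missing_extra_keys` are assumptions made for this sketch.

```python
import json

# Hypothetical 'extra' payload for a Scopus telescope. The connection ID and
# the date are placeholders, not values taken from the platform.
extra = {
    "airflow_connections": ["scopus_example_key_1"],  # placeholder connection ID
    "institution_ids": ["60031226"],  # Curtin University
    "earliest_date": "2021-01-01",  # assumed ISO 8601 format
    "view": "STANDARD",  # or "COMPLETE"
}

REQUIRED_KEYS = {"airflow_connections", "institution_ids", "earliest_date", "view"}


def missing_extra_keys(extra_field: dict) -> set:
    """Return the required keys that are absent from the extra field."""
    return REQUIRED_KEYS - set(extra_field)


print(json.dumps(extra, indent=2))
print(missing_extra_keys(extra))
```

A check like this can catch a misconfigured telescope before any API calls are made.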

Storage location

The telescope saves the dataset to a Google BigQuery table with the project_id specified in the standard Airflow variable for the project ID. The dataset_id defaults to elsevier. The table_id is set to scopus<date suffix>.
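The naming scheme can be sketched as follows. The YYYYMMDD suffix format is an assumption based on the usual BigQuery convention for date-sharded tables. The function names and the example project ID are hypothetical and are not part of the platform.

```python
from datetime import date


def scopus_table_id(release_date: date, prefix: str = "scopus") -> str:
    """Build a table ID of the form scopus<date suffix>, assuming a YYYYMMDD suffix."""
    return f"{prefix}{release_date.strftime('%Y%m%d')}"


def full_table_name(project_id: str, release_date: date, dataset_id: str = "elsevier") -> str:
    """Join project, dataset, and table ID into a fully qualified table name."""
    return f"{project_id}.{dataset_id}.{scopus_table_id(release_date)}"


# 'my-project' is a placeholder project ID.
print(full_table_name("my-project", date(2021, 1, 1)))
```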

Download throttling limits

The telescope downloads SCOPUS data in parallel sessions up to the number of API keys supplied. Each session observes the following throttling limits imposed by Elsevier:

  • API calls are rate limited to 2 calls/s (Elsevier's documented rate).

  • Number of results returned per call is capped at 25 (Elsevier limit).

  • Maximum number of results per query is 5000 (Elsevier limit).
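To see what these limits mean in practice, the sketch below works out how many calls a single query needs and the shortest time one session could take to page through it. The constants come from the list above. The function names are hypothetical, and the timing ignores network latency and retries.

```python
import math

CALLS_PER_SECOND = 2          # rate limit per session
RESULTS_PER_CALL = 25         # results returned per call
MAX_RESULTS_PER_QUERY = 5000  # results retrievable per query


def calls_needed(total_results: int) -> int:
    """Calls required to page through a query, capped at the per-query maximum."""
    retrievable = min(total_results, MAX_RESULTS_PER_QUERY)
    return math.ceil(retrievable / RESULTS_PER_CALL)


def min_seconds(total_results: int) -> float:
    """Lower bound on the time one session needs for a query at the rate limit."""
    return calls_needed(total_results) / CALLS_PER_SECOND


print(calls_needed(5000), min_seconds(5000))
```

A query that hits the 5000-result cap therefore takes 200 calls, or at least 100 seconds in one session. Because each API key gets its own session, extra keys let several queries run in parallel.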

Summary

Harvest Type                   API
Harvest frequency              Default: @monthly
Runs on remote worker          Default: False
Catchup missed runs            Default: False
Table Write Disposition        Append
Dataset Update Frequency       Daily
Credentials Required           Yes
Uses Workflow Template         Snapshot
Each shard includes all data   Yes

Latest schema

name                       type     mode      description
snapshot_date              DATE     NULLABLE  Format %E4Y-%m-%d; currently keyed to the DAG execution date
keywords                   STRING   REPEATED  Author keywords
abstract                   STRING   NULLABLE  Abstract
affiliations               RECORD   REPEATED  Affiliations
affiliations.id            STRING   NULLABLE  Affiliation ID
affiliations.name_variant  STRING   NULLABLE  Alternate affiliation name
affiliations.country       STRING   NULLABLE  Affiliation country
affiliations.city          STRING   NULLABLE  Affiliation city
affiliations.name          STRING   NULLABLE  Affiliation name
aggregation_type           STRING   NULLABLE  Source type
source_id                  INTEGER  NULLABLE  Source ID
eid                        STRING   NULLABLE  Electronic ID
pii                        STRING   NULLABLE  Publisher item identifier
pubmed_id                  INTEGER  NULLABLE  MEDLINE identifier
identifier                 STRING   NULLABLE  SCOPUS ID
isbn                       STRING   REPEATED  International standard book number
open_access_flag           BOOLEAN  NULLABLE  Open access flag
authors                    STRING   REPEATED  List of authors
cover_date                 DATE     NULLABLE  Format %E4Y-%m-%d; publication date
open_access                INTEGER  NULLABLE  Open access status (appears to be 1 for yes, 0 for no)
doi                        STRING   REPEATED  Digital object identifier
publication_name           STRING   NULLABLE  Source title
institution_ids            INTEGER  REPEATED  List of institution IDs used for this query
creator                    STRING   NULLABLE  First author
article_number             STRING   NULLABLE  Article number
title                      STRING   NULLABLE  Article title
issn                       STRING   REPEATED  International standard serial number
eissn                      STRING   REPEATED  Electronic international standard serial number
orcid                      STRING   NULLABLE  ORCID ID
subtype_description        STRING   NULLABLE  Document type description
fund_agency_name           STRING   NULLABLE  Funding agency name
fund_agency_id             STRING   NULLABLE  Funding agency ID
fund_agency_ac             STRING   NULLABLE  Funding agency acronym
citedby_count              INTEGER  NULLABLE  Cited by count
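The mode column determines the shape of each value: REPEATED fields hold lists, while NULLABLE fields hold a single value or nothing. The sketch below checks a record against a handful of fields from the schema. The helper `mode_errors`, the choice of fields, the rule that a missing REPEATED field counts as an empty list, and the sample record values are all assumptions made for illustration, not platform code or real data.

```python
# A small subset of the schema: field name -> mode.
FIELD_MODES = {
    "eid": "NULLABLE",
    "title": "NULLABLE",
    "keywords": "REPEATED",
    "authors": "REPEATED",
    "doi": "REPEATED",
    "citedby_count": "NULLABLE",
}


def mode_errors(record: dict) -> list:
    """Return a message for every field whose value does not fit its mode."""
    errors = []
    for name, mode in FIELD_MODES.items():
        if mode == "REPEATED":
            # A missing REPEATED field is treated as an empty list in this sketch.
            if not isinstance(record.get(name, []), list):
                errors.append(f"{name}: expected a list")
        elif isinstance(record.get(name), list):
            errors.append(f"{name}: expected a single value or None")
    return errors


# Placeholder record, not real Scopus data.
record = {"eid": "example-eid", "keywords": ["example"], "doi": "10.0000/example"}
print(mode_errors(record))
```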

External references