Web of Science

“The Web of Science is the information and technology provider for the global scientific research community. We provide data, analytics and insights, as well as workflow tools and bespoke professional services to researchers and the entire research community that underpins research – universities and research institutions, national and local governments, private and public research funding organizations, publishers and research-intensive corporations, across the world.” – Web of Science website.

Web of Science, previously known as Web of Knowledge, provides bibliometric information, including funding acknowledgements, international publication identifiers, and abstracts. - source: WOS and data details.

Observatory Platform API

The telescope relies on the Observatory Platform API to create DAGs. A DAG is created in Airflow for every web_of_science telescope returned by the API, i.e., one for each organisation.

The following fields need to be set in the extra field of the telescope:

  • airflow_connection which is a list of Airflow connection ID names containing the login and password for accessing the Web of Science service.

  • institution_ids which is a list of strings containing the institution IDs to search in Web of Science, for example “Curtin University”.

  • earliest_date which is the earliest datetime to query.
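As a minimal sketch of what such an extra field might look like, the snippet below builds one as a Python dictionary and checks that the three documented keys are present. The connection ID `wos_curtin`, the date value and its string format, and the helper `validate_extra` are hypothetical illustrations, not part of the Observatory Platform API.

```python
import json

# The three keys documented for the telescope's extra field.
REQUIRED_KEYS = ("airflow_connection", "institution_ids", "earliest_date")


def validate_extra(extra: dict) -> dict:
    """Check that an 'extra' dict has the documented keys (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in extra]
    if missing:
        raise ValueError(f"Missing keys in extra field: {missing}")
    # Both of these fields are documented as lists of strings.
    for key in ("airflow_connection", "institution_ids"):
        value = extra[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{key} must be a list of strings")
    return extra


# Hypothetical values for illustration only.
extra = validate_extra(
    {
        "airflow_connection": ["wos_curtin"],
        "institution_ids": ["Curtin University"],
        "earliest_date": "2013-01-01",  # assumed ISO date string
    }
)
print(json.dumps(extra, indent=2))
```

Validating the field before it is stored is one way to catch a missing key early, rather than when the DAG first runs.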

Storage location

The telescope saves the dataset to a Google BigQuery table, with the project ID taken from the standard Airflow variable for the project ID, the dataset set to clarivate (unless overridden), and the table ID set to web_of_science<date suffix>.
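The naming scheme can be sketched as follows. This is an illustration only: the function name `bigquery_table_id` and the project ID `my-project` are hypothetical, and the `YYYYMMDD` suffix is an assumption based on BigQuery's usual convention for date-sharded tables, since the exact suffix format is not stated here.

```python
from datetime import date


def bigquery_table_id(project_id: str, snapshot: date, dataset_id: str = "clarivate") -> str:
    """Build a fully qualified table ID (hypothetical helper)."""
    suffix = snapshot.strftime("%Y%m%d")  # assumed date suffix format
    return f"{project_id}.{dataset_id}.web_of_science{suffix}"


# "my-project" is a placeholder project ID.
print(bigquery_table_id("my-project", date(2021, 7, 1)))
# prints "my-project.clarivate.web_of_science20210701"
```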

Download throttling limits

The telescope downloads results in parallel. Web of Science has imposed throttling limits for API access. The following limits are observed:

  • New session creation: 5 per 5-min period.

  • API calls: 2 calls/s

  • Returned results: 100 max per call.

  • Cited references: 100 max per article.

  • Max records retrievable in period: licence dependent. It is unclear what Curtin’s limit is, if any.
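To illustrate how a client could stay within the call-rate and page-size limits listed above, here is a minimal single-threaded sketch. It is not the telescope's implementation: the `Throttle` class and `page_ranges` helper are hypothetical, records are assumed to be numbered from 1, and a parallel downloader would also need to share the throttle across workers.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

MAX_CALLS_PER_SECOND = 2    # documented API call limit
MAX_RESULTS_PER_CALL = 100  # documented maximum results per call


class Throttle:
    """Space successive calls at least 1 / calls_per_second apart."""

    def __init__(
        self,
        calls_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.min_interval = 1.0 / calls_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def page_ranges(total_records: int, page_size: int = MAX_RESULTS_PER_CALL) -> Iterator[Tuple[int, int]]:
    """Yield (first_record, count) pairs that cover total_records."""
    for first in range(1, total_records + 1, page_size):
        yield first, min(page_size, total_records - first + 1)


throttle = Throttle(MAX_CALLS_PER_SECOND)
for first, count in page_ranges(250):
    throttle.wait()  # a real client would issue the API request after this
    print(f"would request records {first}..{first + count - 1}")
```

The clock and sleep functions are injectable so the spacing logic can be tested without real delays. The session-creation limit (5 per 5-minute period) is not handled here and would need a separate counter.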

Summary

| Setting | Value |
| --- | --- |
| Harvest Type | API |
| Harvest frequency | Default: @monthly |
| Runs on remote worker | Default: False |
| Catchup missed runs | Default: False |
| Table Write Disposition | Append |
| Dataset Update Frequency | Daily |
| Credentials Required | Yes |
| Uses Workflow Template | Snapshot |
| Each shard includes all data | Yes |

Latest schema

| name | type | mode | description |
| --- | --- | --- | --- |
| categories | RECORD | NULLABLE | Category descriptions |
| categories.subjects | RECORD | REPEATED | Category subjects |
| categories.subjects.text | STRING | NULLABLE | Category name |
| categories.subjects.code | STRING | NULLABLE | Category code |
| categories.subjects.ascatype | STRING | NULLABLE | Defines the two collections of subject categories used to classify journals in Web of Knowledge |
| categories.subheadings | STRING | REPEATED | Category subheadings |
| categories.headings | STRING | REPEATED | Category headings |
| fund_ack | RECORD | NULLABLE | Funding acknowledgements |
| fund_ack.grants | RECORD | REPEATED | Grant information |
| fund_ack.grants.ids | STRING | REPEATED | Grant ID |
| fund_ack.grants.agency | STRING | NULLABLE | Grant agency |
| fund_ack.text | STRING | REPEATED | Funding acknowledgement texts |
| identifiers | RECORD | NULLABLE | Document identifiers |
| identifiers.art_no | STRING | NULLABLE | |
| identifiers.doi | STRING | NULLABLE | Digital object identifier |
| identifiers.eissn | STRING | NULLABLE | Electronic ISSN |
| identifiers.issn | STRING | NULLABLE | ISSN |
| identifiers.meeting_abs | STRING | NULLABLE | |
| identifiers.xref_doi | STRING | NULLABLE | |
| identifiers.isbn | STRING | NULLABLE | ISBN |
| identifiers.eisbn | STRING | NULLABLE | Electronic ISBN |
| identifiers.parent_book_doi | STRING | NULLABLE | |
| identifiers.uid | STRING | NULLABLE | Web of Science UID |
| abstract | STRING | REPEATED | List of abstracts |
| conferences | RECORD | REPEATED | Information on the conference proceedings |
| conferences.name | STRING | NULLABLE | Conference name |
| conferences.id | INTEGER | NULLABLE | Conference ID |
| ref_count | INTEGER | NULLABLE | Reference count |
| names | RECORD | REPEATED | Names associated with the publication |
| names.full_name | STRING | NULLABLE | Full name |
| names.daisng_id | INTEGER | NULLABLE | WoS identifier from their entity disambiguation algorithm |
| names.orcid | STRING | NULLABLE | ORCID identifier |
| names.last_name | STRING | NULLABLE | Last name |
| names.wos_standard | STRING | NULLABLE | Surname followed by a comma and up to five initials |
| names.role | STRING | NULLABLE | Role of name, e.g., author |
| names.first_name | STRING | NULLABLE | First name |
| names.r_id | STRING | NULLABLE | ResearcherID identifier |
| names.seq_no | INTEGER | NULLABLE | Position at which the author appears in the publication |
| languages | RECORD | REPEATED | Languages used in the publication |
| languages.name | STRING | NULLABLE | Name of language |
| languages.type | STRING | NULLABLE | Type of role the language plays, e.g., primary |
| title | STRING | NULLABLE | Title of publication |
| orgs | RECORD | REPEATED | Organisations affiliated with the publication (possibly through authors) |
| orgs.org_name | STRING | NULLABLE | Organisation name |
| orgs.country | STRING | NULLABLE | Country where organisation resides |
| orgs.state | STRING | NULLABLE | State where organisation resides |
| orgs.names | RECORD | REPEATED | Names associated with this organisation, e.g., authors |
| orgs.names.wos_standard | STRING | NULLABLE | Surname followed by a comma and up to five initials |
| orgs.names.full_name | STRING | NULLABLE | Full name |
| orgs.names.daisng_id | INTEGER | NULLABLE | WoS identifier from their entity disambiguation algorithm |
| orgs.names.last_name | STRING | NULLABLE | Last name |
| orgs.names.first_name | STRING | NULLABLE | First name |
| orgs.suborgs | STRING | REPEATED | Any relevant suborganisations of this organisation |
| orgs.city | STRING | NULLABLE | City where organisation resides |
| keywords | STRING | REPEATED | List of keywords and keywords plus (where available) |
| pub_info | RECORD | NULLABLE | Publication information summary |
| pub_info.publisher | STRING | NULLABLE | Name of publisher |
| pub_info.publisher_city | STRING | NULLABLE | City where publisher is located |
| pub_info.doc_type | STRING | NULLABLE | Type of publication |
| pub_info.source | STRING | NULLABLE | Source publication, e.g., publishing journal |
| pub_info.pub_type | STRING | NULLABLE | Type of publication |
| pub_info.page_count | INTEGER | NULLABLE | Page count for this document |
| pub_info.sort_date | DATE | NULLABLE | %E4Y-%m-%d |
| snapshot_date | DATE | NULLABLE | %E4Y-%m-%d. The date that the workflow harvested the data. |
| institution_ids | STRING | REPEATED | Institution IDs used to fetch the record. This indicates the list of institutions fetched under the same key in an OR query, e.g., OG=(Curtin University OR Other authorised institution) |
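The OR query mentioned in the institution_ids description can be sketched as a small string-building function. The name `build_org_query` is hypothetical, and the sketch assumes institution names can be joined verbatim, without quoting or escaping.

```python
from typing import List


def build_org_query(institution_ids: List[str]) -> str:
    """Combine institution IDs into a single OG=(... OR ...) query string."""
    if not institution_ids:
        raise ValueError("At least one institution ID is required")
    return f"OG=({' OR '.join(institution_ids)})"


print(build_org_query(["Curtin University", "Other authorised institution"]))
# prints "OG=(Curtin University OR Other authorised institution)"
```

With a single institution the result is simply `OG=(Curtin University)`.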

External references