OpenAlex
OpenAlex is a fully open catalog of the global research system. It’s named after the ancient Library of Alexandria.
The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. There are five types of entities:
Works are papers, books, datasets, etc; they cite other works
Authors are people who create works
Venues are journals and repositories that host works
Institutions are universities and other orgs that are affiliated with works (via authors)
Concepts tag Works with a topic
Together, these make a huge web (or more technically, heterogeneous directed graph) of hundreds of millions of entities and over a billion connections between them all.
See https://docs.openalex.org/ for more information.
This telescope transfers OpenAlex data from an AWS S3 bucket and loads it into multiple tables in BigQuery, with one
table for each entity (Works, Authors, Venues, Institutions, Concepts).
The first run will process all the files that are available in the S3 bucket.
A manifest file is used for later runs to keep track of which files have changed since the last run.
Only the files that have changed will then be processed in this telescope.
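The change detection described above can be sketched as follows. This is a minimal illustration, not the telescope's actual implementation. It assumes the manifest is a JSON document with an `entries` list whose items have a `url` containing an `updated_date=YYYY-MM-DD` path segment; the function name `changed_entries` and the example URLs are hypothetical.

```python
import re
from datetime import date
from typing import List

# Assumption: each manifest entry URL contains "updated_date=YYYY-MM-DD".
UPDATED_DATE = re.compile(r"updated_date=(\d{4}-\d{2}-\d{2})")


def changed_entries(manifest: dict, last_run: date) -> List[str]:
    """Return the URLs of manifest entries updated after the previous run."""
    urls = []
    for entry in manifest.get("entries", []):
        match = UPDATED_DATE.search(entry["url"])
        # Keep only entries whose partition date is later than the last run.
        if match and date.fromisoformat(match.group(1)) > last_run:
            urls.append(entry["url"])
    return urls


# Illustrative manifest content (hypothetical file names).
manifest = {
    "entries": [
        {"url": "s3://openalex/data/concepts/updated_date=2022-01-01/part_000.gz"},
        {"url": "s3://openalex/data/concepts/updated_date=2022-02-01/part_000.gz"},
    ]
}
print(changed_entries(manifest, last_run=date(2022, 1, 15)))
```

On a first run, passing a very early `last_run` date selects every entry, which matches the behaviour of processing all available files.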
The data for the Authors and Venues entities do not require any transformations before loading into BigQuery. This means that the files for these entities are directly transferred to the transform bucket.
The other entities do require some transformation, so those files are transferred to the download bucket. After the data is transformed, the resulting files are uploaded to the transform bucket.
The transformation is needed because two fields contain nested fields with dynamic field names, which makes it impossible to define a schema beforehand and load the data straight into BigQuery. The two fields are ‘abstract_inverted_index’ (present in the Work entity only) and ‘international’ (present in the Concept and Institution entities).
As a workaround, these fields are transformed into a RECORD of two arrays of the same length. The first array contains all the original field names and the second array the corresponding values.
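The workaround can be sketched for the ‘international’ field as follows. This is a minimal sketch rather than the telescope's own code: the helper names `to_keys_values` and `transform_concept` and the sample record are hypothetical, and the array names `keys` and `values` are taken from the schema tables in this document.

```python
def to_keys_values(mapping):
    """Split a dict with dynamic keys into two parallel lists of equal length."""
    if mapping is None:
        return None
    keys = list(mapping.keys())
    values = [mapping[key] for key in keys]
    return {"keys": keys, "values": values}


def transform_concept(concept: dict) -> dict:
    """Return a copy of a Concept whose 'international' sub-fields have a fixed shape."""
    transformed = dict(concept)
    international = concept.get("international")
    if international is not None:
        # Each sub-field (e.g. display_name) maps language codes to translations.
        transformed["international"] = {
            name: to_keys_values(value) for name, value in international.items()
        }
    return transformed


# Hypothetical record with dynamic language-code keys.
concept = {
    "id": "C1",
    "international": {"display_name": {"en": "Biology", "fr": "Biologie"}},
}
print(transform_concept(concept))
```

Because the two arrays are always named the same way, one fixed schema covers every record regardless of which languages appear.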
Summary | |
---|---|
Average runtime | 12-24h |
Average download size | >100GB |
Harvest Type | AWS transfer |
Workflow Update Frequency | Weekly |
Runs on remote worker | True |
Catchup missed runs | False |
Table Write Disposition | Append |
Provider Update Frequency | Weekly |
Credentials Required | No |
Uses Workflow Template | Stream |
Each shard includes all data | No |
Using the transfer service
The files in the AWS bucket are transferred to a separate Google Cloud Storage bucket using the Storage Transfer Service. To use the transfer service, enable the Storage Transfer API and set the correct permissions on both the Google Cloud Storage bucket and the AWS bucket.
Enabling the Storage Transfer API
The API should already be enabled from the Terraform set-up. If this is not the case, see the Google support answer for info on how to enable an API. Search for the Storage Transfer API and enable it.
Setting permissions on Google Cloud bucket
The data is transferred to the standard download bucket and the following permissions are required on this Google Cloud bucket for the transfer service to work:
storage.buckets.get
storage.objects.list
storage.objects.get
storage.objects.create
The roles/storage.objectViewer and roles/storage.legacyBucketWriter roles together contain the permissions that are always required. These roles or permissions need to be assigned at the specific bucket to the service account performing the transfer.
The Storage Transfer Service uses the project-[$PROJECT_NUMBER]@storage-transfer-service.iam.gserviceaccount.com service account.
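As an illustration of how the service account and the two roles fit together, the sketch below builds the bucket-level role bindings as plain Python data. It does not call any Google Cloud API; the function names and the project number are hypothetical placeholders, and the bindings would still need to be applied to the bucket with your usual tooling.

```python
# The two roles named above that together cover the required permissions.
REQUIRED_ROLES = (
    "roles/storage.objectViewer",
    "roles/storage.legacyBucketWriter",
)


def transfer_service_account(project_number: str) -> str:
    """Build the Storage Transfer Service account email for a project."""
    return f"project-{project_number}@storage-transfer-service.iam.gserviceaccount.com"


def bucket_bindings(project_number: str) -> list:
    """Return one role binding per required role, for the transfer service account."""
    member = f"serviceAccount:{transfer_service_account(project_number)}"
    return [{"role": role, "members": [member]} for role in REQUIRED_ROLES]


# "123456789012" is a placeholder project number.
for binding in bucket_bindings("123456789012"):
    print(binding)
```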
Setting permissions on AWS bucket
The AWS bucket is managed by OpenAlex; the bucket that is used is s3://openalex.
The data in this bucket is publicly available, and no permissions are required to download or inspect the data using the AWS S3 CLI.
However, the transfer service in GCP does require permissions to transfer the data, so it is required to create a user from the AWS console with programmatic access (using a key id and secret key).
The key id and secret access key that are created can then be used for the Airflow connection that is described below.
The policy that needs to be assigned to this user is:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::openalex"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::openalex/*"
            ]
        }
    ]
}
Airflow connections
Note that all values need to be URL-encoded. In the config.yaml file, the following Airflow connections are required:
openalex
This connection contains the AWS access key id and secret access key that are used to access data in the AWS buckets. Make sure to URL encode each of the fields ‘access_key_id’ and ‘secret_access_key’.
openalex: aws://<access_key_id>:<secret_access_key>@
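URL encoding matters here because secret access keys can contain characters such as `/` and `+` that would otherwise break the connection URI. A minimal sketch using Python's standard library is shown below; the function name `openalex_connection` and the example credentials are made up for illustration.

```python
from urllib.parse import quote


def openalex_connection(access_key_id: str, secret_access_key: str) -> str:
    """Build the connection URI with both fields URL-encoded."""
    # safe="" makes quote() also encode "/" which it leaves untouched by default.
    key_id = quote(access_key_id, safe="")
    secret = quote(secret_access_key, safe="")
    return f"aws://{key_id}:{secret}@"


# Fake credentials, used only to show the encoding.
print(openalex_connection("AKIAEXAMPLE", "abc/def+ghi"))
```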
Latest schema
Concept
name | type | mode | description |
---|---|---|---|
ancestors | RECORD | REPEATED | List of concepts that this concept descends from, as dehydrated Concept objects. See the concept tree section for more details on how the different layers of concepts work together. |
ancestors.display_name | STRING | NULLABLE | The English-language label of the concept. |
ancestors.id | STRING | NULLABLE | The OpenAlex ID for this concept. |
ancestors.level | INTEGER | NULLABLE | The level in the concept tree where this concept lives. |
ancestors.wikidata | STRING | NULLABLE | The Wikidata ID for this concept. |
cited_by_count | INTEGER | NULLABLE | The number of citations to works that have been tagged with this concept. Or less formally: the number of citations to this concept. For example, if there are just two works tagged with this concept and one of them has been cited 10 times, and the other has been cited 1 time, cited_by_count for this concept would be 11. |
counts_by_year | RECORD | REPEATED | The values of works_count and cited_by_count for each of the last ten years, binned by year. To put it another way: for every listed year, you can see how many new works were tagged with this concept, and how many times any work tagged with this concept got cited. |
counts_by_year.cited_by_count | INTEGER | NULLABLE | The number of citations to works that have been tagged with this concept. Or less formally: the number of citations to this concept. For example, if there are just two works tagged with this concept and one of them has been cited 10 times, and the other has been cited 1 time, cited_by_count for this concept would be 11. |
counts_by_year.oa_works_count | INTEGER | NULLABLE | |
counts_by_year.works_count | INTEGER | NULLABLE | The number of works tagged with this concept. |
counts_by_year.year | INTEGER | NULLABLE | The year. |
created_date | DATE | NULLABLE | The date this Concept object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
description | STRING | NULLABLE | A brief description of this concept. |
display_name | STRING | NULLABLE | The English-language label of the concept. |
id | STRING | NULLABLE | The OpenAlex ID for this concept. |
ids | RECORD | NULLABLE | All the persistent identifiers (PIDs) that we know about for this concept, as key: value pairs, where key is the PID namespace, and value is the PID. IDs are expressed as URIs where possible. umls_aui and umls_cui refer to the Unified Medical Language System Atom Unique Identifier and Concept Unique Identifier respectively. These are lists. The other IDs are all strings, except for mag, which is a long integer. |
ids.mag | INTEGER | NULLABLE | this concept’s Microsoft Academic Graph ID |
ids.openalex | STRING | NULLABLE | this concept’s OpenAlex ID. Same as Concept.id |
ids.umls_aui | STRING | REPEATED | this concept’s Unified Medical Language System Atom Unique Identifiers |
ids.umls_cui | STRING | REPEATED | this concept’s Unified Medical Language System Concept Unique Identifiers |
ids.wikidata | STRING | NULLABLE | this concept’s Wikidata ID. Same as Concept.wikidata |
ids.wikipedia | STRING | NULLABLE | this concept’s Wikipedia page URL |
image_thumbnail_url | STRING | NULLABLE | Same as image_url, but it’s a smaller image. |
image_url | STRING | NULLABLE | URL where you can get an image representing this concept, where available. Usually this is hosted on Wikipedia. |
international | RECORD | NULLABLE | Translation of the display_name and description into multiple languages. |
international.description | RECORD | NULLABLE | This concept’s description in many languages, derived from article titles on each language’s Wikipedia. |
international.description.keys | STRING | REPEATED | The language codes in Wikidata language code format. |
international.description.values | STRING | REPEATED | The translated descriptions in each language. |
international.display_name | RECORD | NULLABLE | This concept’s display name in many languages, derived from article titles on each language’s Wikipedia. |
international.display_name.keys | STRING | REPEATED | The language codes in Wikidata language code format. |
international.display_name.values | STRING | REPEATED | The translated display_names in each language. |
level | INTEGER | NULLABLE | The level in the concept tree where this concept lives. Lower-level concepts are more general, and higher-level concepts are more specific. Computer Science has a level of 0; Java Bytecode has a level of 5. Level 0 concepts have no ancestors and level 5 concepts have no descendants. |
related_concepts | RECORD | REPEATED | Concepts that are similar to this one. Each listed concept is a dehydrated Concept object, with one additional attribute: score. |
related_concepts.display_name | STRING | NULLABLE | The English-language label of the concept. |
related_concepts.id | STRING | NULLABLE | The OpenAlex ID for this concept. |
related_concepts.level | INTEGER | NULLABLE | The level in the concept tree where this concept lives. |
related_concepts.score | FLOAT | NULLABLE | The strength of association between this concept and the listed concept, on a scale of 0-100. |
related_concepts.wikidata | STRING | NULLABLE | The Wikidata ID for this concept. |
summary_stats | RECORD | NULLABLE | Citation metrics for this concept. |
summary_stats.2yr_cited_by_count | INTEGER | NULLABLE | |
summary_stats.2yr_h_index | INTEGER | NULLABLE | |
summary_stats.2yr_i10_index | INTEGER | NULLABLE | |
summary_stats.2yr_mean_citedness | FLOAT | NULLABLE | |
summary_stats.2yr_works_count | INTEGER | NULLABLE | |
summary_stats.cited_by_count | INTEGER | NULLABLE | |
summary_stats.h_index | INTEGER | NULLABLE | |
summary_stats.i10_index | INTEGER | NULLABLE | |
summary_stats.oa_percent | FLOAT | NULLABLE | |
summary_stats.works_count | INTEGER | NULLABLE | |
updated_date | TIMESTAMP | NULLABLE | The last time anything in this concept object changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
wikidata | STRING | NULLABLE | The Wikidata ID for this concept. This is the Canonical External ID for concepts. |
works_api_url | STRING | NULLABLE | A URL that will get you a list of all the works tagged with this concept. We express this as an API URL (instead of just listing the works themselves) because there might be millions of works tagged with this concept, and that’s too many to fit here. |
works_count | INTEGER | NULLABLE | The number of works tagged with this concept. |
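Once a Concept row has been loaded, the parallel `international.display_name.keys` and `international.display_name.values` arrays can be zipped back together to look up a translation. The sketch below assumes the row has already been fetched as a Python dict; the function name `translated_name` and the sample row are hypothetical.

```python
def translated_name(row: dict, language: str) -> str:
    """Return the display name in the given language, or the default display_name."""
    international = row.get("international") or {}
    names = international.get("display_name") or {}
    # Pair each language code with the translation at the same position.
    lookup = dict(zip(names.get("keys", []), names.get("values", [])))
    return lookup.get(language, row.get("display_name"))


# Hypothetical row shaped like the Concept schema above.
row = {
    "display_name": "Biology",
    "international": {
        "display_name": {"keys": ["en", "fr"], "values": ["Biology", "Biologie"]}
    },
}
print(translated_name(row, "fr"))
print(translated_name(row, "de"))
```

Falling back to `display_name` keeps the lookup usable for languages that have no translation.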
Institution
name | type | mode | description |
---|---|---|---|
associated_institutions | RECORD | REPEATED | |
associated_institutions.country_code | STRING | NULLABLE | The country where this institution is located, represented as an ISO two-letter country code. |
associated_institutions.display_name | STRING | NULLABLE | The primary name of the institution. |
associated_institutions.id | STRING | NULLABLE | The OpenAlex ID for this institution. |
associated_institutions.relationship | STRING | NULLABLE | The type of relationship between this institution and the listed institution. Possible values: parent, child, and related. |
associated_institutions.ror | STRING | NULLABLE | The ROR ID for this institution. The ROR (Research Organization Registry) identifier is a globally unique ID for research organizations. ROR is the successor to GRID, which is no longer being updated. |
associated_institutions.type | STRING | NULLABLE | The institution’s primary type, using the ROR “type” controlled vocabulary. Possible values are: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, and Other. |
cited_by_count | INTEGER | NULLABLE | The total number of Works that cite a work created by an author affiliated with this institution. Or less formally: the number of citations this institution has collected. |
country_code | STRING | NULLABLE | The country where this institution is located, represented as an ISO two-letter country code. |
counts_by_year | RECORD | REPEATED | works_count and cited_by_count for each of the last ten years, binned by year. To put it another way: each year, you can see how many new works this institution put out, and how many times any work affiliated with this institution got cited. |
counts_by_year.cited_by_count | INTEGER | NULLABLE | The total number of Works that cite a work created by an author affiliated with this institution. Or less formally: the number of citations this institution has collected. |
counts_by_year.oa_works_count | INTEGER | NULLABLE | |
counts_by_year.works_count | INTEGER | NULLABLE | The number of Works created by authors affiliated with this institution. Or less formally: the number of works coming out of this institution. |
counts_by_year.year | INTEGER | NULLABLE | The year. |
created_date | DATE | NULLABLE | The date this Institution object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
display_name | STRING | NULLABLE | The primary name of the institution. |
display_name_acronyms | STRING | REPEATED | Acronyms or initialisms that people sometimes use instead of the full display_name. |
display_name_alternatives | STRING | REPEATED | Other names people may use for this institution. |
geo | RECORD | NULLABLE | A bunch of stuff we know about the location of this institution. |
geo.city | STRING | NULLABLE | The city where this institution lives. |
geo.country | STRING | NULLABLE | The country where this institution lives. |
geo.country_code | STRING | NULLABLE | The country where this institution lives, represented as an ISO two-letter country code. |
geo.geonames_city_id | STRING | NULLABLE | The city where this institution lives, as a GeoNames database ID. |
geo.latitude | FLOAT | NULLABLE | The latitude of this institution’s location. |
geo.longitude | FLOAT | NULLABLE | The longitude of this institution’s location. |
geo.region | STRING | NULLABLE | The sub-national region (state, province) where this institution lives. |
homepage_url | STRING | NULLABLE | The URL for the institution’s primary homepage. |
id | STRING | NULLABLE | The OpenAlex ID for this institution. |
ids | RECORD | NULLABLE | All the persistent identifiers (PIDs) that we know about for this institution, as key: value pairs, where key is the PID namespace, and value is the PID. IDs are expressed as URIs where possible. They’re all strings except for mag, which is a long integer. |
ids.grid | STRING | NULLABLE | this institution’s GRID ID |
ids.mag | INTEGER | NULLABLE | this institution’s Microsoft Academic Graph ID |
ids.openalex | STRING | NULLABLE | this institution’s OpenAlex ID. Same as Institution.id |
ids.ror | STRING | NULLABLE | this institution’s ROR ID. Same as Institution.ror |
ids.wikidata | STRING | NULLABLE | this institution’s Wikidata ID |
ids.wikipedia | STRING | NULLABLE | this institution’s Wikipedia page URL |
image_thumbnail_url | STRING | NULLABLE | Same as image_url, but it’s a smaller image. |
image_url | STRING | NULLABLE | URL where you can get an image representing this institution. Usually this is hosted on Wikipedia, and usually it’s a seal or logo. |
international | RECORD | NULLABLE | Translation of the display_name and description into multiple languages. |
international.display_name | RECORD | NULLABLE | The institution’s display name in different languages. Derived from the Wikipedia page for the institution in the given language. |
international.display_name.keys | STRING | REPEATED | The language codes in Wikidata language code format. |
international.display_name.values | STRING | REPEATED | The translated display_names in each language. |
lineage | STRING | REPEATED | OpenAlex IDs of institutions. The list will include this institution’s ID, as well as any parent institutions. If this institution has no parent institutions, this list will only contain its own ID. |
repositories | RECORD | REPEATED | Repositories (Sources with type: repository) that have this institution as their host_organization. |
repositories.display_name | STRING | NULLABLE | The repository’s display name. |
repositories.host_organization | STRING | NULLABLE | The OpenAlex ID of the host organisation. |
repositories.host_organization_lineage | STRING | REPEATED | OpenAlex IDs. See Publisher.lineage. This will only be included if the host_organization is a publisher (and not if the host_organization is an institution). |
repositories.host_organization_lineage_names | STRING | REPEATED | The names of the organisations in host_organization_lineage. |
repositories.host_organization_name | STRING | NULLABLE | The display_name from the host_organization, shown for convenience. |
repositories.id | STRING | NULLABLE | The OpenAlex ID of the repository. |
repositories.issn | STRING | REPEATED | |
repositories.issn_l | STRING | NULLABLE | |
repositories.publisher | STRING | NULLABLE | |
repositories.publisher_id | STRING | NULLABLE | |
repositories.type | STRING | NULLABLE | |
roles | RECORD | REPEATED | |
roles.id | STRING | NULLABLE | |
roles.role | STRING | NULLABLE | |
roles.works_count | INTEGER | NULLABLE | |
ror | STRING | NULLABLE | The ROR ID for this institution. The ROR (Research Organization Registry) identifier is a globally unique ID for research organizations. ROR is the successor to GRID, which is no longer being updated. |
summary_stats | RECORD | NULLABLE | Citation metrics for this institution. |
summary_stats.2yr_cited_by_count | INTEGER | NULLABLE | |
summary_stats.2yr_h_index | INTEGER | NULLABLE | |
summary_stats.2yr_i10_index | INTEGER | NULLABLE | |
summary_stats.2yr_mean_citedness | FLOAT | NULLABLE | |
summary_stats.2yr_works_count | INTEGER | NULLABLE | |
summary_stats.cited_by_count | INTEGER | NULLABLE | |
summary_stats.h_index | INTEGER | NULLABLE | |
summary_stats.i10_index | INTEGER | NULLABLE | |
summary_stats.oa_percent | FLOAT | NULLABLE | |
summary_stats.works_count | INTEGER | NULLABLE | |
type | STRING | NULLABLE | The institution’s primary type, using the ROR “type” controlled vocabulary. Possible values are: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, and Other. |
updated_date | TIMESTAMP | NULLABLE | The last time anything in this Institution changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
works_api_url | STRING | NULLABLE | A URL that will get you a list of all the Works affiliated with this institution. We express this as an API URL (instead of just listing the Works themselves) because most institutions have way too many works to reasonably fit into a single return object. |
works_count | INTEGER | NULLABLE | The number of Works created by authors affiliated with this institution. Or less formally: the number of works coming out of this institution. |
x_concepts | RECORD | REPEATED | The “x” in x_concepts is because it’s experimental and subject to removal with very little warning. We plan to replace it with a custom link to the Concepts API endpoint. The Concepts most frequently applied to works affiliated with this institution. Each is represented as a dehydrated Concept object, with one additional attribute: score. |
x_concepts.display_name | STRING | NULLABLE | The English-language label of the concept. |
x_concepts.id | STRING | NULLABLE | The OpenAlex ID for this concept. |
x_concepts.level | INTEGER | NULLABLE | The level in the concept tree where this concept lives. |
x_concepts.score | FLOAT | NULLABLE | The strength of association between this institution and the listed concept, from 0-100. |
x_concepts.wikidata | STRING | NULLABLE | The Wikidata ID for this concept. |
Funders
name | type | mode | description |
---|---|---|---|
alternate_titles | STRING | REPEATED | A list of alternate titles for this funder. |
cited_by_count | INTEGER | NULLABLE | The total number of Works that cite a work linked to this funder. |
country_code | STRING | NULLABLE | The country where this funder is located, represented as an ISO two-letter country code. |
counts_by_year | RECORD | REPEATED | The values of works_count and cited_by_count for each of the last ten years, binned by year. To put it another way: for every listed year, you can see how many new works are linked to this funder, and how many times any work linked to this funder was cited. Years with zero citations and zero works have been removed so you will need to add those back in if you need them. |
counts_by_year.cited_by_count | INTEGER | NULLABLE | |
counts_by_year.oa_works_count | INTEGER | NULLABLE | |
counts_by_year.works_count | INTEGER | NULLABLE | |
counts_by_year.year | INTEGER | NULLABLE | |
created_date | DATE | NULLABLE | The date this Funder object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
description | STRING | NULLABLE | A short description of this funder, taken from Wikidata. |
display_name | STRING | NULLABLE | The primary name of the funder. |
homepage_url | STRING | NULLABLE | The URL for this funder’s primary homepage. |
id | STRING | NULLABLE | The OpenAlex ID for this funder. |
ids | RECORD | NULLABLE | All the external identifiers that we know about for this funder. IDs are expressed as URIs whenever possible. |
ids.openalex | STRING | NULLABLE | this funder’s OpenAlex ID |
ids.ror | STRING | NULLABLE | this funder’s ROR ID |
ids.wikidata | STRING | NULLABLE | this funder’s Wikidata ID |
ids.crossref | STRING | NULLABLE | this funder’s Crossref ID |
ids.doi | STRING | NULLABLE | this funder’s DOI |
image_thumbnail_url | STRING | NULLABLE | Same as image_url, but it’s a smaller image. This is usually a hotlink to a Wikimedia image. You can change the width=300 parameter in the URL if you want a different thumbnail size. |
image_url | STRING | NULLABLE | URL where you can get an image representing this funder. Usually this is a hotlink to a Wikimedia image, and usually it’s a seal or logo. |
roles | RECORD | REPEATED | List of role objects, which include the role (one of institution, funder, or publisher), the id (OpenAlex ID), and the works_count. In many cases, a single organization does not fit neatly into one role. For example, Yale University is a single organization that is a research university, funds research studies, and publishes an academic journal. The roles property links the OpenAlex entities together for a single organization, and includes counts for the works associated with each role. The roles list of an entity (Funder, Publisher, or Institution) always includes itself. In the case where an organization only has one role, the roles will be a list of length one, with itself as the only item. |
roles.id | STRING | NULLABLE | |
roles.role | STRING | NULLABLE | |
roles.works_count | INTEGER | NULLABLE | |
summary_stats | RECORD | NULLABLE | Citation metrics for this funder. While the h-index and the i-10 index are normally author-level metrics and the 2-year mean citedness is normally a journal-level metric, they can be calculated for any set of papers, so we include them for funders. |
summary_stats.2yr_cited_by_count | INTEGER | NULLABLE | |
summary_stats.2yr_h_index | INTEGER | NULLABLE | |
summary_stats.2yr_i10_index | INTEGER | NULLABLE | |
summary_stats.2yr_mean_citedness | FLOAT | NULLABLE | The 2-year mean citedness for this funder. Also known as impact factor. |
summary_stats.2yr_works_count | INTEGER | NULLABLE | |
summary_stats.cited_by_count | INTEGER | NULLABLE | |
summary_stats.h_index | INTEGER | NULLABLE | The h-index for this funder. |
summary_stats.i10_index | INTEGER | NULLABLE | The i-10 index for this funder. |
summary_stats.oa_percent | FLOAT | NULLABLE | |
summary_stats.works_count | INTEGER | NULLABLE | |
updated_date | TIMESTAMP | NULLABLE | The last time anything in this funder object changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
works_count | INTEGER | NULLABLE | The number of works linked to this funder. |
Institutions
name |
type |
mode |
description |
---|---|---|---|
associated_institutions |
RECORD |
REPEATED |
|
associated_institutions.country_code |
STRING |
NULLABLE |
The country where this institution is located, represented as an ISO two-letter country code. |
associated_institutions.display_name |
STRING |
NULLABLE |
The primary name of the institution. |
associated_institutions.id |
STRING |
NULLABLE |
The OpenAlex ID for this institution. |
associated_institutions.relationship |
STRING |
NULLABLE |
The type of relationship between this institution and the listed institution. Possible values: parent, child, and related. |
associated_institutions.ror |
STRING |
NULLABLE |
The ROR ID for this institution. The ROR (Research Organization Registry) identifier is a globally unique ID for research organization. ROR is the successor to GRiD, which is no longer being updated. |
associated_institutions.type |
STRING |
NULLABLE |
The institution’s primary type, using the ROR “type” controlled vocabulary. Possible values are: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, and Other. |
cited_by_count |
INTEGER |
NULLABLE |
The total number Works that cite a work created by an author affiliated with this institution. Or less formally: the number of citations this institution has collected. |
country_code |
STRING |
NULLABLE |
The country where this institution is located, represented as an ISO two-letter country code. |
counts_by_year |
RECORD |
REPEATED |
works_count and cited_by_count for each of the last ten years, binned by year. To put it another way: each year, you can see how many new works this venue started hosting, and how many times any work in this venue got cited. |
counts_by_year.cited_by_count |
INTEGER |
NULLABLE |
The total number Works that cite a work created by an author affiliated with this institution. Or less formally: the number of citations this institution has collected. |
counts_by_year.oa_works_count |
INTEGER |
NULLABLE |
|
counts_by_year.works_count |
INTEGER |
NULLABLE |
The number of Works created by authors affiliated with this institution. Or less formally: the number of works coming out of this institution. |
counts_by_year.year |
INTEGER |
NULLABLE |
The year. |
created_date |
DATE |
NULLABLE |
The date this Institution object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
display_name |
STRING |
NULLABLE |
The primary name of the institution. |
display_name_acronyms |
STRING |
REPEATED |
Acronyms or initialisms that people sometimes use instead of the full display_name. |
display_name_alternatives |
STRING |
REPEATED |
Other names people may use for this institution. |
geo |
RECORD |
NULLABLE |
A bunch of stuff we know about the location of this institution |
geo.city |
STRING |
NULLABLE |
The city where this institution lives. |
geo.country |
STRING |
NULLABLE |
The country where this institution lives. |
geo.country_code |
STRING |
NULLABLE |
The country where this institution lives, represented as an ISO two-letter country code. |
geo.geonames_city_id |
STRING |
NULLABLE |
The city where this institution lives, as a GeoNames database ID. |
geo.latitude |
FLOAT |
NULLABLE |
Does what it says. |
geo.longitude |
FLOAT |
NULLABLE |
Does what it says. |
geo.region |
STRING |
NULLABLE |
The sub-national region (state, province) where this institution lives. |
homepage_url |
STRING |
NULLABLE |
The URL for institution’s primary homepage |
id |
STRING |
NULLABLE |
The OpenAlex ID for this institution. |
ids |
RECORD |
NULLABLE |
All the persistent identifiers (PIDs) that we know about for this institution, as key: value pairs, where key is the PID namespace, and value is the PID. IDs are expressed as URIs where possible. They’re all strings except for mag, which is a long integer. |
ids.grid |
STRING |
NULLABLE |
this institution’s GRID ID |
ids.mag |
INTEGER |
NULLABLE |
this institution’s Microsoft Academic Graph ID |
ids.openalex |
STRING |
NULLABLE |
this institution’s OpenAlex ID. Same as Institution.id |
ids.ror |
STRING |
NULLABLE |
this institution’s ROR ID. Same as Institution.ror |
ids.wikidata |
STRING |
NULLABLE |
this institution’s Wikidata ID |
ids.wikipedia |
STRING |
NULLABLE |
this institution’s Wikipedia page URL |
image_thumbnail_url |
STRING |
NULLABLE |
Same as image_url, but it’s a smaller image. |
image_url |
STRING |
NULLABLE |
URL where you can get an image representing this institution. Usually this is hosted on Wikipedia, and usually it’s a seal or logo. |
international |
RECORD |
NULLABLE |
Translation of the display_name and description into multiple languages. |
international.display_name |
RECORD |
NULLABLE |
The institution’s display name in different languages. Derived from the wikipedia page for the institution in the given language. |
international.display_name.keys |
STRING |
REPEATED |
The language codes in wikidata language code format. |
international.display_name.values |
STRING |
REPEATED |
The translated display_names in each language. |
lineage |
STRING |
REPEATED |
OpenAlex IDs of institutions. The list will include this institution’s ID, as well as any parent institutions. If this institution has no parent institutions, this list will only contain its own ID. |
repositories |
RECORD |
REPEATED |
Repositories (Sources with type: repository) that have this institution as their host_organization |
repositories.display_name |
STRING |
NULLABLE |
The repositories display name. |
repositories.host_organization |
STRING |
NULLABLE |
The OpenAlex ID of the host organisation. |
repositories.host_organization_lineage |
STRING |
REPEATED |
OpenAlex IDs — See Publisher.lineage. This will only be included if the host_organization is a publisher (and not if the host_organization is an institution). |
repositories.host_organization_lineage_names |
STRING |
REPEATED |
The names of the organisations in host_organization_lineage. |
repositories.host_organization_name |
STRING |
NULLABLE |
The display_name from the host_organization, shown for convenience. |
repositories.id |
STRING |
NULLABLE |
The OpenAlex ID of the repository. |
repositories.issn |
STRING |
REPEATED |
|
repositories.issn_l |
STRING |
NULLABLE |
|
repositories.publisher |
STRING |
NULLABLE |
|
repositories.publisher_id |
STRING |
NULLABLE |
|
repositories.type |
STRING |
NULLABLE |
|
roles |
RECORD |
REPEATED |
|
roles.id |
STRING |
NULLABLE |
|
roles.role |
STRING |
NULLABLE |
|
roles.works_count |
INTEGER |
NULLABLE |
|
ror |
STRING |
NULLABLE |
The ROR ID for this institution. The ROR (Research Organization Registry) identifier is a globally unique ID for research organization. ROR is the successor to GRiD, which is no longer being updated. |
summary_stats |
RECORD |
NULLABLE |
Citation metrics for this institutions. |
summary_stats.2yr_cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.2yr_h_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_i10_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_mean_citedness |
FLOAT |
NULLABLE |
|
summary_stats.2yr_works_count |
INTEGER |
NULLABLE |
|
summary_stats.cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.h_index |
INTEGER |
NULLABLE |
|
summary_stats.i10_index |
INTEGER |
NULLABLE |
|
summary_stats.oa_percent |
FLOAT |
NULLABLE |
|
summary_stats.works_count |
INTEGER |
NULLABLE |
|
type |
STRING |
NULLABLE |
The institution’s primary type, using the ROR “type” controlled vocabulary. Possible values are: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, and Other. |
updated_date |
TIMESTAMP |
NULLABLE |
The last time anything in this Institution changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
works_api_url |
STRING |
NULLABLE |
A URL that will get you a list of all the Works affiliated with this institution. We express this as an API URL (instead of just listing the Works themselves) because most institutions have way too many works to reasonably fit into a single return object. |
works_count |
INTEGER |
NULLABLE |
The number of Works created by authors affiliated with this institution. Or less formally: the number of works coming out of this institution. |
x_concepts |
RECORD |
REPEATED |
The Concepts most frequently applied to works affiliated with this institution. Each is represented as a dehydrated Concept object, with one additional attribute: score. The “x” in x_concepts is because it’s experimental and subject to removal with very little warning. We plan to replace it with a custom link to the Concepts API endpoint. |
x_concepts.display_name |
STRING |
NULLABLE |
The English-language label of the concept. |
x_concepts.id |
STRING |
NULLABLE |
The OpenAlex ID for this concept. |
x_concepts.level |
INTEGER |
NULLABLE |
The level in the concept tree where this concept lives. |
x_concepts.score |
FLOAT |
NULLABLE |
The strength of association between this institution and the listed concept, from 0-100. |
x_concepts.wikidata |
STRING |
NULLABLE |
The Wikidata ID for this concept. |
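
The international.display_name record in the table above stores translations as two parallel arrays rather than as a map from language code to name. The following is a minimal sketch of how a translation could be looked up once a row has been read into a Python dict. The helper name `translated_name` and the sample row are hypothetical, and the `keys` array name is assumed by analogy with the `values` array listed above.

```python
from typing import Optional


def translated_name(international: dict, language: str) -> Optional[str]:
    """Return the display name for `language`, or None if it is not present."""
    display_name = international.get("display_name") or {}
    keys = display_name.get("keys") or []      # assumed: original field names (language codes)
    values = display_name.get("values") or []  # translated display names
    # The two arrays are parallel: values[i] is the translation for keys[i].
    lookup = dict(zip(keys, values))
    return lookup.get(language)


# Hypothetical sample row, for illustration only.
row = {
    "display_name": {
        "keys": ["en", "fr"],
        "values": ["University of Example", "Université d'Exemple"],
    }
}
print(translated_name(row, "fr"))
```

Building a dict from the zipped arrays keeps the lookup simple; for a single lookup on very long arrays, a linear scan over `keys` would avoid the intermediate dict.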
Publishers
name |
type |
mode |
description |
---|---|---|---|
alternate_titles |
STRING |
REPEATED |
A list of alternate titles for this publisher. |
cited_by_count |
INTEGER |
NULLABLE |
The number of citations to works that are linked to this publisher through journals or other sources. For example, if a publisher publishes 27 journals and those 27 journals have 3,050 works, this number is the sum of the cited_by_count values for all of those 3,050 works. |
country_codes |
STRING |
REPEATED |
The countries where the publisher is primarily located, as ISO two-letter country codes. |
counts_by_year |
RECORD |
REPEATED |
The values of works_count and cited_by_count for each of the last ten years, binned by year. To put it another way: for every listed year, you can see how many new works are linked to this publisher, and how many times any work linked to this publisher was cited. Years with zero citations and zero works have been removed so you will need to add those back in if you need them. |
counts_by_year.cited_by_count |
INTEGER |
NULLABLE |
The total number of Works that cite a Work published by this publisher. |
counts_by_year.oa_works_count |
INTEGER |
NULLABLE |
|
counts_by_year.works_count |
INTEGER |
NULLABLE |
The total number of Works that are published by this publisher. |
counts_by_year.year |
INTEGER |
NULLABLE |
The year. |
created_date |
DATE |
NULLABLE |
The date this Publisher object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
display_name |
STRING |
NULLABLE |
The primary name of the publisher. |
hierarchy_level |
INTEGER |
NULLABLE |
The hierarchy level for this publisher. A publisher with hierarchy level 0 has no parent publishers. A hierarchy level 1 publisher has one parent above it, and so on. |
id |
STRING |
NULLABLE |
The OpenAlex ID for this publisher. |
ids |
RECORD |
NULLABLE |
All the external identifiers that we know about for this publisher. IDs are expressed as URIs whenever possible. |
ids.openalex |
STRING |
NULLABLE |
this publisher’s OpenAlex ID |
ids.ror |
STRING |
NULLABLE |
this publisher’s ROR ID |
ids.wikidata |
STRING |
NULLABLE |
this publisher’s Wikidata ID |
image_thumbnail_url |
STRING |
NULLABLE |
This is usually a hotlink to a Wikimedia image. You can change the width=300 parameter in the URL if you want a different thumbnail size. |
image_url |
STRING |
NULLABLE |
URL where you can get an image representing this publisher. Usually this is a hotlink to a Wikimedia image, and usually it’s a seal or logo. |
lineage |
STRING |
REPEATED |
OpenAlex IDs of publishers. The list will include this publisher’s ID, as well as any parent publishers. If this publisher’s hierarchy_level is 0, this list will only contain its own ID. |
parent_publisher |
RECORD |
NULLABLE |
The OpenAlex ID and display name of this publisher’s direct parent. This will be null if the publisher’s hierarchy_level is 0. |
parent_publisher.display_name |
STRING |
NULLABLE |
|
parent_publisher.id |
STRING |
NULLABLE |
|
roles |
RECORD |
REPEATED |
|
roles.id |
STRING |
NULLABLE |
|
roles.role |
STRING |
NULLABLE |
|
roles.works_count |
INTEGER |
NULLABLE |
|
sources_api_url |
STRING |
NULLABLE |
A URL that will get you a list of all the sources published by this publisher. We express this as an API URL (instead of just listing the sources themselves) because there might be thousands of sources linked to a publisher, and that’s too many to fit here. |
summary_stats |
RECORD |
NULLABLE |
Citation metrics for this publisher. |
summary_stats.2yr_cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.2yr_h_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_i10_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_mean_citedness |
FLOAT |
NULLABLE |
|
summary_stats.2yr_works_count |
INTEGER |
NULLABLE |
|
summary_stats.cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.h_index |
INTEGER |
NULLABLE |
|
summary_stats.i10_index |
INTEGER |
NULLABLE |
|
summary_stats.oa_percent |
FLOAT |
NULLABLE |
|
summary_stats.works_count |
INTEGER |
NULLABLE |
|
updated_date |
TIMESTAMP |
NULLABLE |
The last time anything in this publisher object changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
works_count |
INTEGER |
NULLABLE |
The number of works published by this publisher. |
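
The counts_by_year description above notes that years with zero citations and zero works are removed and have to be added back if needed. A minimal sketch of one way to do that on rows read into Python follows. The function name `fill_missing_years`, its parameters, the newest-first ordering, and the sample data are assumptions made for illustration.

```python
def fill_missing_years(counts_by_year, last_year, span=10):
    """Return one row per year for `span` years ending at `last_year`, newest first.

    Years absent from `counts_by_year` are added back with zero counts.
    """
    by_year = {row["year"]: row for row in counts_by_year}
    filled = []
    for year in range(last_year, last_year - span, -1):
        # Reuse the existing row when present, otherwise insert a zero row.
        zero_row = {"year": year, "works_count": 0, "cited_by_count": 0}
        filled.append(by_year.get(year, zero_row))
    return filled


# Hypothetical sample data: 2021 and 2019 are missing.
counts = [
    {"year": 2022, "works_count": 5, "cited_by_count": 12},
    {"year": 2020, "works_count": 1, "cited_by_count": 3},
]
filled = fill_missing_years(counts, last_year=2022, span=4)
print([row["year"] for row in filled])
```

The window end is passed in explicitly so that the result does not depend on the date the code is run.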
Sources
name |
type |
mode |
description |
---|---|---|---|
abbreviated_title |
STRING |
NULLABLE |
An abbreviated title obtained from the ISSN Centre. |
alternate_titles |
STRING |
REPEATED |
Alternate titles for this source, as obtained from the ISSN Centre and individual work records, like Crossref DOIs, that carry the source name as a string. These are commonly abbreviations or translations of the source’s canonical name. |
apc_prices |
RECORD |
REPEATED |
List of objects, each with price (Integer) and currency (String). Article processing charge information, taken directly from DOAJ. |
apc_prices.currency |
STRING |
NULLABLE |
Currency. |
apc_prices.price |
INTEGER |
NULLABLE |
Price. |
apc_usd |
INTEGER |
NULLABLE |
The source’s article processing charge in US Dollars, if available from DOAJ. The apc_usd value is calculated by taking the APC price (see apc_prices) with a currency of USD if it is available. If it’s not available, we convert the first available value from apc_prices into USD, using recent exchange rates. |
cited_by_count |
INTEGER |
NULLABLE |
The total number of Works that cite a Work hosted in this source. |
country_code |
STRING |
NULLABLE |
The country that this source is associated with, represented as an ISO two-letter country code. |
counts_by_year |
RECORD |
REPEATED |
works_count and cited_by_count for each of the last ten years, binned by year. To put it another way: each year, you can see how many new works this source started hosting, and how many times any work in this source got cited. If the source was founded less than ten years ago, there will naturally be fewer than ten years in this list. Years with zero citations and zero works have been removed so you will need to add those in if you need them. |
counts_by_year.cited_by_count |
INTEGER |
NULLABLE |
The total number of Works that cite a Work hosted in this source. |
counts_by_year.oa_works_count |
INTEGER |
NULLABLE |
|
counts_by_year.works_count |
INTEGER |
NULLABLE |
The number of Works this source hosts. |
counts_by_year.year |
INTEGER |
NULLABLE |
The year. |
created_date |
DATE |
NULLABLE |
The date this Source object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
display_name |
STRING |
NULLABLE |
The name of the source. |
homepage_url |
STRING |
NULLABLE |
The starting page for navigating the contents of this source; the homepage for this source’s website. |
host_organization |
STRING |
NULLABLE |
The host organization for this source as an OpenAlex ID. This will be an Institution.id if the source is a repository, and a Publisher.id if the source is a journal, conference, or eBook platform (based on the type field). |
host_organization_lineage |
STRING |
REPEATED |
OpenAlex IDs — See Publisher.lineage. This will only be included if the host_organization is a publisher (and not if the host_organization is an institution). |
host_organization_lineage_names |
STRING |
REPEATED |
The names of the organisations in host_organization_lineage. |
host_organization_name |
STRING |
NULLABLE |
The display_name from the host_organization, shown for convenience. |
id |
STRING |
NULLABLE |
The OpenAlex ID for this source. |
ids |
RECORD |
NULLABLE |
All the external identifiers that we know about for this source. IDs are expressed as URIs whenever possible. |
ids.fatcat |
STRING |
NULLABLE |
this source’s Fatcat ID |
ids.issn |
STRING |
REPEATED |
a list of this source’s ISSNs. Same as Source.issn |
ids.issn_l |
STRING |
NULLABLE |
this source’s ISSN-L. Same as Source.issn_l |
ids.mag |
INTEGER |
NULLABLE |
this source’s Microsoft Academic Graph ID |
ids.openalex |
STRING |
NULLABLE |
this source’s OpenAlex ID. Same as Source.id |
ids.wikidata |
STRING |
NULLABLE |
this source’s Wikidata ID |
is_in_doaj |
BOOLEAN |
NULLABLE |
Whether this is a journal listed in the Directory of Open Access Journals (DOAJ). |
is_oa |
BOOLEAN |
NULLABLE |
Whether this is currently a fully open-access source. This could be true for a preprint repository where everything uploaded is free to read, or for a Gold or Diamond open access journal, where all newly published Works are available for free under an open license. We say “currently” because the status of a source can change over time. It’s common for journals to “flip” to Gold OA, after which they may make only future articles open or also open their back catalogs. It’s entirely possible for a source to say is_oa: true, but for an article from last year to require a subscription. |
issn |
STRING |
REPEATED |
The ISSNs used by this source. Many publications have multiple ISSNs (see issn_l), so ISSN-L should be used when possible. |
issn_l |
STRING |
NULLABLE |
The ISSN-L identifying this source. ISSN is a global and unique ID for serial publications. However, different media versions of a given publication (e.g., print and electronic) often have different ISSNs. This is why we can’t have nice things. The ISSN-L or Linking ISSN solves the problem by designating a single canonical ISSN for all media versions of the title. It’s usually the same as the print ISSN. |
publisher |
STRING |
NULLABLE |
The name of this source’s publisher. Publisher is a tricky category, as journals often change publishers, publishers merge, publishers have subsidiaries (“imprints”), and of course no one is consistent in their naming. In the future, we plan to roll out support for a more structured publisher field, but for now it’s just a string. |
publisher_id |
STRING |
NULLABLE |
|
societies |
RECORD |
REPEATED |
Societies on whose behalf the source is published and maintained, obtained from our crowdsourced list. Thanks! |
societies.organization |
STRING |
NULLABLE |
The society organisation name. |
societies.url |
STRING |
NULLABLE |
The society URL. |
summary_stats |
RECORD |
NULLABLE |
Citation metrics for this source. |
summary_stats.2yr_cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.2yr_h_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_i10_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_mean_citedness |
FLOAT |
NULLABLE |
|
summary_stats.2yr_works_count |
INTEGER |
NULLABLE |
|
summary_stats.cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.h_index |
INTEGER |
NULLABLE |
|
summary_stats.i10_index |
INTEGER |
NULLABLE |
|
summary_stats.oa_percent |
FLOAT |
NULLABLE |
|
summary_stats.works_count |
INTEGER |
NULLABLE |
|
type |
STRING |
NULLABLE |
The type of source, which will be one of the following: journal, repository, conference, ebook platform. |
updated_date |
TIMESTAMP |
NULLABLE |
The last time anything in this Source object changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
works_api_url |
STRING |
NULLABLE |
A URL that will get you a list of all this Source’s Works. We express this as an API URL (instead of just listing the works themselves) because sometimes a source’s publication list is too long to reasonably fit into a single Source object. |
works_count |
INTEGER |
NULLABLE |
The number of Works this Source hosts. |
x_concepts |
RECORD |
REPEATED |
The Concepts most frequently applied to works hosted by this source. Each is represented as a dehydrated Concept object, with one additional attribute: score. The “x” in x_concepts is because it’s experimental and subject to removal with very little warning. We plan to replace it with a custom link to the Concepts API endpoint. |
x_concepts.display_name |
STRING |
NULLABLE |
The English-language label of the concept. |
x_concepts.id |
STRING |
NULLABLE |
The OpenAlex ID for this concept. |
x_concepts.level |
INTEGER |
NULLABLE |
The level in the concept tree where this concept lives. |
x_concepts.score |
FLOAT |
NULLABLE |
The strength of association between this source and the listed concept, from 0-100. |
x_concepts.wikidata |
STRING |
NULLABLE |
The Wikidata ID for this concept. |
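
The apc_usd description above gives a two-step rule: use the USD entry from apc_prices if there is one, otherwise convert the first available entry to USD. The sketch below illustrates that rule only; it is not OpenAlex’s implementation. The function name `apc_in_usd` and the exchange-rate table are hypothetical, and the rates are placeholder numbers rather than real market values.

```python
from typing import Optional

# Hypothetical exchange rates (USD per one unit of currency), illustrative only.
EXAMPLE_RATES_TO_USD = {"EUR": 1.10, "GBP": 1.25}


def apc_in_usd(apc_prices, rates=EXAMPLE_RATES_TO_USD) -> Optional[int]:
    """Pick the USD price if listed, else convert the first listed price."""
    if not apc_prices:
        return None
    # Step 1: prefer an entry that is already in USD.
    for entry in apc_prices:
        if entry["currency"] == "USD":
            return entry["price"]
    # Step 2: otherwise convert the first available entry.
    first = apc_prices[0]
    rate = rates.get(first["currency"])
    if rate is None:
        return None  # no rate known for this currency
    return round(first["price"] * rate)


# Hypothetical sample data.
print(apc_in_usd([{"price": 1000, "currency": "EUR"}, {"price": 1200, "currency": "USD"}]))
print(apc_in_usd([{"price": 1000, "currency": "GBP"}]))
```

Returning None when no rate is known keeps a missing value distinguishable from a genuine price of zero.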
Works
name |
type |
mode |
description |
---|---|---|---|
abstract_inverted_index |
RECORD |
NULLABLE |
The abstract of the work, as an inverted index, which encodes information about the abstract’s words and their positions within the text. Like Microsoft Academic Graph, OpenAlex doesn’t include plaintext abstracts due to legal constraints. |
abstract_inverted_index.keys |
STRING |
REPEATED |
Custom field created by COKI. Contains the words of the abstract. In the original inverted index, each word in the abstract was a key and the indices of where this word occurred inside the abstract were the corresponding value. |
abstract_inverted_index.values |
STRING |
REPEATED |
Custom field created by COKI. Contains, for each word in abstract_inverted_index.keys, the indices of where this word occurred inside the abstract. In the original inverted index, these indices were the value stored under each word. |
apc_list |
RECORD |
NULLABLE |
Objects containing information about the APC (article processing charge) for this work. This value is the APC list price–the price as listed by the journal’s publisher. That’s not always the price actually paid, because publishers may offer various discounts to authors. Unfortunately we don’t always know this discounted price, but when we do you can find it in apc_paid. Currently our only source for this data is DOAJ, and so doaj is the only value for apc_list.provenance, but we’ll add other sources over time. |
apc_list.currency |
STRING |
NULLABLE |
|
apc_list.price |
INTEGER |
NULLABLE |
|
apc_list.price_usd |
INTEGER |
NULLABLE |
APC converted to USD |
apc_list.value |
INTEGER |
NULLABLE |
|
apc_list.value_usd |
INTEGER |
NULLABLE |
|
apc_list.provenance |
STRING |
NULLABLE |
|
apc_paid |
RECORD |
NULLABLE |
Information about the paid APC (article processing charge) for this work. You can find the listed APC price (when we know it) for a given work using apc_list. However, authors don’t always pay the listed price; often they get a discounted price from publishers. So it’s useful to know the APC actually paid by authors, as distinct from the list price. This is our effort to provide this. Our best source for the actually paid price is the OpenAPC project. Where available, we use that data, and so apc_paid.provenance is openapc. Where OpenAPC data is unavailable (and unfortunately this is common) we make our best guess by assuming the author paid the APC list price, and apc_paid.provenance will be set to wherever we got the list price from. |
apc_paid.currency |
STRING |
NULLABLE |
|
apc_paid.price |
INTEGER |
NULLABLE |
|
apc_paid.price_usd |
INTEGER |
NULLABLE |
APC converted to USD |
apc_paid.value |
INTEGER |
NULLABLE |
|
apc_paid.value_usd |
INTEGER |
NULLABLE |
|
apc_paid.provenance |
STRING |
NULLABLE |
|
authors_count |
INTEGER |
NULLABLE |
|
authorships_truncated |
BOOLEAN |
NULLABLE |
|
authorships |
RECORD |
REPEATED |
List of Authorship objects, each representing an author and their institution. |
authorships.author |
RECORD |
NULLABLE |
An author of this work, as a dehydrated Author object. |
authorships.author.display_name |
STRING |
NULLABLE |
The name of the author as a single string. |
authorships.author.id |
STRING |
NULLABLE |
The OpenAlex ID for this author. |
authorships.author.orcid |
STRING |
NULLABLE |
The ORCID for this author. ORCID is a global and unique ID for authors. |
authorships.author_position |
STRING |
NULLABLE |
A summarized description of this author’s position in the work’s author list. Possible values are first, middle, and last. It’s not strictly necessary, because author order is already implicitly recorded by the list order of Authorship objects; however it’s useful in some contexts to have this as a categorical value. |
authorships.countries |
STRING |
REPEATED |
|
authorships.institutions |
RECORD |
REPEATED |
The institutional affiliations this author claimed in the context of this work, as dehydrated Institution objects. |
authorships.institutions.country_code |
STRING |
NULLABLE |
The country where this institution is located, represented as an ISO two-letter country code. |
authorships.institutions.display_name |
STRING |
NULLABLE |
The primary name of the institution. |
authorships.institutions.id |
STRING |
NULLABLE |
The OpenAlex ID for this institution. |
authorships.institutions.lineage |
STRING |
REPEATED |
OpenAlex IDs of institutions. The list will include this institution’s ID, as well as any parent institutions. If this institution has no parent institutions, this list will only contain its own ID. |
authorships.institutions.ror |
STRING |
NULLABLE |
The ROR ID for this institution. The ROR (Research Organization Registry) identifier is a globally unique ID for research organizations. ROR is the successor to GRID, which is no longer being updated. |
authorships.institutions.type |
STRING |
NULLABLE |
The institution’s primary type, using the ROR “type” controlled vocabulary. Possible values are: Education, Healthcare, Company, Archive, Nonprofit, Government, Facility, and Other. |
authorships.is_corresponding |
BOOLEAN |
NULLABLE |
|
authorships.raw_affiliation_string |
STRING |
NULLABLE |
This author’s affiliation as it originally came to us (on a webpage or in an API), as a raw unformatted string. |
authorships.raw_affiliation_strings |
STRING |
REPEATED |
|
authorships.raw_author_name |
STRING |
NULLABLE |
This author’s name as it originally came to us (on a webpage or in an API), as a raw unformatted string. |
best_oa_location |
RECORD |
NULLABLE |
A Location object with the best available open access location for this work. |
best_oa_location.doi |
STRING |
NULLABLE |
|
best_oa_location.is_accepted |
BOOLEAN |
NULLABLE |
|
best_oa_location.is_oa |
BOOLEAN |
NULLABLE |
True if this work is Open Access (OA). |
best_oa_location.is_published |
BOOLEAN |
NULLABLE |
|
best_oa_location.landing_page_url |
STRING |
NULLABLE |
The landing page URL for this location. |
best_oa_location.license |
STRING |
NULLABLE |
The location’s publishing license. This can be a Creative Commons license such as cc0 or cc-by, a publisher-specific license, or null, which means we are not able to determine a license for this location. |
best_oa_location.pdf_url |
STRING |
NULLABLE |
A URL where you can find this location as a PDF. |
best_oa_location.source |
RECORD |
NULLABLE |
Information about the source of this location, as a DehydratedSource object. |
best_oa_location.source.display_name |
STRING |
NULLABLE |
The name of the source. |
best_oa_location.source.host_institution_lineage |
STRING |
REPEATED |
|
best_oa_location.source.host_institution_lineage_names |
STRING |
REPEATED |
|
best_oa_location.source.host_organization |
STRING |
NULLABLE |
The host organization for this source as an OpenAlex ID. This will be an Institution.id if the source is a repository, and a Publisher.id if the source is a journal, conference, or eBook platform (based on the type field). |
best_oa_location.source.host_organization_lineage |
STRING |
REPEATED |
OpenAlex IDs — See Publisher.lineage. This will only be included if the host_organization is a publisher (and not if the host_organization is an institution). |
best_oa_location.source.host_organization_lineage_names |
STRING |
REPEATED |
The names of the organisations in host_organization_lineage. |
best_oa_location.source.host_organization_name |
STRING |
NULLABLE |
The display_name from the host_organization, shown for convenience. |
best_oa_location.source.id |
STRING |
NULLABLE |
The OpenAlex ID for this source. |
best_oa_location.source.is_in_doaj |
BOOLEAN |
NULLABLE |
Whether this is a journal listed in the Directory of Open Access Journals (DOAJ). |
best_oa_location.source.is_oa |
BOOLEAN |
NULLABLE |
|
best_oa_location.source.issn |
STRING |
REPEATED |
The ISSNs used by this source. Many publications have multiple ISSNs (see best_oa_location.source.issn_l), so ISSN-L should be used when possible. |
best_oa_location.source.issn_l |
STRING |
NULLABLE |
The ISSN-L identifying this source. This is the Canonical External ID for sources. |
best_oa_location.source.publisher |
STRING |
NULLABLE |
The publisher name. |
best_oa_location.source.publisher_id |
STRING |
NULLABLE |
The OpenAlex ID of the publisher. |
best_oa_location.source.publisher_lineage |
STRING |
REPEATED |
|
best_oa_location.source.publisher_lineage_names |
STRING |
REPEATED |
|
best_oa_location.source.type |
STRING |
NULLABLE |
The type of source. |
best_oa_location.version |
STRING |
NULLABLE |
The version of the work, based on the DRIVER Guidelines versioning scheme. |
biblio |
RECORD |
NULLABLE |
Old-timey bibliographic info for this work. This is mostly useful only in citation/reference contexts. These are all strings because sometimes you’ll get fun values like “Spring” and “Inside cover.” |
biblio.first_page |
STRING |
NULLABLE |
|
biblio.issue |
STRING |
NULLABLE |
|
biblio.last_page |
STRING |
NULLABLE |
|
biblio.volume |
STRING |
NULLABLE |
|
cited_by_api_url |
STRING |
NULLABLE |
|
cited_by_count |
INTEGER |
NULLABLE |
The number of citations to this work. These are the times that other works have cited this work: Other works ➞ This work. |
concepts |
RECORD |
REPEATED |
List of dehydrated Concept objects. Each Concept object in the list also has one additional property: score. |
concepts.display_name |
STRING |
NULLABLE |
The English-language label of the concept. |
concepts.id |
STRING |
NULLABLE |
The OpenAlex ID for this concept. |
concepts.level |
INTEGER |
NULLABLE |
The level in the concept tree where this concept lives. |
concepts.score |
FLOAT |
NULLABLE |
The strength of the connection between the work and this concept (higher is stronger). |
concepts.wikidata |
STRING |
NULLABLE |
The Wikidata ID for this concept. |
concepts_count |
INTEGER |
NULLABLE |
|
corresponding_author_ids |
STRING |
REPEATED |
OpenAlex IDs of any authors for which authorships.is_corresponding is true. |
corresponding_institution_ids |
STRING |
REPEATED |
OpenAlex IDs of any institutions found within an authorship for which authorships.is_corresponding is true. |
countries_distinct_count |
INTEGER |
NULLABLE |
Number of distinct country_codes among the authorships for this work. |
counts_by_year |
RECORD |
REPEATED |
Works.cited_by_count for each of the last ten years, binned by year. To put it another way: each year, you can see how many times this work was cited. |
counts_by_year.cited_by_count |
INTEGER |
NULLABLE |
The number of times this work is cited in this year. |
counts_by_year.oa_works_count |
INTEGER |
NULLABLE |
|
counts_by_year.year |
INTEGER |
NULLABLE |
The year. |
created_date |
DATE |
NULLABLE |
The date this Work object was created in the OpenAlex dataset, expressed as an ISO 8601 date string. |
display_name |
STRING |
NULLABLE |
Exactly the same as Work.title. It’s useful for Works to include a display_name property, since all the other entities have one. |
doi |
STRING |
NULLABLE |
The DOI for the work. This is the Canonical External ID for works. Occasionally, a work has more than one DOI–for example, there might be one DOI for a preprint version hosted on bioRxiv, and another DOI for the published version. However, this field always has just one DOI, the DOI for the published work. |
doi_registration_agency |
STRING |
NULLABLE |
|
fulltext_origin |
STRING |
NULLABLE |
|
grants |
RECORD |
REPEATED |
List of grant objects, which include the Funder and the award ID, if available. Our grants data comes from Crossref, and is currently fairly limited. |
grants.award_id |
STRING |
NULLABLE |
|
grants.funder |
STRING |
NULLABLE |
|
grants.funder_display_name |
STRING |
NULLABLE |
|
has_fulltext |
BOOLEAN |
NULLABLE |
|
id |
STRING |
NULLABLE |
The OpenAlex ID for this work. |
ids |
RECORD |
NULLABLE |
All the persistent identifiers (PIDs) that we know about for this work, as key: value pairs, where key is the PID namespace, and value is the PID. IDs are expressed as URIs where possible. |
ids.arxiv_id |
STRING |
NULLABLE |
|
ids.doi |
STRING |
NULLABLE |
The DOI. Same as Work.doi |
ids.mag |
INTEGER |
NULLABLE |
The Microsoft Academic Graph ID |
ids.openalex |
STRING |
NULLABLE |
The OpenAlex ID. Same as Work.id |
ids.pmcid |
STRING |
NULLABLE |
The PubMed Central identifier |
ids.pmid |
STRING |
NULLABLE |
The PubMed identifier |
institutions_distinct_count |
INTEGER |
NULLABLE |
Number of distinct institutions among the authorships for this work. |
is_paratext |
BOOLEAN |
NULLABLE |
True if we think this work is paratext. In our context, paratext is stuff that’s in a scholarly venue (like a journal) but is about the venue rather than a scholarly work properly speaking. Examples of paratext: front cover, back cover, table of contents, editorial board listing, issue information, masthead. Not paratext: research paper, dataset, letters to the editor, figures. Turns out there is a lot of paratext in registries like Crossref. That’s not a bad thing… but we’ve found that it’s good to have a way to filter it out. We determine is_paratext algorithmically using title heuristics. |
is_retracted |
BOOLEAN |
NULLABLE |
True if we know this work has been retracted. This field has high precision but low recall. In other words, if is_retracted is true, the article is definitely retracted. But if is_retracted is false, it still might be retracted; we just don’t know. This is because, unfortunately, the open sources for retraction data aren’t currently very comprehensive, and the more comprehensive ones aren’t sufficiently open for us to use here. |
language |
STRING |
NULLABLE |
The language of the work in ISO 639-1 format. The language is automatically detected using the information we have about the work. We use the langdetect software library on the words in the work’s abstract, or the title if we do not have the abstract. The source code for this procedure is here. Keep in mind that this method is not perfect, and that in some cases the language of the title or abstract could be different from the body of the work. |
license |
STRING |
NULLABLE |
The license applied to this work at this host. Most toll-access works don’t have an explicit license (they’re under “all rights reserved” copyright), so this field generally has content only if is_oa is true. |
locations |
RECORD |
REPEATED |
A list of Location objects describing all unique places where this work lives. |
locations.doi |
STRING |
NULLABLE |
|
locations.is_accepted |
BOOLEAN |
NULLABLE |
True if this location’s version is either acceptedVersion or publishedVersion; otherwise false. |
locations.is_oa |
BOOLEAN |
NULLABLE |
True if this work is Open Access (OA). |
locations.is_published |
BOOLEAN |
NULLABLE |
True if this location’s version is publishedVersion; otherwise false. |
locations.landing_page_url |
STRING |
NULLABLE |
The landing page URL for this location. |
locations.license |
STRING |
NULLABLE |
The location’s publishing license. This can be a Creative Commons license such as cc0 or cc-by, a publisher-specific license, or null, which means we are not able to determine a license for this location. |
locations.pdf_url |
STRING |
NULLABLE |
A URL where you can find this location as a PDF. |
locations.source |
RECORD |
NULLABLE |
|
locations.source.display_name |
STRING |
NULLABLE |
The name of the source. |
locations.source.host_institution_lineage |
STRING |
REPEATED |
|
locations.source.host_institution_lineage_names |
STRING |
REPEATED |
|
locations.source.host_organization |
STRING |
NULLABLE |
The host organization for this source as an OpenAlex ID. This will be an Institution.id if the source is a repository, and a Publisher.id if the source is a journal, conference, or eBook platform (based on the type field). |
locations.source.host_organization_lineage |
STRING |
REPEATED |
OpenAlex IDs — See Publisher.lineage. This will only be included if the host_organization is a publisher (and not if the host_organization is an institution). |
locations.source.host_organization_lineage_names |
STRING |
REPEATED |
The names of the organisations in host_organization_lineage. |
locations.source.host_organization_name |
STRING |
NULLABLE |
The display_name from the host_organization, shown for convenience. |
locations.source.id |
STRING |
NULLABLE |
The OpenAlex ID for this source. |
locations.source.is_in_doaj |
BOOLEAN |
NULLABLE |
Whether this is a journal listed in the Directory of Open Access Journals (DOAJ). |
locations.source.is_oa |
BOOLEAN |
NULLABLE |
|
locations.source.issn |
STRING |
REPEATED |
The ISSNs used by this source. Many publications have multiple ISSNs (see locations.source.issn_l), so ISSN-L should be used when possible. |
locations.source.issn_l |
STRING |
NULLABLE |
The ISSN-L identifying this source. This is the Canonical External ID for sources. |
locations.source.publisher |
STRING |
NULLABLE |
The publisher name. |
locations.source.publisher_id |
STRING |
NULLABLE |
The OpenAlex ID of the publisher. |
locations.source.publisher_lineage |
STRING |
REPEATED |
|
locations.source.publisher_lineage_names |
STRING |
REPEATED |
|
locations.source.type |
STRING |
NULLABLE |
The type of source. |
locations.version |
STRING |
NULLABLE |
The version of the work, based on the DRIVER Guidelines versioning scheme. Possible values are: publishedVersion: The document’s version of record. This is the most authoritative version. acceptedVersion: The document after having completed peer review and being officially accepted for publication. It will lack publisher formatting, but the content should be interchangeable with that of the publishedVersion. submittedVersion: The document as submitted to the publisher by the authors, but before peer review. Its content may differ significantly from that of the accepted article. |
locations_count |
INTEGER |
NULLABLE |
Number of locations for this work. |
mesh |
RECORD |
REPEATED |
List of MeSH tag objects. Only works found in PubMed have MeSH tags; for all other works, this is an empty list. |
mesh.descriptor_name |
STRING |
NULLABLE |
|
mesh.descriptor_ui |
STRING |
NULLABLE |
|
mesh.is_major_topic |
BOOLEAN |
NULLABLE |
|
mesh.qualifier_name |
STRING |
NULLABLE |
|
mesh.qualifier_ui |
STRING |
NULLABLE |
|
open_access |
RECORD |
NULLABLE |
Information about the access status of this work, as an OpenAccess object. |
open_access.any_repository_has_fulltext |
BOOLEAN |
NULLABLE |
|
open_access.is_oa |
BOOLEAN |
NULLABLE |
True if this work is Open Access (OA). There are many ways to define OA. OpenAlex uses a broad definition: having a URL where you can read the fulltext of this work without needing to pay money or log in. You can use the locations and oa_status fields to narrow your results further, accommodating any definition of OA you like. |
open_access.oa_status |
STRING |
NULLABLE |
The Open Access (OA) status of this work. Possible values are: gold: published in an OA journal that is indexed by the DOAJ; green: toll-access on the publisher landing page, but there is a free copy in an OA repository; hybrid: free under an open license in a toll-access journal; bronze: free to read on the publisher landing page, but without any identifiable license; closed: all other articles. |
open_access.oa_url |
STRING |
NULLABLE |
The best Open Access (OA) URL for this work. Although there are many ways to define OA, in this context an OA URL is one where you can read the fulltext of this work without needing to pay money or log in. The “best” such URL is the one closest to the version of record. This URL might be a direct link to a PDF, or it might be to a landing page that links to the free PDF. |
primary_location |
RECORD |
NULLABLE |
A Location object with the primary location of this work. |
primary_location.doi |
STRING |
NULLABLE |
|
primary_location.is_accepted |
BOOLEAN |
NULLABLE |
|
primary_location.is_oa |
BOOLEAN |
NULLABLE |
True if an Open Access (OA) version of this work is available at this location. |
primary_location.is_published |
BOOLEAN |
NULLABLE |
|
primary_location.landing_page_url |
STRING |
NULLABLE |
The landing page URL for this location. |
primary_location.license |
STRING |
NULLABLE |
The location’s publishing license. This can be a Creative Commons license such as cc0 or cc-by, a publisher-specific license, or null, which means we are not able to determine a license for this location. |
primary_location.pdf_url |
STRING |
NULLABLE |
A URL where you can find this location as a PDF. |
primary_location.source |
RECORD |
NULLABLE |
|
primary_location.source.display_name |
STRING |
NULLABLE |
The name of the source. |
primary_location.source.host_institution_lineage |
STRING |
REPEATED |
|
primary_location.source.host_institution_lineage_names |
STRING |
REPEATED |
|
primary_location.source.host_organization |
STRING |
NULLABLE |
The host organization for this source as an OpenAlex ID. This will be an Institution.id if the source is a repository, and a Publisher.id if the source is a journal, conference, or eBook platform (based on the type field). |
primary_location.source.host_organization_lineage |
STRING |
REPEATED |
OpenAlex IDs — See Publisher.lineage. This will only be included if the host_organization is a publisher (and not if the host_organization is an institution). |
primary_location.source.host_organization_lineage_names |
STRING |
REPEATED |
The names of the organisations in host_organization_lineage. |
primary_location.source.host_organization_name |
STRING |
NULLABLE |
The display_name from the host_organization, shown for convenience. |
primary_location.source.id |
STRING |
NULLABLE |
The OpenAlex ID for this source. |
primary_location.source.is_in_doaj |
BOOLEAN |
NULLABLE |
Whether this is a journal listed in the Directory of Open Access Journals (DOAJ). |
primary_location.source.is_oa |
BOOLEAN |
NULLABLE |
|
primary_location.source.issn |
STRING |
REPEATED |
The ISSNs used by this source. Many publications have multiple ISSNs (see above), so ISSN-L should be used when possible. |
primary_location.source.issn_l |
STRING |
NULLABLE |
The ISSN-L identifying this source. This is the Canonical External ID for sources. |
primary_location.source.publisher |
STRING |
NULLABLE |
The publisher name. |
primary_location.source.publisher_id |
STRING |
NULLABLE |
The OpenAlex ID of the publisher. |
primary_location.source.publisher_lineage |
STRING |
REPEATED |
|
primary_location.source.publisher_lineage_names |
STRING |
REPEATED |
|
primary_location.source.type |
STRING |
NULLABLE |
The type of source. |
primary_location.version |
STRING |
NULLABLE |
The version of the work, based on the DRIVER Guidelines versioning scheme. Possible values are: publishedVersion: the document’s version of record, which is the most authoritative version; acceptedVersion: the document after it has completed peer review and been officially accepted for publication (it lacks publisher formatting, but its content should be interchangeable with that of the publishedVersion); submittedVersion: the document as submitted to the publisher by the authors, before peer review (its content may differ significantly from that of the accepted article). |
publication_date |
DATE |
NULLABLE |
The day when this work was published, formatted as an ISO 8601 date. Where different publication dates exist, we select the earliest available date of electronic publication. This date applies to the version found at Work.url. The other versions, found in Work.locations, may have been published at different (earlier) dates. |
publication_year |
INTEGER |
NULLABLE |
The year this work was published. |
referenced_works |
STRING |
REPEATED |
OpenAlex IDs for works that this work cites. These are citations that go from this work out to another work: This work ➞ Other works. |
referenced_works_count |
INTEGER |
NULLABLE |
|
related_works |
STRING |
REPEATED |
OpenAlex IDs for works related to this work. |
summary_stats |
RECORD |
NULLABLE |
|
summary_stats.2yr_cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.2yr_h_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_i10_index |
INTEGER |
NULLABLE |
|
summary_stats.2yr_mean_citedness |
FLOAT |
NULLABLE |
|
summary_stats.cited_by_count |
INTEGER |
NULLABLE |
|
summary_stats.h_index |
INTEGER |
NULLABLE |
|
summary_stats.i10_index |
INTEGER |
NULLABLE |
|
summary_stats.oa_percent |
FLOAT |
NULLABLE |
|
sustainable_development_goals |
RECORD |
REPEATED |
List of Sustainable Development Goal (SDG) objects. The United Nations’ 17 Sustainable Development Goals are a collection of goals at the heart of a global “shared blueprint for peace and prosperity for people and the planet.” We tag works with their relevance to these goals using the OpenAlex SDG Classifier, an mBERT machine learning model developed by the Aurora Universities Network. The score represents the model’s predicted probability of the work’s relevance for a particular goal. |
sustainable_development_goals.display_name |
STRING |
NULLABLE |
|
sustainable_development_goals.id |
STRING |
NULLABLE |
|
sustainable_development_goals.score |
FLOAT |
NULLABLE |
The model’s predicted probability that the work is relevant to this goal. Only SDGs with a prediction score higher than 0.1 are included. |
title |
STRING |
NULLABLE |
The title of this work. |
type |
STRING |
NULLABLE |
The type or genre of the work. This field uses Crossref’s “type” controlled vocabulary; you can see all possible values via the Crossref API at https://api.crossref.org/types. Where possible, we just pass along Crossref’s type value for each work. When that’s impossible (e.g. the work isn’t in Crossref), we do our best to figure out the type ourselves. Unfortunately the accuracy of Crossref’s data for this isn’t great, and ours isn’t much better. We’re working to develop better type classification. |
type_crossref |
STRING |
NULLABLE |
Legacy type information, using Crossref’s “type” controlled vocabulary. |
updated |
TIMESTAMP |
NULLABLE |
|
updated_date |
TIMESTAMP |
NULLABLE |
The last time anything in this Work object changed, expressed as an ISO 8601 date string. This date is updated for any change at all, including increases in various counts. |
url |
STRING |
NULLABLE |
The URL where you can access this work. |
version |
STRING |
NULLABLE |
The version of the work, based on the DRIVER Guidelines versioning scheme. Possible values are: publishedVersion, acceptedVersion or submittedVersion. |
keywords |
RECORD |
REPEATED |
|
keywords.keyword |
STRING |
NULLABLE |
|
keywords.score |
FLOAT |
NULLABLE |
|
cited_by_percentile_year |
RECORD |
NULLABLE |
|
cited_by_percentile_year.min |
FLOAT |
NULLABLE |
|
cited_by_percentile_year.max |
FLOAT |
NULLABLE |