academic_observatory_workflows.oa_dashboard_workflow.tasks

Attributes

INCLUSION_THRESHOLD

MAX_REPOSITORIES

START_YEAR

END_YEAR

README

Classes

ZenodoVersion

Histogram

EntityHistograms

EntityStats

Stats

Functions

upload_institution_ids(*, release)

create_entity_tables(*, release, entity_types, ...)

add_wiki_descriptions(*, release, entity_type)

download_assets(*, release, bucket_name)

download_institution_logos(*, release)

export_tables(*, release, entity_types, download_bucket)

download_data(*, release, download_bucket)

make_draft_zenodo_version(*, zenodo_conn_id, ...)

fetch_zenodo_versions(*, zenodo_conn_id, zenodo_host, ...)

build_datasets(*, release, entity_types, ...)

publish_zenodo_version(*, release, version, ...)

upload_dataset(*, release, version, bucket_name)

repository_dispatch(*, github_conn_id)

bq_query_to_gcs(→ bool)

Run a BigQuery query and save the results on Google Cloud Storage.

save_oa_dashboard_dataset(download_folder, ...)

save_zenodo_dataset(download_folder, dataset_path, ...)

Save the COKI Open Access Dataset to a zip file.

oa_dashboard_subset(→ Dict)

zenodo_subset(item)

save_json(path, data)

Save data to JSON.

data_file_pattern(download_folder, entity_type)

yield_data_glob(→ List[Dict])

Load country or institution data files into a list of dicts.

make_entity_stats(→ EntityStats)

Calculate stats for entities.

make_logo_url(→ str)

Make a logo url.

fetch_institution_logo(→ Tuple[str, str])

Get the path to the logo for an institution.

clean_url(→ str)

Remove path and query from URL.

fetch_institution_logos(→ List[Dict])

Update the index with logos, downloading logos if they don't exist.

Module Contents

academic_observatory_workflows.oa_dashboard_workflow.tasks.INCLUSION_THRESHOLD
academic_observatory_workflows.oa_dashboard_workflow.tasks.MAX_REPOSITORIES = 200
academic_observatory_workflows.oa_dashboard_workflow.tasks.START_YEAR = 2000
academic_observatory_workflows.oa_dashboard_workflow.tasks.END_YEAR
academic_observatory_workflows.oa_dashboard_workflow.tasks.README = Multiline-String
"""# COKI Open Access Dataset
The COKI Open Access Dataset measures open access performance for {{ n_countries }} countries and {{ n_institutions }} institutions
and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.

## Licence
[COKI Open Access Dataset](https://open.coki.ac/data/) © {{ year }} by [Curtin University](https://www.curtin.edu.au/)
is licenced under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

## Citing
To cite the COKI Open Access Dashboard please use the following citation:
> Diprose, J., Hosking, R., Rigoni, R., Roelofs, A., Chien, T., Napier, K., Wilson, K., Huang, C., Handcock, R., Montgomery, L., & Neylon, C. (2023). A User-Friendly Dashboard for Tracking Global Open Access Performance. The Journal of Electronic Publishing 26(1). doi: https://doi.org/10.3998/jep.3398

If you use the website code, please cite it as below:
> James P. Diprose, Richard Hosking, Richard Rigoni, Aniek Roelofs, Kathryn R. Napier, Tuan-Yow Chien, Alex Massen-Hane, Katie S. Wilson, Lucy Montgomery, & Cameron Neylon. (2022). COKI Open Access Website. Zenodo. https://doi.org/10.5281/zenodo.6374486

If you use this dataset, please cite it as below:
> Richard Hosking, James P. Diprose, Aniek Roelofs, Tuan-Yow Chien, Lucy Montgomery, & Cameron Neylon. (2022). COKI Open Access Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6399463

## Attributions
The COKI Open Access Dataset contains information from:
* [Open Alex](https://openalex.org/) which is made available under a [CC0 licence](https://creativecommons.org/publicdomain/zero/1.0/).
* [Crossref Metadata](https://www.crossref.org/documentation/metadata-plus/) via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data with a [CC0 licence](https://creativecommons.org/share-your-work/public-domain/cc0/). See [metadata licence information](https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/) for more details.
* [Unpaywall](https://unpaywall.org/). The [Unpaywall Data Feed](https://unpaywall.org/products/data-feed) is used under license. Data is freely available from Unpaywall via the API, data dumps and as a data feed.
* [Research Organization Registry](https://ror.org/) which is made available under a [CC0 licence](https://creativecommons.org/share-your-work/public-domain/cc0/).
"""
academic_observatory_workflows.oa_dashboard_workflow.tasks.upload_institution_ids(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease)
academic_observatory_workflows.oa_dashboard_workflow.tasks.create_entity_tables(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, entity_types: list[str], start_year: int, end_year: int, inclusion_thresholds: dict)
academic_observatory_workflows.oa_dashboard_workflow.tasks.add_wiki_descriptions(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, entity_type: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.download_assets(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, bucket_name: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.download_institution_logos(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease)
academic_observatory_workflows.oa_dashboard_workflow.tasks.export_tables(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, entity_types: list[str], download_bucket: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.download_data(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, download_bucket: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.make_draft_zenodo_version(*, zenodo_conn_id: str, zenodo_host: str, conceptrecid: int)
academic_observatory_workflows.oa_dashboard_workflow.tasks.fetch_zenodo_versions(*, zenodo_conn_id: str, zenodo_host: str, conceptrecid: int)
academic_observatory_workflows.oa_dashboard_workflow.tasks.build_datasets(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, entity_types: list[str], zenodo_versions: list[ZenodoVersion], start_year: int, end_year: int, readme_text: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.publish_zenodo_version(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, version: str, bucket_name: str, zenodo_conn_id: str, zenodo_host: str, conceptrecid: int)
academic_observatory_workflows.oa_dashboard_workflow.tasks.upload_dataset(*, release: academic_observatory_workflows.oa_dashboard_workflow.release.OaDashboardRelease, version: str, bucket_name: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.repository_dispatch(*, github_conn_id: str)
class academic_observatory_workflows.oa_dashboard_workflow.tasks.ZenodoVersion
  release_date: pendulum.DateTime
  download_url: str
  static from_dict(dict_: dict) → ZenodoVersion
  to_dict() → Dict
class academic_observatory_workflows.oa_dashboard_workflow.tasks.Histogram
  data: List[int]
  bins: List[float]
  to_dict() → Dict
class academic_observatory_workflows.oa_dashboard_workflow.tasks.EntityHistograms
  p_outputs_open: Histogram
  n_outputs: Histogram
  n_outputs_open: Histogram
  to_dict() → Dict
class academic_observatory_workflows.oa_dashboard_workflow.tasks.EntityStats
  n_items: int
  min: Dict
  max: Dict
  median: Dict
  histograms: EntityHistograms
  to_dict() → Dict
class academic_observatory_workflows.oa_dashboard_workflow.tasks.Stats
  start_year: int
  end_year: int
  last_updated: str
  zenodo_versions: List[ZenodoVersion]
  country: EntityStats
  institution: EntityStats
  to_dict() → Dict
academic_observatory_workflows.oa_dashboard_workflow.tasks.bq_query_to_gcs(*, query: str, project_id: str, destination_uri: str, location: str = 'us') → bool

Run a BigQuery query and save the results on Google Cloud Storage.

Parameters:
  • query – the query string.

  • project_id – the Google Cloud project id.

  • destination_uri – the Google Cloud Storage destination uri.

  • location – the BigQuery dataset location.

Returns:

the status of the job.

academic_observatory_workflows.oa_dashboard_workflow.tasks.save_oa_dashboard_dataset(download_folder: str, build_data_path: str, entity_types: List[str], zenodo_versions: List[ZenodoVersion], start_year: int, end_year: int)
academic_observatory_workflows.oa_dashboard_workflow.tasks.save_zenodo_dataset(download_folder: str, dataset_path: str, entity_types: List[str], readme_text: str)

Save the COKI Open Access Dataset to a zip file.

Parameters:
  • download_folder – the path where the downloaded data files can be found.

  • dataset_path – the path to the folder where the dataset should be saved.

  • entity_types – the entity types.

  • readme_text – the readme text.

Returns:

None.
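The packaging step described above can be illustrated with a minimal sketch using only the standard library. It is not the module's implementation: the function name `zip_dataset_sketch`, the archive name `dataset.zip`, the `README.md` entry, and the one-folder-per-entity-type layout are all assumptions made for illustration. Unlike `save_zenodo_dataset`, which returns None, the sketch returns the archive path so the result is easy to inspect.

```python
import os
import tempfile
import zipfile
from typing import List


def zip_dataset_sketch(download_folder: str, dataset_path: str, entity_types: List[str], readme_text: str) -> str:
    """Hypothetical sketch: zip a README plus the data files of each entity type."""
    os.makedirs(dataset_path, exist_ok=True)
    zip_path = os.path.join(dataset_path, "dataset.zip")  # assumed archive name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.md", readme_text)  # assumed README file name
        for entity_type in entity_types:
            folder = os.path.join(download_folder, entity_type)  # assumed folder layout
            if not os.path.isdir(folder):
                continue  # skip entity types with no downloaded files
            for name in sorted(os.listdir(folder)):
                zf.write(os.path.join(folder, name), arcname=f"{entity_type}/{name}")
    return zip_path


# Usage: build a tiny download folder, zip it, and list the archive members.
with tempfile.TemporaryDirectory() as tmp_dir:
    country_dir = os.path.join(tmp_dir, "download", "country")
    os.makedirs(country_dir)
    with open(os.path.join(country_dir, "data.jsonl"), "w", encoding="utf-8") as f:
        f.write('{"id": "NZL"}\n')
    zip_path = zip_dataset_sketch(
        os.path.join(tmp_dir, "download"),
        os.path.join(tmp_dir, "out"),
        ["country", "institution"],
        "# COKI Open Access Dataset",
    )
    with zipfile.ZipFile(zip_path) as zf:
        archive_names = sorted(zf.namelist())
        readme_in_zip = zf.read("README.md").decode("utf-8")
```
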

academic_observatory_workflows.oa_dashboard_workflow.tasks.oa_dashboard_subset(item: Dict) → Dict
academic_observatory_workflows.oa_dashboard_workflow.tasks.zenodo_subset(item: Dict)
academic_observatory_workflows.oa_dashboard_workflow.tasks.save_json(path: str, data: Dict | List)

Save data to JSON.

Parameters:
  • path – the output path.

  • data – the data to save.

Returns:

None.
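A helper with this contract could look roughly like the sketch below. The name `save_json_sketch` and the compact `separators` setting are assumptions; the actual serialisation options used by `save_json` are not documented here.

```python
import json
import os
import tempfile
from typing import Dict, List, Union


def save_json_sketch(path: str, data: Union[Dict, List]) -> None:
    """Hypothetical stand-in for save_json: serialise data to a JSON file at path."""
    with open(path, mode="w", encoding="utf-8") as f:
        # Compact separators are an assumption; they only reduce the file size.
        json.dump(data, f, separators=(",", ":"))


# Usage: write a small payload and read it back.
payload = {"start_year": 2000, "entity_types": ["country", "institution"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "stats.json")
    save_json_sketch(out_path, payload)
    with open(out_path, encoding="utf-8") as f:
        round_trip = json.load(f)
```
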

academic_observatory_workflows.oa_dashboard_workflow.tasks.data_file_pattern(download_folder: str, entity_type: str)
academic_observatory_workflows.oa_dashboard_workflow.tasks.yield_data_glob(pattern: str) → List[Dict]

Load country or institution data files into a list of dicts.

Parameters:

pattern – the file path including a glob pattern.

Returns:

the list of dicts.
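As a rough illustration of loading files that match a glob pattern into a list of dicts, the sketch below assumes the data files are in JSON Lines format (one JSON object per line), as the README text states for the published dataset. The function name `load_jsonl_glob_sketch` and the file names in the usage are hypothetical.

```python
import glob
import json
import os
import tempfile
from typing import Dict, List


def load_jsonl_glob_sketch(pattern: str) -> List[Dict]:
    """Hypothetical sketch: read every JSON Lines file matching pattern into one list."""
    rows: List[Dict] = []
    for file_path in sorted(glob.glob(pattern)):  # sorted for a deterministic order
        with open(file_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # ignore blank lines
                    rows.append(json.loads(line))
    return rows


# Usage: two small files that match the same glob pattern.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "country-data-000.jsonl"), "w", encoding="utf-8") as f:
        f.write('{"id": "AUS"}\n{"id": "NZL"}\n')
    with open(os.path.join(tmp_dir, "country-data-001.jsonl"), "w", encoding="utf-8") as f:
        f.write('{"id": "USA"}\n')
    loaded = load_jsonl_glob_sketch(os.path.join(tmp_dir, "country-data-*.jsonl"))
```
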

academic_observatory_workflows.oa_dashboard_workflow.tasks.make_entity_stats(entities: List[Dict]) → EntityStats

Calculate stats for entities.

Parameters:

entities – a list of entities.

Returns:

the entity stats object.
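To make the shape of the result more concrete, here is a minimal sketch of this kind of calculation. It is built on assumptions: entities are taken to be flat dicts keyed by the three metric names that appear on EntityHistograms, the histogram is a plain equal-width binning, and the result is a dict rather than an EntityStats object. The real function may read different keys and use a different binning method.

```python
import statistics
from typing import Dict, List

# Metric names taken from the EntityHistograms attributes; the flat dict layout is assumed.
METRICS = ("n_outputs", "n_outputs_open", "p_outputs_open")


def histogram_sketch(values: List[float], n_bins: int) -> Dict:
    """Equal-width histogram: counts per bin plus the n_bins + 1 bin edges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    bins = [lo + i * width for i in range(n_bins + 1)]
    data = [0] * n_bins
    for value in values:
        idx = min(int((value - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        data[idx] += 1
    return {"data": data, "bins": bins}


def entity_stats_sketch(entities: List[Dict], n_bins: int = 4) -> Dict:
    """Hypothetical sketch: item count, per-metric min, max, median and histograms."""
    columns = {m: [e[m] for e in entities] for m in METRICS}
    return {
        "n_items": len(entities),
        "min": {m: min(v) for m, v in columns.items()},
        "max": {m: max(v) for m, v in columns.items()},
        "median": {m: statistics.median(v) for m, v in columns.items()},
        "histograms": {m: histogram_sketch(v, n_bins) for m, v in columns.items()},
    }


# Usage with three made-up entities.
entities = [
    {"n_outputs": 100, "n_outputs_open": 50, "p_outputs_open": 50.0},
    {"n_outputs": 200, "n_outputs_open": 50, "p_outputs_open": 25.0},
    {"n_outputs": 400, "n_outputs_open": 300, "p_outputs_open": 75.0},
]
stats = entity_stats_sketch(entities)
```
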

academic_observatory_workflows.oa_dashboard_workflow.tasks.make_logo_url(*, entity_type: str, entity_id: str, size: str, fmt: str) → str

Make a logo url.

Parameters:
  • entity_type – the entity type: country or institution.

  • entity_id – the entity id.

  • size – the size of the logo: s or l.

  • fmt – the format of the logo.

Returns:

the logo url.

academic_observatory_workflows.oa_dashboard_workflow.tasks.fetch_institution_logo(...) → Tuple[str, str]

Get the path to the logo for an institution. If the logo does not yet exist in the build path, it is downloaded from the Clearbit Logo API. If the logo does not exist and the download fails, the path defaults to “unknown.svg”.

Parameters:
  • ror_id – the institution’s ROR id

  • url – the URL of the company domain + suffix e.g. spotify.com

  • size – the image size of the small logo for tables etc.

  • width – the width of the image.

  • fmt – the image format.

  • build_path – the build path for files of this workflow

Returns:

The ROR id and relative path (from build path) to the logo
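The check, download, and fallback behaviour described for this function can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the `download` callable stands in for the real logo request, the `logos/institution/<size>/` folder layout is invented, and the `width` parameter is left out because it would only be passed on to the downloader.

```python
import os
import tempfile
from typing import Callable, Tuple


def fetch_logo_sketch(
    ror_id: str, url: str, size: str, fmt: str, build_path: str, download: Callable[[str, str], bool]
) -> Tuple[str, str]:
    """Hypothetical sketch: return (ror_id, relative logo path), or unknown.svg on failure."""
    rel_path = os.path.join("logos", "institution", size, f"{ror_id}.{fmt}")  # assumed layout
    abs_path = os.path.join(build_path, rel_path)
    if not os.path.isfile(abs_path):
        os.makedirs(os.path.dirname(abs_path), exist_ok=True)
        try:
            ok = download(url, abs_path)  # hypothetical downloader, True on success
        except OSError:
            ok = False
        if not ok or not os.path.isfile(abs_path):
            return ror_id, "unknown.svg"
    return ror_id, rel_path


def fake_download(url: str, dest: str) -> bool:
    """Test double that writes placeholder bytes instead of calling a logo API."""
    with open(dest, "wb") as f:
        f.write(b"placeholder image bytes")
    return True


def failing_download(url: str, dest: str) -> bool:
    """Test double that simulates a failed download."""
    return False


# Usage: first call downloads, second call hits the existing file, third call falls back.
with tempfile.TemporaryDirectory() as build_dir:
    found = fetch_logo_sketch("ror_a", "example.org", "sm", "jpg", build_dir, fake_download)
    cached = fetch_logo_sketch("ror_a", "example.org", "sm", "jpg", build_dir, failing_download)
    missing = fetch_logo_sketch("ror_b", "example.com", "sm", "jpg", build_dir, failing_download)
```

Passing the downloader in as an argument keeps the sketch free of network calls, so the fallback path can be exercised locally.
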

academic_observatory_workflows.oa_dashboard_workflow.tasks.clean_url(url: str) → str

Remove path and query from URL.

Parameters:

url – the url.

Returns:

the cleaned url.
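One way to remove the path and query from a URL with the standard library is sketched below. The name `clean_url_sketch` is hypothetical, and details such as dropping the fragment or omitting a trailing slash are assumptions; the real `clean_url` may differ on those points.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url_sketch(url: str) -> str:
    """Hypothetical sketch: keep only the scheme and network location of a URL."""
    parts = urlsplit(url)
    # Blank out the path, query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


# Usage
cleaned = clean_url_sketch("https://www.curtin.edu.au/study/?utm_source=x#top")
already_clean = clean_url_sketch("https://ror.org")
```
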

academic_observatory_workflows.oa_dashboard_workflow.tasks.fetch_institution_logos(build_path: str, entities: List[Tuple[str, str]]) → List[Dict]

Update the index with logos, downloading logos if they don’t exist.

Parameters:
  • build_path – the path to the build folder.

  • entities – the entities to process consisting of their id and url.

Returns:

a list of dicts (per the List[Dict] return annotation).