academic_observatory_workflows.model
====================================

.. py:module:: academic_observatory_workflows.model

Attributes
----------

.. autoapisummary::

   academic_observatory_workflows.model.LICENSES
   academic_observatory_workflows.model.EVENT_TYPES
   academic_observatory_workflows.model.OUTPUT_TYPES
   academic_observatory_workflows.model.FUNDREF_COUNTRY_CODES
   academic_observatory_workflows.model.FUNDREF_REGIONS
   academic_observatory_workflows.model.FUNDING_BODY_TYPES
   academic_observatory_workflows.model.FUNDING_BODY_SUBTYPES
   academic_observatory_workflows.model.InstitutionList
   academic_observatory_workflows.model.AuthorList
   academic_observatory_workflows.model.FunderList
   academic_observatory_workflows.model.PublisherList
   academic_observatory_workflows.model.PaperList
   academic_observatory_workflows.model.FieldOfStudyList
   academic_observatory_workflows.model.EventsList
   academic_observatory_workflows.model.RepositoryList

Classes
-------

.. autoapisummary::

   academic_observatory_workflows.model.Repository
   academic_observatory_workflows.model.Institution
   academic_observatory_workflows.model.Paper
   academic_observatory_workflows.model.AccessType
   academic_observatory_workflows.model.COKIOpenAccess
   academic_observatory_workflows.model.PublisherCategories
   academic_observatory_workflows.model.OtherPlatformCategories
   academic_observatory_workflows.model.Author
   academic_observatory_workflows.model.Funder
   academic_observatory_workflows.model.Publisher
   academic_observatory_workflows.model.FieldOfStudy
   academic_observatory_workflows.model.Journal
   academic_observatory_workflows.model.Event
   academic_observatory_workflows.model.ObservatoryDataset

Functions
---------

.. autoapisummary::

   academic_observatory_workflows.model.date_between_dates
   academic_observatory_workflows.model.make_doi
   academic_observatory_workflows.model.make_observatory_dataset
   academic_observatory_workflows.model.make_funders
   academic_observatory_workflows.model.make_publishers
   academic_observatory_workflows.model.make_fields_of_study
   academic_observatory_workflows.model.make_authors
   academic_observatory_workflows.model.make_papers
   academic_observatory_workflows.model.make_open_citations
   academic_observatory_workflows.model.make_crossref_events
   academic_observatory_workflows.model.make_scihub
   academic_observatory_workflows.model.make_unpaywall
   academic_observatory_workflows.model.make_openalex_dataset
   academic_observatory_workflows.model.make_orcid
   academic_observatory_workflows.model.make_pubmed
   academic_observatory_workflows.model.make_crossref_fundref
   academic_observatory_workflows.model.make_crossref_metadata
   academic_observatory_workflows.model.bq_load_observatory_dataset
   academic_observatory_workflows.model.aggregate_events
   academic_observatory_workflows.model.sort_events
   academic_observatory_workflows.model.make_doi_table
   academic_observatory_workflows.model.make_doi_events
   academic_observatory_workflows.model.make_doi_funders
   academic_observatory_workflows.model.make_doi_journals
   academic_observatory_workflows.model.to_affiliations_list
   academic_observatory_workflows.model.make_doi_publishers
   academic_observatory_workflows.model.make_doi_institutions
   academic_observatory_workflows.model.make_doi_countries
   academic_observatory_workflows.model.make_doi_regions
   academic_observatory_workflows.model.make_doi_subregions
   academic_observatory_workflows.model.calc_percent
   academic_observatory_workflows.model.make_aggregate_table

Module Contents
---------------

.. py:data:: LICENSES
   :value: ['cc-by', None]

.. py:data:: EVENT_TYPES
   :value: ['f1000', 'stackexchange', 'datacite', 'twitter', 'reddit-links', 'wordpressdotcom', 'plaudit',...

.. py:data:: OUTPUT_TYPES
   :value: ['journal_articles', 'book_sections', 'authored_books', 'edited_volumes', 'reports', 'datasets',...

.. py:data:: FUNDREF_COUNTRY_CODES
   :value: ['usa', 'gbr', 'aus', 'can']

.. py:data:: FUNDREF_REGIONS

.. py:data:: FUNDING_BODY_TYPES
   :value: ['For-profit companies (industry)', 'Trusts, charities, foundations (both public and private)',...

.. py:data:: FUNDING_BODY_SUBTYPES

.. py:class:: Repository

   A repository.

   .. py:attribute:: name
      :type: str

   .. py:attribute:: endpoint_id
      :type: str
      :value: None

   .. py:attribute:: pmh_domain
      :type: str
      :value: None

   .. py:attribute:: url_domain
      :type: str
      :value: None

   .. py:attribute:: category
      :type: str
      :value: None

   .. py:attribute:: ror_id
      :type: str
      :value: None

   .. py:method:: _key()

   .. py:method:: __eq__(other)

   .. py:method:: __hash__()

   .. py:method:: from_dict(dict_: Dict)
      :staticmethod:

.. py:class:: Institution

   An institution.

   :param id: unique identifier.
   :param name: the institution's name.
   :param grid_id: the institution's GRID id.
   :param ror_id: the institution's ROR id.
   :param country_code: the institution's country code.
   :param country_code_2: the institution's two-letter country code.
   :param subregion: the institution's subregion.
   :param papers: the papers published by the institution.
   :param types: the institution type.
   :param country: the institution's country name.
   :param coordinates: the institution's coordinates.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: name
      :type: str
      :value: None

   .. py:attribute:: grid_id
      :type: str
      :value: None

   .. py:attribute:: ror_id
      :type: str
      :value: None

   .. py:attribute:: country_code
      :type: str
      :value: None

   .. py:attribute:: country_code_2
      :type: str
      :value: None

   .. py:attribute:: region
      :type: str
      :value: None

   .. py:attribute:: subregion
      :type: str
      :value: None

   .. py:attribute:: papers
      :type: List[Paper]
      :value: None

   .. py:attribute:: types
      :type: str
      :value: None

   .. py:attribute:: country
      :type: str
      :value: None

   .. py:attribute:: coordinates
      :type: str
      :value: None

   .. py:attribute:: repository
      :type: Repository
      :value: None

.. py:function:: date_between_dates(start_ts: int, end_ts: int) -> pendulum.DateTime

   Return a datetime between two timestamps.

   :param start_ts: the start timestamp.
   :param end_ts: the end timestamp.
   :return: a pendulum.DateTime between the two timestamps.

.. py:class:: Paper

   A paper.

   :param id: unique identifier.
   :param doi: the DOI of the paper.
   :param title: the title of the paper.
   :param published_date: the date the paper was published.
   :param output_type: the output type, see OUTPUT_TYPES.
   :param authors: the authors of the paper.
   :param funders: the funders of the research published in the paper.
   :param journal: the journal this paper is published in.
   :param publisher: the publisher of this paper (the owner of the journal).
   :param events: a list of events related to this paper.
   :param cited_by: a list of papers that this paper is cited by.
   :param fields_of_study: a list of the fields of study of the paper.
   :param publisher_license: the paper's license at the publisher.
   :param publisher_is_free_to_read: whether the paper is free to read at the publisher.
   :param repositories: the list of repositories where the paper can be read.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: doi
      :type: str
      :value: None

   .. py:attribute:: title
      :type: str
      :value: None

   .. py:attribute:: type
      :type: str
      :value: None

   .. py:attribute:: published_date
      :type: pendulum.Date
      :value: None

   .. py:attribute:: output_type
      :type: str
      :value: None

   .. py:attribute:: authors
      :type: List[Author]
      :value: None

   .. py:attribute:: funders
      :type: List[Funder]
      :value: None

   .. py:attribute:: journal
      :type: Journal
      :value: None

   .. py:attribute:: publisher
      :type: Publisher
      :value: None

   .. py:attribute:: events
      :type: List[Event]
      :value: None

   .. py:attribute:: cited_by
      :type: List[Paper]
      :value: None

   .. py:attribute:: fields_of_study
      :type: List[FieldOfStudy]
      :value: None

   .. py:attribute:: publisher_license
      :type: str
      :value: None

   .. py:attribute:: publisher_is_free_to_read
      :type: bool
      :value: False

   .. py:attribute:: repositories
      :type: List[Repository]
      :value: None

   .. py:attribute:: in_scihub
      :type: bool
      :value: False

   .. py:attribute:: in_unpaywall
      :type: bool
      :value: True

   .. py:property:: access_type
      :type: AccessType

      Return the access type for the paper.

      :return: AccessType.

   .. py:property:: oa_coki
      :type: COKIOpenAccess

      Return the COKI Open Access types for the paper.

      :return: COKIOpenAccess.

.. py:class:: AccessType

   The access type of a paper.

   :param oa: whether the paper is open access or not.
   :param green: when the paper is available in an institutional repository.
   :param gold: when the paper is in an open access journal or (it is not in an open access journal and is free to read at the publisher and has an open access license).
   :param gold_doaj: when the paper is in an open access journal.
   :param hybrid: where the paper is free to read at the publisher, it has an open access license and the journal is not open access.
   :param bronze: when the paper is free to read at the publisher website however there is no license.
   :param green_only: where the paper is not free to read from the publisher, however it is available at an institutional repository.
   :param black: where the paper is available at SciHub.

   .. py:attribute:: oa
      :type: bool
      :value: None

   .. py:attribute:: green
      :type: bool
      :value: None

   .. py:attribute:: gold
      :type: bool
      :value: None

   .. py:attribute:: gold_doaj
      :type: bool
      :value: None

   .. py:attribute:: hybrid
      :type: bool
      :value: None

   .. py:attribute:: bronze
      :type: bool
      :value: None

   .. py:attribute:: green_only
      :type: bool
      :value: None

   .. py:attribute:: black
      :type: bool
      :value: None

.. py:class:: COKIOpenAccess

   The COKI Open Access types.

   :param open: .
   :param closed: .
   :param publisher: .
   :param other_platform: .
   :param publisher_only: .
   :param both: .
   :param other_platform_only: .
   :param publisher_categories: .
   :param other_platform_categories: .

   .. py:attribute:: open
      :type: bool
      :value: None

   .. py:attribute:: closed
      :type: bool
      :value: None

   .. py:attribute:: publisher
      :type: bool
      :value: None

   .. py:attribute:: other_platform
      :type: bool
      :value: None

   .. py:attribute:: publisher_only
      :type: bool
      :value: None

   .. py:attribute:: both
      :type: bool
      :value: None

   .. py:attribute:: other_platform_only
      :type: bool
      :value: None

   .. py:attribute:: publisher_categories
      :type: PublisherCategories
      :value: None

   .. py:attribute:: other_platform_categories
      :type: OtherPlatformCategories
      :value: None

.. py:class:: PublisherCategories

   The publisher open subcategories.

   :param oa_journal: .
   :param hybrid: .
   :param no_guarantees: .

   .. py:attribute:: oa_journal
      :type: bool
      :value: None

   .. py:attribute:: hybrid
      :type: bool
      :value: None

   .. py:attribute:: no_guarantees
      :type: bool
      :value: None

.. py:class:: OtherPlatformCategories

   The other platform open subcategories.

   :param preprint: .
   :param domain: .
   :param institution: .
   :param public: .
   :param aggregator: .
   :param other_internet: .
   :param unknown: .

   .. py:attribute:: preprint
      :type: bool
      :value: None

   .. py:attribute:: domain
      :type: bool
      :value: None

   .. py:attribute:: institution
      :type: bool
      :value: None

   .. py:attribute:: public
      :type: bool
      :value: None

   .. py:attribute:: aggregator
      :type: bool
      :value: None

   .. py:attribute:: other_internet
      :type: bool
      :value: None

   .. py:attribute:: unknown
      :type: bool
      :value: None

.. py:class:: Author

   An author.

   :param id: unique identifier.
   :param name: the name of the author.
   :param institution: the author's institution.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: name
      :type: str
      :value: None

   .. py:attribute:: institution
      :type: Institution
      :value: None

.. py:class:: Funder

   A research funder.

   :param id: unique identifier.
   :param name: the name of the funder.
   :param doi: the DOI of the funder.
   :param country_code: the country code of the funder.
   :param region: the region the funder is located in.
   :param funding_body_type: the funding body type, see FUNDING_BODY_TYPES.
   :param funding_body_subtype: the funding body subtype, see FUNDING_BODY_SUBTYPES.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: name
      :type: str
      :value: None

   .. py:attribute:: doi
      :type: str
      :value: None

   .. py:attribute:: country_code
      :type: str
      :value: None

   .. py:attribute:: region
      :type: str
      :value: None

   .. py:attribute:: funding_body_type
      :type: str
      :value: None

   .. py:attribute:: funding_body_subtype
      :type: str
      :value: None

.. py:class:: Publisher

   A publisher.

   :param id: unique identifier.
   :param name: the name of the publisher.
   :param doi_prefix: the publisher DOI prefix.
   :param journals: the journals owned by the publisher.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: name
      :type: str
      :value: None

   .. py:attribute:: doi_prefix
      :type: int
      :value: None

   .. py:attribute:: journals
      :type: List[Journal]
      :value: None

.. py:class:: FieldOfStudy

   A field of study.

   :param id: unique identifier.
   :param name: the field of study name.
   :param level: the field of study level.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: name
      :type: str
      :value: None

   .. py:attribute:: level
      :type: int
      :value: None

.. py:class:: Journal

   A journal.

   :param id: unique identifier.
   :param name: the journal name.
   :param license: the license that articles are published under by the journal.

   .. py:attribute:: id
      :type: int

   .. py:attribute:: name
      :type: str
      :value: None

   .. py:attribute:: license
      :type: str
      :value: None

.. py:class:: Event

   An event.

   :param source: the source of the event, see EVENT_TYPES.
   :param event_date: the date of the event.

   .. py:attribute:: source
      :type: str
      :value: None

   .. py:attribute:: event_date
      :type: pendulum.DateTime
      :value: None

.. py:data:: InstitutionList

.. py:data:: AuthorList

.. py:data:: FunderList

.. py:data:: PublisherList

.. py:data:: PaperList

.. py:data:: FieldOfStudyList

.. py:data:: EventsList

.. py:data:: RepositoryList

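The ``AccessType`` parameter descriptions earlier in this module read as a small set of boolean rules. The following is a minimal sketch of those rules, not the library's implementation: ``AccessFlags``, ``classify`` and all five input names are hypothetical, and the definition of ``oa`` is an assumption, since the documentation only says "whether the paper is open access or not".

```python
from dataclasses import dataclass


@dataclass
class AccessFlags:
    """Hypothetical stand-in for AccessType, for illustration only."""

    oa: bool
    green: bool
    gold: bool
    gold_doaj: bool
    hybrid: bool
    bronze: bool
    green_only: bool
    black: bool


def classify(
    in_oa_journal: bool,
    free_at_publisher: bool,
    has_oa_license: bool,
    in_repository: bool,
    in_scihub: bool,
) -> AccessFlags:
    # gold_doaj: the paper is in an open access journal.
    gold_doaj = in_oa_journal
    # hybrid: free to read at the publisher, openly licensed, journal not open access.
    hybrid = free_at_publisher and has_oa_license and not in_oa_journal
    # gold: in an open access journal, or the hybrid condition holds.
    gold = gold_doaj or hybrid
    # bronze: free to read at the publisher, but without a license.
    bronze = free_at_publisher and not has_oa_license
    # green: available in an institutional repository.
    green = in_repository
    # green_only: in a repository but not free to read at the publisher.
    green_only = green and not free_at_publisher
    # ASSUMPTION: a paper counts as open access if any open route applies.
    oa = gold or bronze or green
    # black: available at SciHub.
    black = in_scihub
    return AccessFlags(oa, green, gold, gold_doaj, hybrid, bronze, green_only, black)


hybrid_paper = classify(False, True, True, False, False)
repo_only_paper = classify(False, False, False, True, True)
```

In this sketch ``hybrid_paper`` is flagged both ``hybrid`` and ``gold``, which follows from the parenthesised clause in the ``gold`` description.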
.. py:class:: ObservatoryDataset

   The generated observatory dataset.

   :param institutions: list of institutions.
   :param authors: list of authors.
   :param funders: list of funders.
   :param publishers: list of publishers.
   :param papers: list of papers.
   :param fields_of_study: list of fields of study.
   :param repositories: list of repositories.

   .. py:attribute:: institutions
      :type: InstitutionList

   .. py:attribute:: authors
      :type: AuthorList

   .. py:attribute:: funders
      :type: FunderList

   .. py:attribute:: publishers
      :type: PublisherList

   .. py:attribute:: papers
      :type: PaperList

   .. py:attribute:: fields_of_study
      :type: FieldOfStudyList

   .. py:attribute:: repositories
      :type: RepositoryList

.. py:function:: make_doi(doi_prefix: int)

   Makes a randomised DOI given a DOI prefix.

   :param doi_prefix: the DOI prefix.
   :return: the DOI.

.. py:function:: make_observatory_dataset(institutions: List[Institution], repositories: List[Repository], n_funders: int = 5, n_publishers: int = 5, n_authors: int = 10, n_papers: int = 100, n_fields_of_study_per_level: int = 5) -> ObservatoryDataset

   Generate an observatory dataset.

   :param institutions: a list of institutions.
   :param repositories: a list of known repositories.
   :param n_funders: the number of funders to generate.
   :param n_publishers: the number of publishers to generate.
   :param n_authors: the number of authors to generate.
   :param n_papers: the number of papers to generate.
   :param n_fields_of_study_per_level: the number of fields of study to generate per level.
   :return: the observatory dataset.

.. py:function:: make_funders(*, n_funders: int, doi_prefix: int, faker: faker.Faker) -> FunderList

   Make the funders ground truth dataset.

   :param n_funders: number of funders to generate.
   :param doi_prefix: the DOI prefix for the funders.
   :param faker: the faker instance.
   :return: a list of funders.

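``make_doi`` is documented only as producing a randomised DOI from a prefix. A minimal sketch of one way to do that is shown here, assuming the usual ``10.<prefix>/<suffix>`` DOI shape; ``make_doi_sketch``, ``suffix_length`` and ``rng`` are hypothetical names, and the suffix alphabet is an assumption, so the library's real output may look different.

```python
import random
import string
from typing import Optional


def make_doi_sketch(doi_prefix: int, suffix_length: int = 6, rng: Optional[random.Random] = None) -> str:
    """Build a random DOI string of the form 10.<prefix>/<suffix>."""
    rng = rng or random.Random()
    # ASSUMPTION: the suffix is drawn from lowercase letters and digits.
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choices(alphabet, k=suffix_length))
    return f"10.{doi_prefix}/{suffix}"


example_doi = make_doi_sketch(1000, rng=random.Random(42))
```

Passing a seeded ``random.Random`` keeps generated test data reproducible between runs.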
.. py:function:: make_publishers(*, n_publishers: int, doi_prefix: int, faker: faker.Faker, min_journals_per_publisher: int = 1, max_journals_per_publisher: int = 3) -> PublisherList

   Make the publishers ground truth dataset.

   :param n_publishers: number of publishers.
   :param doi_prefix: the publisher DOI prefix.
   :param faker: the faker instance.
   :param min_journals_per_publisher: the min number of journals to generate per publisher.
   :param max_journals_per_publisher: the max number of journals to generate per publisher.
   :return: a list of publishers.

.. py:function:: make_fields_of_study(*, n_fields_of_study_per_level: int, faker: faker.Faker, n_levels: int = 6, min_title_length: int = 1, max_title_length: int = 3) -> FieldOfStudyList

   Generate the fields of study for the ground truth dataset.

   :param n_fields_of_study_per_level: the number of fields of study per level.
   :param faker: the faker instance.
   :param n_levels: the number of levels.
   :param min_title_length: the minimum field of study title length (words).
   :param max_title_length: the maximum field of study title length (words).
   :return: a list of the fields of study.

.. py:function:: make_authors(*, n_authors: int, institutions: InstitutionList, faker: faker.Faker) -> AuthorList

   Generate the authors ground truth dataset.

   :param n_authors: the number of authors to generate.
   :param institutions: the institutions.
   :param faker: the faker instance.
   :return: a list of authors.

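The ``make_fields_of_study`` parameters imply ``n_levels * n_fields_of_study_per_level`` generated entries, each with a title of a bounded number of words. The sketch below illustrates that relationship with the standard library only. ``FieldOfStudySketch``, ``WORDS`` and ``seed`` are hypothetical, the real function takes a ``faker`` instance instead, and numbering levels and ids from zero is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabulary standing in for faker-generated words.
WORDS = ["applied", "marine", "quantum", "social", "biology", "physics", "history", "systems"]


@dataclass
class FieldOfStudySketch:
    """Hypothetical stand-in for FieldOfStudy."""

    id: int
    name: str
    level: int


def make_fields_of_study_sketch(
    n_fields_of_study_per_level: int,
    n_levels: int = 6,
    min_title_length: int = 1,
    max_title_length: int = 3,
    seed: Optional[int] = None,
) -> List[FieldOfStudySketch]:
    rng = random.Random(seed)
    fields = []
    # ASSUMPTION: levels and ids are numbered from zero.
    for level in range(n_levels):
        for _ in range(n_fields_of_study_per_level):
            # randint is inclusive at both ends, matching min/max semantics.
            n_words = rng.randint(min_title_length, max_title_length)
            name = " ".join(rng.choice(WORDS) for _ in range(n_words))
            fields.append(FieldOfStudySketch(id=len(fields), name=name, level=level))
    return fields


fields = make_fields_of_study_sketch(2, seed=0)
```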
.. py:function:: make_papers(*, n_papers: int, authors: AuthorList, funders: FunderList, publishers: PublisherList, fields_of_study: List, repositories: List[Repository], faker: faker.Faker, min_title_length: int = 2, max_title_length: int = 10, min_authors: int = 1, max_authors: int = 10, min_funders: int = 0, max_funders: int = 3, min_events: int = 0, max_events: int = 100, min_fields_of_study: int = 1, max_fields_of_study: int = 20, min_repos: int = 1, max_repos: int = 10, min_year: int = 2017, max_year: int = 2021) -> PaperList

   Generate the list of ground truth papers.

   :param n_papers: the number of papers to generate.
   :param authors: the authors list.
   :param funders: the funders list.
   :param publishers: the publishers list.
   :param fields_of_study: the fields of study list.
   :param repositories: the repositories.
   :param faker: the faker instance.
   :param min_title_length: the min paper title length.
   :param max_title_length: the max paper title length.
   :param min_authors: the min number of authors for each paper.
   :param max_authors: the max number of authors for each paper.
   :param min_funders: the min number of funders for each paper.
   :param max_funders: the max number of funders for each paper.
   :param min_events: the min number of events per paper.
   :param max_events: the max number of events per paper.
   :param min_fields_of_study: the min fields of study per paper.
   :param max_fields_of_study: the max fields of study per paper.
   :param min_repos: the min repos per paper when green.
   :param max_repos: the max repos per paper when green.
   :param min_year: the min year.
   :param max_year: the max year.
   :return: the list of papers.

.. py:function:: make_open_citations(dataset: ObservatoryDataset) -> List[Dict]

   Generate an Open Citations table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

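Since each ``Paper`` carries a ``cited_by`` list, a citations table can be derived by emitting one row per citing/cited pair. The sketch below shows that idea under stated assumptions: ``PaperSketch`` and ``open_citations_rows`` are hypothetical, and the ``citing`` and ``cited`` column names are assumed for illustration, so the rows produced by ``make_open_citations`` may carry different or additional fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PaperSketch:
    """Hypothetical stand-in for Paper with only the fields used here."""

    doi: str
    cited_by: List["PaperSketch"] = field(default_factory=list)


def open_citations_rows(papers: List[PaperSketch]) -> List[Dict]:
    rows = []
    for paper in papers:
        for citing_paper in paper.cited_by:
            # One row per (citing, cited) pair; column names are assumptions.
            rows.append({"citing": citing_paper.doi, "cited": paper.doi})
    return rows


paper_a = PaperSketch(doi="10.1000/aaa")
paper_b = PaperSketch(doi="10.1000/bbb")
paper_c = PaperSketch(doi="10.1000/ccc", cited_by=[paper_a, paper_b])
citation_rows = open_citations_rows([paper_a, paper_b, paper_c])
```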
.. py:function:: make_crossref_events(dataset: ObservatoryDataset) -> List[Dict]

   Generate the Crossref Events table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: make_scihub(dataset: ObservatoryDataset) -> List[Dict]

   Generate the SciHub table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: make_unpaywall(dataset: ObservatoryDataset) -> List[Dict]

   Generate the Unpaywall table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: make_openalex_dataset(dataset: ObservatoryDataset) -> List[dict]

   Generate the OpenAlex table data from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: OpenAlex table data.

.. py:function:: make_orcid(dataset: ObservatoryDataset) -> List[Dict]

.. py:function:: make_pubmed(dataset: ObservatoryDataset) -> List[Dict]

   Generate the Pubmed table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: make_crossref_fundref(dataset: ObservatoryDataset) -> List[Dict]

   Generate the Crossref Fundref table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: make_crossref_metadata(dataset: ObservatoryDataset) -> List[Dict]

   Generate the Crossref Metadata table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: bq_load_observatory_dataset(observatory_dataset: ObservatoryDataset, repository: List[Dict], bucket_name: str, dataset_id_all: str, dataset_id_settings: str, snapshot_date: pendulum.DateTime, project_id: str)

   Load the fake Observatory Dataset in BigQuery.

   :param observatory_dataset: the Observatory Dataset.
   :param repository: the repository table data.
   :param bucket_name: the Google Cloud Storage bucket name.
   :param dataset_id_all: the dataset id for all data tables.
   :param dataset_id_settings: the dataset id for settings tables.
   :param snapshot_date: the release date for the observatory dataset.
   :param project_id: api project id.
   :return: None.

.. py:function:: aggregate_events(events: List[Event]) -> Tuple[List[Dict], List[Dict], List[Dict]]

   Aggregate events by source into total events for all time, monthly and yearly counts.

   :param events: list of events.
   :return: list of events for each source aggregated by all time, months and years.

.. py:function:: sort_events(events: List[Dict], months: List[Dict], years: List[Dict])

   Sort events in-place.

   :param events: events all time.
   :param months: events by month.
   :param years: events by year.
   :return: None.

.. py:function:: make_doi_table(dataset: ObservatoryDataset) -> List[Dict]

   Generate the DOI table from an ObservatoryDataset instance.

   :param dataset: the Observatory Dataset.
   :return: table rows.

.. py:function:: make_doi_events(doi: str, event_list: EventsList) -> Dict

   Make the events for a DOI table row.

   :param doi: the DOI.
   :param event_list: a list of events for the paper.
   :return: the events for the DOI table.

.. py:function:: make_doi_funders(funder_list: FunderList) -> List[Dict]

   Make a DOI table row funders affiliation list.

   :param funder_list: the funders list.
   :return: the funders affiliation list.

.. py:function:: make_doi_journals(in_unpaywall: bool, journal: Journal) -> List[Dict]

   Make the journal affiliation list for a DOI table row.

   :param in_unpaywall: whether the work is in Unpaywall or not. At the moment the journal IDs come from Unpaywall,
      and if the work is not in Unpaywall then the journal id and name will be None.
   :param journal: the paper's journal.
   :return: the journal affiliation list.

.. py:function:: to_affiliations_list(dict_: Dict)

   Convert affiliation dict into a list.

   :param dict_: affiliation dict.
   :return: affiliation list.

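``aggregate_events`` is described as grouping events by source into all-time, monthly and yearly counts. A minimal sketch of that grouping follows, assuming hypothetical names throughout: ``EventSketch`` uses ``datetime`` in place of ``pendulum.DateTime``, and the dictionary keys ``source``, ``month``, ``year`` and ``count`` are illustrative rather than the library's actual row schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class EventSketch:
    """Hypothetical stand-in for Event, using datetime instead of pendulum."""

    source: str
    event_date: datetime


def aggregate_events_sketch(
    events: List[EventSketch],
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    totals: Counter = Counter()
    by_month: Counter = Counter()
    by_year: Counter = Counter()
    for event in events:
        totals[event.source] += 1
        # Month buckets are keyed as "YYYY-MM" (an assumption).
        by_month[(event.source, event.event_date.strftime("%Y-%m"))] += 1
        by_year[(event.source, event.event_date.year)] += 1
    all_time = [{"source": s, "count": c} for s, c in totals.items()]
    months = [{"source": s, "month": m, "count": c} for (s, m), c in by_month.items()]
    years = [{"source": s, "year": y, "count": c} for (s, y), c in by_year.items()]
    return all_time, months, years


sample_events = [
    EventSketch("twitter", datetime(2020, 1, 5)),
    EventSketch("twitter", datetime(2020, 1, 20)),
    EventSketch("twitter", datetime(2021, 3, 1)),
    EventSketch("reddit-links", datetime(2020, 1, 7)),
]
all_time, months, years = aggregate_events_sketch(sample_events)
```

The three returned lists match the three arguments that ``sort_events`` accepts, which suggests the two functions are meant to be used together.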
.. py:function:: make_doi_publishers(publisher: Publisher) -> List[Dict]

   Make the publisher affiliations for a DOI table row.

   :param publisher: the paper's publisher.
   :return: the publisher affiliations list.

.. py:function:: make_doi_institutions(author_list: AuthorList) -> List[Dict]

   Make the institution affiliations for a DOI table row.

   :param author_list: the paper's author list.
   :return: the institution affiliation list.

.. py:function:: make_doi_countries(author_list: AuthorList)

   Make the countries affiliations for a DOI table row.

   :param author_list: the paper's author list.
   :return: the countries affiliation list.

.. py:function:: make_doi_regions(author_list: AuthorList)

   Make the regions affiliations for a DOI table row.

   :param author_list: the paper's author list.
   :return: the regions affiliation list.

.. py:function:: make_doi_subregions(author_list: AuthorList)

   Make the subregions affiliations for a DOI table row.

   :param author_list: the paper's author list.
   :return: the subregions affiliation list.

.. py:function:: calc_percent(value: float, total: float) -> float

   Calculate a percentage and round to 2dp.

   :param value: the value.
   :param total: the total.
   :return: the percentage.

.. py:function:: make_aggregate_table(agg: str, dataset: ObservatoryDataset) -> List[Dict]

   Generate an aggregate table from an ObservatoryDataset instance.

   :param agg: the aggregation type, e.g. country, institution.
   :param dataset: the Observatory Dataset.
   :return: table rows.
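The arithmetic behind ``calc_percent`` ("calculate a percentage and round to 2dp") can be sketched in one line. ``calc_percent_sketch`` is a hypothetical name, and how the real function treats a zero ``total`` is not documented, so this version simply lets Python raise ``ZeroDivisionError``.

```python
def calc_percent_sketch(value: float, total: float) -> float:
    """Return value as a percentage of total, rounded to two decimal places."""
    # ASSUMPTION: no special handling for total == 0.
    return round(value / total * 100, 2)


one_third = calc_percent_sketch(1, 3)       # 33.33
one_quarter = calc_percent_sketch(50, 200)  # 25.0
```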