academic_observatory_workflows.wikipedia
========================================

.. py:module:: academic_observatory_workflows.wikipedia


Attributes
----------

.. autoapisummary::

   academic_observatory_workflows.wikipedia.WIKI_MAX_TITLES


Functions
---------

.. autoapisummary::

   academic_observatory_workflows.wikipedia.fetch_wikipedia_descriptions
   academic_observatory_workflows.wikipedia.get_wikipedia_title
   academic_observatory_workflows.wikipedia.fetch_wikipedia_descriptions_batch
   academic_observatory_workflows.wikipedia.remove_text_between_brackets
   academic_observatory_workflows.wikipedia.shorten_text_full_sentences


Module Contents
---------------

.. py:data:: WIKI_MAX_TITLES
   :value: 20


.. py:function:: fetch_wikipedia_descriptions(wikipedia_urls: List[str]) -> List[Tuple[str, str]]

   Get the Wikipedia descriptions for each entity (institution or country).

   :param wikipedia_urls: a list of Wikipedia URLs.
   :return: a list of tuples containing Wikipedia URL and Wikipedia description.


.. py:function:: get_wikipedia_title(url: str) -> str

   Get a Wikipedia title from a Wikipedia URL.

   :param url: a Wikipedia URL.
   :return: the title.


.. py:function:: fetch_wikipedia_descriptions_batch(urls: List) -> List[Tuple[str, str]]

   Fetch the Wikipedia descriptions for a set of Wikipedia URLs.

   :param urls: a list of Wikipedia URLs.
   :return: a list of tuples (id, Wikipedia description).


.. py:function:: remove_text_between_brackets(text: str) -> str

   Remove any text between (nested) brackets.
   If there is a space after the opening bracket, this is removed as well.
   E.g. 'Like this (foo, (bar)) example' -> 'Like this example'

   :param text: The text to modify.
   :return: The modified text.


.. py:function:: shorten_text_full_sentences(text: str, *, char_limit: int = 300) -> str

   Shorten a text to as many complete sentences as possible, while the total number of characters stays below the
   char_limit. Always return at least one sentence, even if this exceeds the char_limit.

   :param text: A string with the complete text.
   :param char_limit: The maximum number of characters.
   :return: The shortened text.
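
The behaviour described for ``remove_text_between_brackets`` and ``shorten_text_full_sentences`` can be illustrated with a minimal, self-contained sketch. This is not the module's implementation. The helper names ``strip_bracketed`` and ``shorten_to_sentences`` are hypothetical, and two details are assumptions: the single space just before a bracketed group is dropped (which reproduces the documented example), and a sentence is taken to end at ``.``, ``!`` or ``?`` followed by whitespace.

.. code-block:: python

   import re


   def strip_bracketed(text: str) -> str:
       """Hypothetical helper: drop text inside (nested) round brackets."""
       kept = []
       depth = 0
       for char in text:
           if char == "(":
               depth += 1
               # Assumption: drop the single space just before an outermost group.
               if depth == 1 and kept and kept[-1] == " ":
                   kept.pop()
           elif char == ")" and depth > 0:
               depth -= 1
           elif depth == 0:
               kept.append(char)
       return "".join(kept).strip()


   def shorten_to_sentences(text: str, *, char_limit: int = 300) -> str:
       """Hypothetical helper: keep whole sentences while below char_limit."""
       # Assumption: sentences end at '.', '!' or '?' followed by whitespace.
       sentences = re.split(r"(?<=[.!?])\s+", text.strip())
       result = sentences[0]  # always keep the first sentence
       for sentence in sentences[1:]:
           candidate = f"{result} {sentence}"
           if len(candidate) >= char_limit:
               break
           result = candidate
       return result


   print(strip_bracketed("Like this (foo, (bar)) example"))  # Like this example
   print(shorten_to_sentences("One. Two. Three.", char_limit=10))  # One. Two.
   print(shorten_to_sentences("One. Two. Three.", char_limit=2))  # One.

The last call shows the documented guarantee that at least one sentence is returned even when it exceeds the limit.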