NEW PUBLICATION
Exploring the Intersections of Open Data and Generative AI: Recent Additions to the Observatory
Posted on 25th of October 2024 by Roshni Singh, Hannah Chafetz, Andrew Zahuranec, Stefaan Verhulst
The Open Data Policy Lab’s Observatory of Examples of How Open Data and Generative AI Intersect provides real-world use cases of where open data from official sources intersects with generative artificial intelligence (AI), building on the learnings from our report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.”
The Observatory includes over 80 examples from several domains and geographies, ranging from supporting administrative work within the legal department of the Government of France to assisting researchers across the African continent in navigating cross-border data sharing laws. The examples include generative AI chatbots to improve access to services, conversational tools to help analyze data, datasets to improve the quality of the AI output, and more. A key feature of the Observatory is its categorization across our Spectrum of Scenarios framework, shown below. Through this effort, we aim to bring together the work already being done and identify ways to use generative AI for the public good.
This Observatory is an attempt to grapple with the work currently being done to apply generative AI in conjunction with official open data. It does not make a value judgment on the efficacy or practices of the examples it includes. Many of these examples have ethical implications, which merit further attention and study.
From September through October, we added to the Observatory:
Bayaan Platform: A conversational tool by the Statistics Centre Abu Dhabi that provides decision makers with data analytics and visualization support.
Berufsinfomat: A generative AI tool for career coaching in Austria.
ChatTCU: A chatbot for Brazil's Federal Court of Accounts.
City of Helsinki's AI Register: An initiative aimed at leveraging open city data to enhance civic services and facilitate better engagement with residents.
Climate Q&A: A generative AI chatbot that provides information about climate change based on scientific reports.
DataLaw.Bot: A generative AI tool that disseminates data sharing regulations to researchers across several African countries.
Farmer.chat: A chatbot that gives agricultural advice, drawing from research papers and data sources.
GeneSilico Copilot: A tool designed to provide oncologists with treatment decision support by utilizing comprehensive open data sources.
GenSpectrum Chat: A chatbot focused on COVID-19 genomic sequencing data.
IN.gov: A generative AI chatbot that aims to support citizens in navigating public services in Indiana.
KemenkeuGPT: Developed for Indonesia's Ministry of Finance, this chatbot uses various data sources to support policy makers.
LLM-POTUS Score: A tool that employs large language models (LLMs) to analyze United States presidential debate transcripts.
Quantitative Reasoning with Data Benchmark: A benchmark to assess LLM performance on data analysis tasks.
RAG for Culturally Inclusive Hakka Chatbots: A project focused on integrating culturally relevant data to improve the capabilities of large language models (LLMs).
The Virtual Intelligent Chat Assistant's Department of Statistics Proof of Concept: A proof of concept for analyzing data from the Department of Statistics.
Key Themes
We note several unique approaches in how these projects are deployed and the stakeholders involved in them:
Government Engagement in Generative AI Initiatives: Some of the examples highlight how government entities are using existing technical infrastructure and platforms to create context-specific tools or provide funding to researchers and companies to develop chatbots. For instance, KemenkeuGPT was developed by researchers at the University of Nottingham and funded by Indonesia's Ministry of Finance.
Culturally-Specific Generative AI Solutions: Other additions focus on tailoring generative AI chatbots to specific cultural contexts. These chatbots are designed to address information gaps in general-purpose large language models (LLMs) by incorporating local languages, cultural traditions, and practices. For example, the RAG for Culturally Inclusive Hakka Chatbots aims to integrate culturally relevant data about the Hakka community in Taiwan. Likewise, Jugalbandi AI provides services in multiple local languages, aiming to improve access to government programs and rights information in India.
Enhancing Service Delivery and User Experience: Generative AI is being explored as a means to support public service delivery and improve user experiences on government websites. For example, the IN.gov chatbot assists residents in navigating public services in Indiana. Further examples are needed to better understand the overall impact and implementation of such technologies across various government platforms.
Improving Statistical Reasoning Capabilities in Generative AI: The Bayaan Platform is using generative AI to analyze and create visualizations from official statistics. Additionally, the Quantitative Reasoning with Data Benchmark provides a dataset that aims to improve the statistical reasoning capabilities of generative AI technologies. Further exploration of similar initiatives would provide a clearer understanding of this capability's implementation in practice.
***
Do you know of any real-world examples of generative AI and open data that should be included in the Observatory? Submit an example by visiting our Observatory.
Have suggestions for improvements, or are you interested in collaborating? Please contact us at [email protected].