NEW RESOURCE

LAUNCH: First Ever Catalog of Examples on How Open Data and Generative AI Intersect

We are thrilled to announce the launch of our new Observatory of Examples of How Open Data and Generative Artificial Intelligence Intersect!

Posted on 29th of July 2024 by Jacob Isaacs, Hannah Chafetz, María Esther Cervantes, Stefaan G. Verhulst

Over the last few months, The Open Data Policy Lab has investigated how generative artificial intelligence (AI) and large language models (LLMs) can be augmented with open data from official sources (i.e., open government and research data, and official statistics) and how generative AI can democratize access to open data, potentially enabling a new Fourth Wave of Open Data. This work resulted in the report “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.” The report describes five scenarios in which open data and generative AI can intersect, each supported by case studies from the field.

This new Observatory, launched today, goes a step further by providing additional concrete examples of how each of these scenarios is being operationalized across several domains, from government services to medical research. It offers valuable insights into the many ways generative AI and open data are intersecting and the evolving landscape of AI applications across various sectors.

What's in the Observatory?

The Observatory features a diverse array of examples from across the globe, spanning public, private, and academic sectors. A key feature is the categorization of examples by the five scenarios from the report.

Additionally, each entry provides key details such as:

Project name and description
Sector and location
Use case scenario (pretraining, adaptation, inference and insight generation, data augmentation, and open-ended exploration)
Start date
Links to additional information (where available)

Researchers, policymakers, and technology innovators can use this Observatory to:

Identify emerging trends in generative AI and open data use
Discover potential collaborators or case studies
Understand the global landscape of generative AI initiatives
Gain inspiration for new generative AI applications

How We Curated the Observatory

We found these case studies through desk research, interviews, and Open Data Action Labs (remote studios with diverse experts), and by attending several industry events. We reviewed the latest academic papers, government announcements, and technology blogs, among other sources. Additionally, we included all examples from our recent report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI.” Throughout, we focused on projects and initiatives that specifically use generative AI techniques with open government and research data, ensuring a targeted and relevant collection.

Key Trends and Takeaways

Public Sector Innovation: Governments worldwide are beginning to use generative AI to improve public services, from chatbots like DC Compass and Hamburg's LLMoin to more specialized tools like France's LLaMandement for analyzing legal bills.
Health and Biomedical Advancements: We found several initiatives (e.g., SELENA+, ChatDoctor, and Med-PaLM 2) applying generative AI to improve medical research and healthcare delivery, often using public health and open medical research datasets.
Multilingual and Cross-Border Initiatives: While the majority of the use cases in the Observatory are in English (which may be the result of the research team’s primary language), initiatives that apply generative AI tools to non-English languages, such as CroissantLLM and TitiBot, are emerging.
Environmental and Geospatial Applications: We identified examples of generative AI for mapping and earth observation such as NASA's Harmonized Landsat and Sentinel-2 (HLS) and Clay.
Synthetic Data Generation: Several initiatives, including those by Statistics Canada and Australian researchers, are exploring synthetic data generation as a way to produce AI training data while protecting privacy.

Explore and Contribute to the Observatory

Whether you're a researcher, policymaker, or simply curious about the latest generative AI innovations, our collection offers valuable insights into this rapidly evolving field.

Click here to access the full database. We encourage you to share your thoughts, suggest additional projects, or reach out if you'd like to collaborate on further research in this area.

The Observatory will be continuously updated as new developments arise. If you notice any use cases not listed or have suggestions for improvements, please contact us at [email protected].
