Event
Fourth Wave of Open Data Seminar: Why the Fourth Wave Matters
The GovLab talks with experts on the past, present, and future of open data
Posted on March 10, 2025 by Andrew Zahuranec, Stefaan Verhulst
Generative AI tools have attracted enormous attention. Yet questions remain about how they intersect with open data. How can generative AI help people better engage with open data? How can open data enable societally beneficial uses of generative AI? How can we address the many risks and challenges facing these systems?
Last year, we released a report arguing that a “fourth wave of open data” is emerging. Building on previous waves in the open data movement, we argued that the intersection of open data and generative AI could strengthen both: open data becomes easier to interface with, and generative AI gains access to more data for training. We followed this by releasing a catalogue of examples demonstrating real applications of this approach.
Yet this is a fast-moving environment. To keep current on the ways that generative AI is being harnessed for the public good, we kicked off the first discussion in a series focused on the Fourth Wave of Open Data.
Emerging from the intersection of AI and open data, this new era envisions open data as more accessible through conversational interfaces, AI-ready for advanced applications, and foundational to data commons that can drive innovation while addressing public interest challenges.
In this introductory episode, The GovLab’s Stefaan Verhulst spoke with Renata Ávila (Open Knowledge), Gretchen Deo (Microsoft), and Dr. Anastasija Nikiforova (University of Tartu) on the fourth wave approach and what it means conceptually.
The Past, Present, and Future of Open Data
Stefaan opened the panel with a brief explanation of how the open data movement has changed over the last few years — moving from a focus on freedom of information, to open-by-default, to publishing with purpose. In this latest wave, the fourth wave, we see efforts to use AI services to transform how people access and process open data.
This includes making data readable for AI systems, embracing data commons infrastructure that can support well-designed AI tools, making generative AI more conversational, and improving data quality and provenance.
Stefaan highlighted some examples of how open data and generative AI are already intersecting.
Movement Toward Public Empowerment
Renata Ávila of Open Knowledge argued that citizens often have a more sophisticated understanding of their data and how it shapes systems than is commonly appreciated.
“The first thing that many people do when they interact with open and generative systems is to ask about themselves, and then they ask about the place they are from, and then they ask about the discipline they are experts on,” she noted.
They are aware of data biases and gaps. They test systems to check whether these problems are taken into account, often checking against the subjects and disciplines they are experts in.
While the robust open data movement of 2010 has faded away—along with the enthusiasm and tools it carried with it—Renata argued that generative AI allows people to do much more and connect in unprecedented ways. It enables public administrations to better respond to and connect to citizen needs.
The problem is that the support, programs, and funding for citizen initiatives are no longer present. While new technological innovations could allow organizations to achieve much more with generative AI and data, a combination of austerity and shifting public administration priorities has drained both enthusiasm and material support. Meanwhile, activism and citizen engagement have moved on to other problems.
Instead of a standalone movement, Renata argued for the use of public interest data and AI to be transversal. The skills and tools allocated inside the initiatives ought to be directed toward tackling the most pressing problems of our times, such as climate change.
The Challenge of a “Data Winter”
Dr. Anastasija Nikiforova reinforced these points. While noting that there is indeed a lot of enthusiasm around these new systems, she cautioned that the open data movement has stagnated since 2010.
“We are seeing the democratization of AI. But in regards to the data, that is not the case [...] there’s been a stagnation,” she said.
Instead of greater openness, Dr. Nikiforova worried about “data hoarding” by governments and AI leaders. Understanding that there is a competitive advantage in having data to train systems, data holders worldwide (including businesses) seem increasingly unwilling to share their assets. There have been efforts by the EU and others to encourage sharing to counteract this phenomenon.
Much of this connects to what we describe as a “data winter”, where data assets for the common good are frozen and immobilized.
If data can be made accessible, Dr. Nikiforova argued, it could allow for better systems in which data providers are not merely publishers of data but users as well. However, data quality needs to be a prerequisite of availability and accessibility. This means going beyond the common open data understanding, which is limited to questions of completeness and accuracy, to consider metadata as well. Today, metadata and data quality are often questionable, leading to poor outcomes that undermine public trust.
She echoed Renata’s words: “metadata is the queen.”
She highlighted the rapid advances around conversational agents and how they were helping reduce language barriers, among other use cases. She closed by offering one possible approach to addressing these issues — synthetic data.
“This will definitely allow us to have more data and especially the data that is coming from crucial sectors where we, in the past, would not be able to really imagine having this data—such as in medical and healthcare-related fields where privacy concerns limited access.”
Private Sector Perspective on Generative AI and Open Data
Finally, Gretchen Deo of Microsoft offered her own reflections on generative AI. Noting the technology’s rapid development, she described how industry was interested in ways that AI tools could be scaled to be broadly beneficial across multiple sectors. “Generative AI and the developments in generative AI are really moving fast,” Gretchen said.
She then spoke about how, as a maker of AI, Microsoft looks at the technology in terms of inputs and outputs. On inputs, generative AI relies on vast and varied datasets to train models that can identify themes, patterns, and correlations. Access to data at scale is critical to ensuring that AI systems are accurate and produce useful insights.
On the output side, she highlighted the exciting potential beyond what has already been done with traditional data portals. Placing an AI interface on top of open data lets users interact with it in a more natural way, asking questions about the data without needing a data science skillset.
***
These were just a few of the reflections offered in our first seminar on the Fourth Wave of Open Data. To follow the full discussion, watch the video above.
We plan to host our second webinar on “Making Open Data Conversational”. Stay tuned for the announcement of when it is scheduled.