Reimagining Data Access, Readiness, and Governance in the Age of AI: Reflections from the Fourth Annual State of Open Data Policy Summit

Posted on 25th of September 2025 by Christopher Rosselot, Hannah Chafetz, Andrew Zahuranec, Stefaan Verhulst

The State of Open Data Policy Summit is an annual conference hosted by the Open Data Policy Lab (a collaboration between The GovLab and Microsoft) to explore how policy and technology shape access to data for public interest re-use. 

In past years, the summit has looked at ways institutions have pursued purpose-driven data re-use and operationalized collaborative governance models. It has also examined the possible challenges represented by a “data winter”—a period marked by reduced data access.

On 4 September 2025, the Open Data Policy Lab hosted its Fourth Annual State of Open Data Policy Summit, focused on how generative AI and open data intersect. Below, we share the major takeaways. Watch the full recording here.

Introduction to the Summit

The GovLab’s Co-Founder Stefaan Verhulst introduced the summit by offering three observations to reflect on the current state of open data:

  • Observation 1: Mass adoption of generative AI signals entrance into the Fourth Wave of Open Data, where open data augments artificial intelligence (AI) and AI democratizes access to open data.

  • Observation 2: Despite an "AI Summer" of AI increasingly applied in the public interest, the availability and accessibility of open data are rapidly declining, leading to a “Data Winter”. Countries restrict access to national climate data, for instance, while social media companies tightly control platform data.

  • Observation 3: New types of data policies are emerging. Some countries are reconsidering whether existing data governance frameworks are still fit for purpose in the age of AI. New policies like the United Kingdom's Data (Use and Access) Act reinvigorate access to public and private sector data. Other countries are developing platforms and data libraries to amass national collections of data, like India's National Data Platform. Additionally, international bodies are producing steering frameworks like the Global Digital Compact to ensure that the benefits of digital data are distributed equitably.

Framing the Moment: From Open by Default to AI-Ready by Design

Amandeep Singh Gill, Under-Secretary-General and Special Envoy for Digital and Emerging Technologies at the United Nations, then shared opening remarks. Under-Secretary-General Gill noted that the world needs a mindset shift where “data is the soil for inclusive innovation, soil for the growth of the digital economy. How can we incentivize this mindset shift from oil [implying competition] to soil, and how can we incentivize access and sharing of open data in the public interest?” He explained that this mindset shift requires global frameworks and standards that foreground transparency and accountability in shaping future digital and data markets.

One such framework is the Global Digital Compact (GDC), which 193 countries adopted by consensus last year. He noted that one of the GDC's five pillars is data, which led to the creation of a high-level multi-stakeholder working group on data governance. This working group will focus on the public-interest themes of access, openness, data commons, and data for development.

He emphasized that in the age of AI, "openness" needs to be clearly defined throughout the cycle of producing, curating, processing, and (re)using data.

Public Sector Panel - Open Data Policy and Data Governance in a Fragmented World

Stefaan Verhulst moderated a discussion among Johannes Jütting (Executive Head of PARIS21), Oliver Wise (Former Chief Data Officer and Acting Under Secretary for Economic Affairs at the United States Department of Commerce), and Ambassador Philip Thigo (Special Envoy on Technology for the Republic of Kenya). The discussion focused on how governments can provide official and reliable information for the AI era.

Ambassador Thigo reminded the audience that the majority of the world is still grappling with the “First Wave”: the process of producing and opening data. Before reaching the "Fourth Wave" question of making sure that AI training data is debiased and representative, the principles of previous waves—transparency, accountability, and interoperability—need to be put into practice. He also highlighted the equalizing and market-shaping role of government in development contexts, saying, “We need to have sort of a shared stewardship that no one actor owns these sort of high value datasets that have potentially high impact, especially for development, decision making, outcomes.”

Oliver Wise steered the discussion towards making official data findable through AI. The vision, according to Wise, is, “a user experience where a user of any sophistication level can use free and widely available generative AI to get answers to their questions, and if the answers to their questions rely on public statistical data, that those AI systems can reliably find [accurate] data.” Based on United States Department of Commerce guidelines, Wise outlined three approaches:

  • Chatbots: Training GPTs on the nuance of public data (CensusGPT)

  • Data Management: Using variable-level metadata to ensure that official data is machine-interpretable so that existing AI tools can find and interpret it (schema.org); a brief illustrative sketch follows this list

  • Model Context Protocol (MCP): Exposing instructions to AI agents on how to interpret data without having to change the data itself (Anthropic MCP builder)
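To make the metadata approach above more concrete, here is a minimal, hypothetical sketch of variable-level schema.org Dataset markup generated as JSON-LD from Python. It is not an actual Department of Commerce implementation; the dataset name, field names, and URL are illustrative placeholders.

```python
import json

# Minimal sketch (illustrative only) of the "variable-level metadata" idea:
# describing an official dataset with schema.org Dataset markup so that
# crawlers and AI tools can find and interpret it.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "County-Level Median Household Income (illustrative)",
    "description": "Hypothetical official statistical dataset described "
                   "with machine-readable, variable-level metadata.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "isAccessibleForFree": True,
    # Variable-level metadata: one entry per column, so an AI system can
    # interpret what each field measures and in which unit.
    "variableMeasured": [
        {
            "@type": "PropertyValue",
            "name": "median_household_income",
            "description": "Median household income, inflation-adjusted",
            "unitText": "USD",
        },
        {
            "@type": "PropertyValue",
            "name": "county_fips",
            "description": "Five-digit FIPS code identifying the county",
        },
    ],
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.gov/data/median-income.csv",  # placeholder
        }
    ],
}

# Embedding this JSON-LD in a <script type="application/ld+json"> tag on the
# dataset's landing page lets search engines and AI agents index it.
print(json.dumps(dataset_metadata, indent=2))
```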

Johannes Jütting (PARIS21) paused to appreciate the current "summer" of national statistical data: the increased capacity in over 162 countries to produce and govern official data. However, he also raised concerns, fearing that the public may not understand the importance of high-quality official data and arguing that national statistical offices need to invest more resources in engaging and educating citizens.

The panel closed with reflections on how to produce and use high-quality data in the age of AI. All panelists recognized the importance of creating standards and practices for producing unstructured data and Johannes noted that, "[Producing] quality data isn't cheap. It costs money, it costs resources, it costs technical skills. And I think we need to be more vocal that [...] good quality data is important for democracies and for the SDGs."

Private Sector and Civil Society Panel - Unlocking Value Responsibly

The Summit then hosted a panel with private sector and civil society representatives on inclusive and collaborative forms of governing open data. John Wilbanks (Expert in scientific ML/AI data products, platforms, and governance) moderated the conversation between Claire Foulquier-Gazagnes (Member of the Board at Current AI), Michele Jawando (President of the Omidyar Network), Venkatesh Hariharan (India Representative in the Open Invention Network), and Jeremy Rollison (Head of EU Policy for Microsoft).

“Data is not just important; it's truly foundational. Without high quality and diverse data, we cannot have public interest AI,” said Claire Foulquier-Gazagnes. Claire highlighted three open data trends she has witnessed over the last 10 years:

  • Multi-sectoral embrace of open data: Open data is no longer just the realm of government. Instead, efforts like Open Food Facts demonstrate that actors across borders are collaborating to generate high-quality datasets.

  • Open-source and open data collaboration: Platforms like Hugging Face are facilitating cooperation between open-source developers and open data providers, resulting in initiatives like Common Corpus.

  • Reluctance to "open up" data: Because generative AI relies on vast amounts of training data scraped from many sources, Claire expressed fear that many data holders have little incentive to open up their data.

Michele Jawando was excited by growing private-sector awareness of the need for responsible governance. She saw this awareness in the growing number of review systems, technical boards, ethics committees, external advisors, executive accountability standards, risk-based classification systems, and similar mechanisms. Michele also emphasized building community trust. “When you have engagement infrastructure, when you have co-design processes, when you have independent technical advisors, you’re going to see a shift [...] where people have confidence in data and the deployment of AI models.”

Venkatesh Hariharan underscored the importance of large, high-quality datasets for developing large language models (LLMs) in many languages. Venkatesh stressed that datasets must be fully culturally inclusive and diverse, even within areas that share the same language. This was the experience of Telugu SLM, where 45,000 volunteers collected language samples across Telangana and, Venkatesh noted, “Every 20 kilometers, the dialect changes. Every 20 kilometers, the type of food you eat changes. Every Indian state has massive diversity, even within a single linguistic formulation.” According to Venkatesh, these datasets should only be licensed to open source projects, ensuring that the benefit is received by those who provided data. “Personal data is also important for training AI models and therefore consent becomes essential, especially in countries that have data privacy laws,” Venkatesh said. “Consent networks like India's Account Aggregator will be needed to enable consented data flows.” 

Jeremy Rollison supported data commons for ensuring that datasets are not only diverse, representative, and inclusive, but accessible to smaller organizations that might not otherwise be able to access large amounts of high-quality data. “It's really unlikely that any one organization has enough data on their own to be successful at AI [...], so data sharing is going to be more important than ever, and it needs to be done in a way that addresses risks and reservations.”

Each panelist offered a view of what widespread public trust in open data would look like in five years: community demand for replication of data commons (Jeremy); creation and use of verified credentials to track the creation of AI-generated content (Venkatesh); government data as easily usable as commercial APIs, and the public confidence that follows (Michele); and increased investment in data literacy to create genuine co-design (Claire).

Closing Remarks - What’s Next for the Open Data Agenda?

Building a Smarter, Fairer, and More Durable Open Data Future

Stefaan Verhulst closed the Summit by articulating steps needed to ensure that open data serves the public interest. Echoing Under-Secretary-General Gill and Claire Foulquier-Gazagnes, Stefaan emphasized that open data no longer just means “open by default,” so “openness” must be defined by communities in governance structures like data commons, as Jeremy Rollison suggested. 

Stefaan underscored that data readiness in the age of AI requires new datasets, as Venkatesh Hariharan discussed. To prevent data enclosure and avoid extraction, institutions must build the community trust that Michele Jawando envisioned. “If there is no social license, if there is no trust, then data will not be made available, or shared, and [it] will be much harder to sustain public interest AI or even the potential summer of AI.”

Finally, Stefaan highlighted that the open data community must engage in technology and policy forecasting to navigate the implications of geopolitics for emerging technology.

***

We hope you found these conversations as productive as we did. In addition to reading the reflections above or watching the video recordings, we encourage you to follow the Open Data Policy Lab as it pursues these topics.

Have any questions or interested in collaborating? Email us at [email protected].

