
Reimagining Data Governance in the Age of AI: Our 2025 Year in Review

Posted on 15th of December 2025 by Andrew Zahuranec, Roshni Singh, Stefaan Verhulst, Hannah Chafetz


Photo by Ian Schneider/Unsplash licensed under CC0.

In 2025, the use of generative AI continued to accelerate, even as open data initiatives and data access policies began to slow, signaling what we have started to describe as an emerging "data winter." This growing imbalance underscored a central challenge of the AI era: while demand for data is expanding rapidly, the governance models that once enabled access, trust, and reuse are no longer keeping pace.

Against this backdrop, The GovLab’s Data Program focused on reimagining data governance for an AI-driven world: one that moves beyond narrow notions of openness toward data access that is systematic, sustainable, and responsible by design. Throughout the year, we worked across a wide range of domains, from humanitarian response and health and wellbeing to local decision-making, testing how new governance approaches can enable data reuse while preserving legitimacy, accountability, and public value.

2025 was therefore both a busy and a reflective year. Alongside major initiatives exploring the opportunities and risks of generative AI, we also critically examined long-standing assumptions about data sharing, openness, and access. In doing so, our work increasingly centered on the institutional, social, and governance conditions—such as stewardship, purpose-driven access, and social license—that will be required to ensure data ecosystems remain viable and trustworthy in the age of AI.

In this blog, we summarize some of that work, carried out with a host of partners including Microsoft, UNICEF, UNESCO, UNHCR, the Wellcome Trust, Siegel Family Endowment, the Henry Luce Foundation, Kluz Ventures, the Institutional Data Initiative at Harvard University’s Law Library, the French Development Agency, and many others. We highlight five major areas of practice that guided much of our efforts and describe projects indicative of each.

In this way, we hope both to take stock of our achievements and to describe the foundation we hope to build on in the coming year. We hope this collection is as useful to you as it has been to us, and that it helps ring out 2025.

Improving Access to Non-Traditional Data for the Public Good

Throughout the year, The GovLab continued its research on non-traditional data sources (digitally captured, mediated, or observed data such as mobile phone records, online transactions, or satellite imagery) and the ways they are reshaping how we identify, understand, and respond to public interest challenges. This work builds on the research we conducted as part of the Third Wave of Open Data, where we argued that privately held datasets, responsibly reused and harnessed through cross-sector collaboration, could generate public value at scale.

A cornerstone of our work this year was our collaboration with the Wellcome Trust’s Discovery Research Programme, which produced Social Data for Health: Advancing Health and Wellbeing Research Across Disciplines. Written by Hannah Chafetz, Adam Zable, Sara Marcucci, Christopher Rosselot, and Stefaan Verhulst, the report offers an in-depth study of how non-traditional data is being used in health and wellbeing research around the world, with a focus on the United Kingdom and low- and middle-income countries, along with a detailed reference guide. Drawing on more than 290 studies, 5 interactive studios, numerous interviews, and insights from 23 expert reviewers, we found that social data is being used to:

  • Enhance early detection of health risks and outcomes

  • Reveal behavioral and environmental determinants of wellbeing

  • Illuminate structural inequities and systemic health drivers

  • Power cross-disciplinary research at the intersection of technology, ethics, and society

We supplemented this report with a variety of blogs, articles, and other analysis. “Monitoring the Re-Use and Impact of Non-Traditional Data” and “Unlocking Public Value with Non-Traditional Data: Recent Use Cases and Emerging Trends,” published on the Open Data Policy Lab, both highlighted innovative examples of how organizations around the world have used non-traditional data sources. They noted that while many examples center on research (particularly health research), non-traditional data can also fill gaps in official data.

Additionally, with the support of the Second Lancet Commission on Adolescent Health and Wellbeing and UNICEF, we conducted the Youth Solutions Labs, a series of remote co-design workshops with more than 120 young people from around the world. Among other topics, we sought to understand how young people feel about the reuse of data, including non-traditional data, as it relates to critical issues affecting their health and wellbeing. Our report, Designing Shared Data Futures: Engaging Young People on How to Re-use Data Responsibly for Health and Well-being, shares insights from these discussions, including the types of non-traditional data that may be harnessed.

Separately, Stefaan Verhulst contributed to an editorial in the open access journal Data & Policy on “Anticipating human mobility: Methods, data, and policy in forecasting and foresight.” The article describes the methodological frontier of forecasting and foresight and the ways mobility datasets can contribute to it. It examines the ethical and governance challenges posed by new data sources, the need for methodological pluralism, and the importance of ethical frameworks.

Advancing Data Commons Infrastructure for Responsible AI

In 2023 and 2024, generative AI rose to the forefront of public consciousness. Conversations emerged about the opportunities and risks it presented, the rules that might govern its development and application, and what was required for responsible, effective use. The GovLab led the way in some of this work, arguing in A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI that generative AI offered significant potential for the open data movement, particularly when combined with data commons: collaboratively governed ecosystems that pool diverse datasets with shared standards, enabling responsible access for public benefit.

In 2025, we built on that foundation. We continued our knowledge leadership, developing repositories on data commons and generative AI applications and publishing a blueprint that shows data stewards how they could create data commons for public-interest AI uses. We also put our theory into practice with the New Commons Challenge.

Launched in March, the New Commons Challenge was an innovation challenge that offered two $100,000 grants to global changemakers who could provide compelling examples of how data commons might serve the public interest and unlock AI’s potential to solve complex challenges. The challenge received hundreds of applications, which our team and an international panel of experts reviewed. In October, we selected two winners, the Amazon Rainforest Evolution Index and the Malawi Voice Data Commons, and held a ceremony in New York.

Through this challenge, we hope to show that fostering a responsible data ecosystem requires action. Over the next year, we will guide our winners and report back on their progress. We will also continue exploring ways to incubate additional proposals and help them succeed. We aim to share more details on this work in the new year.

Advancing Digital Self-Determination

A major gap in much of the current work on data and AI concerns not only trust but also agency and self-determination. While technology advocates often emphasize the benefits of data-driven tools, many individuals and communities lack meaningful ways to understand, shape, or contest how these systems affect their lives. As a result, the issue is not simply skepticism, but a structural absence of voice, choice, and agency.

When data and AI systems are deployed without mechanisms that enable people to express preferences, set boundaries, or influence governance, data use is experienced as extractive rather than empowering. Openness, rather than functioning as a public good, can come to feel weaponized: it serves institutional or commercial interests while bypassing those from whom data originate or whom AI systems most affect. This erodes public support not because people are anti-innovation, but because innovation is perceived as happening to them, not with them.

Framed this way, data sovereignty alone is insufficient. While sovereignty focuses on who controls data at an institutional or jurisdictional level, it often leaves open the question of how individuals and communities exercise agency within those structures. Digital self-determination complements sovereignty by emphasizing the capacity of people and groups to participate meaningfully in decisions about data reuse, AI deployment, and value distribution across the entire data lifecycle.

Without embedding agency into governance (through participatory processes, social license mechanisms, and accountable intermediaries), efforts to scale AI will continue to face legitimacy deficits. Reimagining data governance in the age of AI therefore requires shifting from a model of permission and compliance toward one of collective agency, negotiated use, and ongoing legitimacy.

In 2025, we explored various ways to advance and operationalize digital self-determination (DSD). With the French Development Agency, we wrote Reimagining Data Governance: Operationalizing DSD through Social Licensing. The report analyzes the limitations of traditional consent frameworks and outlines a structured approach (establishing community preferences, documenting conditions for reuse, and identifying enforcement pathways) to support more participatory and trusted data governance. It also identifies key steps needed to scale social licensing, including real-world pilots, capacity building, improved oversight mechanisms, and stronger institutional support.

We also began work with several partners in Switzerland focused on digital self-determination and social license. One component of this was our participation in the Swiss Data Space Forum in Rotkreuz. In a session on “Digital Self-Determination: Unlocking Data with Agency,” we introduced core questions about how DSD principles can move from high-level concepts to actionable design and governance features, especially in contexts where concerns about data extraction, AI-driven demand for data, and the limits of consent frameworks are growing. In two workshops convened with partners, discussions focused on how individuals and communities can articulate, document, and enforce their data-use preferences, and how these expectations can be embedded into governance models, data-sharing agreements, and accountability mechanisms.

Developing a New Science of Questions

A major gap in much of the current work on data and AI is not only one of trust or public support, but of misaligned demand. Data access, openness, and AI development continue to be largely supply-driven: shaped by what data are available, what technologies are possible, or what institutions are willing to share, rather than by clearly articulated societal priorities and public needs.

Making data access more meaningful, legitimate, and effective therefore requires renewed attention to the science and practice of asking questions. Without well-defined questions grounded in lived experience, policy priorities, and public interest goals, data reuse risks becoming extractive, performative, or disconnected from real-world problems. When questions are poorly specified or absent, openness can feel abstract, and AI systems appear to optimize for technical feasibility rather than social value.

With support from the Henry Luce Foundation, we convened the International Committee on Questions, which led to the concept of the Q-Lab: a new venture to build a knowledge center devoted to developing well-defined questions. We developed several knowledge products as part of this work.

Inquiry as Infrastructure is a paper that explores what constitutes a good question: one that is not only technically feasible but also ethically grounded, socially legitimate, and aligned with real-world needs. Beyond Answers Presented by AI, meanwhile, examines the limitations of AI systems and the roles questions can play within societies, and proposes the concept of the QLab. Stefaan Verhulst further expanded on these arguments in his blog “Generative AI and the New Tabula Rasa,” an examination of the “paradox of AI” and how AI can become not merely a collaborator but a crutch.

We also worked with the Siegel Family Endowment to expand our DATA4Philanthropy initiative with new resources on the role of questions in philanthropic grantmaking. We published new blogs on question-driven methods and a case study detailing lessons learned from Siegel’s methodology.

Beyond this work, The GovLab built on its 100 Questions Initiative, an effort to map the world’s 100 most pressing, high-impact questions that could be answered if relevant datasets were leveraged responsibly. In 2025, the initiative expanded its reach with the launch of a new domain on Women’s Health Innovation, developed in partnership with the Centre for European Policy Studies (CEPS) under the Gates-funded R&I project. To surface the most pressing, data-actionable questions in this field, we created a topic map and narrative and convened a global cohort of 77 “bilinguals” (practitioners, researchers, policymakers, clinicians, innovators, and community leaders working at the intersection of women’s health and data) from more than 30 countries.

This topic map was then refined through an iterative process of consulting with data bilinguals and public voting, leading to 10 urgent questions on women’s health. These 10 questions offer a clear, demand-driven agenda for the future of women’s health innovation. We aim to expand on this work in the new year and see how this agenda can be meaningfully acted upon.

Cultivating a Community of Data Stewards

Finally, The GovLab continued its years-long effort to foster an international, multidisciplinary cohort of data stewards: responsible data leaders empowered to exchange data for public benefit.

In May, we worked with The Data Tank, Swiss Data Alliance, Opendata.ch, and Hasler Stiftung to host the Data Stewards Intensive Course in Switzerland, which brought together 21 participants from federal agencies, cantonal authorities, the private sector, and civil society for a five-day immersion in responsible data reuse and data stewardship. The program combined expert-led sessions, field visits, and hands-on exercises focused on developing the human and institutional capacities needed to advance data collaboratives and data spaces.

In July, The GovLab worked with the Maryland Department of Information Technology to lead a multi-day Data Stewardship Bootcamp for Maryland’s Chief Data Officers, aimed at strengthening responsible data use and re-use across the state government. The event brought in guest speakers from across the region and challenged chief data officers through simulations, exercises, and more. 

A final major highlight of the year was our Fourth Annual State of Open Data Policy Summit, which brought together global policymakers, researchers, and practitioners to examine how generative AI is reshaping the open data landscape. In his opening remarks, Stefaan Verhulst outlined the emergence of a “Fourth Wave of Open Data,” emphasizing the dual reality of AI’s potential to democratize access alongside growing risks of a “data winter” as governments and platforms restrict key datasets. 

***

The work above only represents a fraction of our total output. Throughout the year, we participated in countless events, produced dozens of blogs, engaged in interviews and podcasts, and relaunched various websites (a full listing of which can be found here). Still, the above represents a useful cross-section of our contributions to the field and a collection of the impactful work we hope to build on for the following year—particularly the role that data commons can play in securing community agency and control and reinforcing responsible models for generative AI development.

We at The GovLab wish you a happy holiday season and look forward to engaging with you all in the New Year. If you’d like to work with us on any of the projects listed above or on others, please don’t hesitate to reach out at [email protected]
