Anticipating Data Policy in the Age of AI

Seven Signals Shaping the Future of Data Access and Reuse

Posted on 22nd of June 2026 by Stefaan Verhulst, Adam Zable

The world’s relationship to data is changing rapidly. Artificial intelligence has generated significant excitement for its potential to help solve public problems, improve decision-making, and create new forms of economic and social value. At the same time, it has intensified longstanding debates around access, ownership, attribution, privacy, labor rights, security, and the responsible reuse of public-interest data. Questions that once sat at the margins of data policy have moved to the center.

Governments, international organizations, civil society groups, and companies are responding with a growing range of governance approaches, from executive orders and regulatory frameworks to international agreements, industry standards, and new institutional models. Yet it remains difficult to distinguish short-term developments from deeper structural shifts. While discussion often focuses on the latest AI breakthrough, the more important question for policymakers is how these technologies are reshaping the systems, assumptions, and governance models that determine how data is accessed, shared, and used.

To better understand these changes, The GovLab convened two forecasting studios bringing together experts working in data governance, digital policy, open science, AI governance, and public-sector innovation. Participants explored emerging trends in data access, governance, and reuse, examined the forces driving those trends, and considered what they may mean for the future of open data and public-interest data ecosystems.

The discussions identified seven signals that point to significant changes already underway. These signals suggest that the future of data governance will be shaped by advances in AI alongside evolving expectations around trust, stewardship, infrastructure, sovereignty, reciprocity, and public value. They offer a starting point for understanding how data policy may need to evolve in the years ahead. A longer version of this analysis, with a full list of studio participants and more detailed discussion of each signal, will be published separately.

Signal 1: The Open Data Paradigm Is Under Strain

One theme that surfaced repeatedly across the studios was the growing tension between traditional open data models and the realities of the AI era. Open data policies were built in part on the assumption that data reuse would be relatively visible and understandable. Institutions could generally see who was using their data, for what purposes, and with what outcomes. Generative AI challenges that logic. Data can now be absorbed into training pipelines and transformed into model capabilities, making downstream use difficult to trace, attribute, audit, or contest. At the same time, the category of “open data” has expanded to include everything from government statistics and scientific datasets to scraped web content, synthetic data, and machine-generated information, despite significant differences in their risks and governance needs.

These developments are creating a growing legitimacy challenge. Data shared for transparency, research, public service, or collective benefit may later be incorporated into commercial AI systems with limited visibility, accountability, or return to the original providers. As a result, openness is increasingly experienced by some creators, institutions, and communities as exposure to extraction, raising the risk that contributors may respond by restricting access to critical information. In response, policymakers are increasingly exploring more differentiated approaches to access that account for context, purpose, trust, and reciprocity.

Signal 2: Data Ecosystems Are Becoming Machine-Centric and AI-Mediated

A second theme centered on the changing relationship between humans, machines, and data. Data ecosystems were largely designed for people to browse, interpret, and use information. Increasingly, however, machines are becoming the primary consumers and producers of data. APIs, bots, AI systems, and automated pipelines now account for a growing share of activity in public data infrastructures, while sensors, satellites, vehicles, and other systems generate vast volumes of machine-native data, often processed at the edge before it ever reaches a centralized system or human reviewer. AI is also elevating the importance of certain data types, such as text, images, audio, video, and synthetic data that can be consumed, combined, and repurposed at scale.

The meaning of access is shifting as a result. Data may be technically public but unusable for machines if it lacks metadata, documentation, provenance, or machine-readable formats. Meanwhile, information originally intended for human audiences can be harvested and operationalized in ways institutions never anticipated. AI systems are also becoming intermediaries between people and public information, as citizens increasingly turn to AI platforms instead of official sources. This creates new challenges around data quality, misinformation, infrastructure resilience, and the ability of public institutions to maintain authoritative information in a machine-mediated environment.

Signal 3: Inference Is Reshaping the Foundations of Data Governance

Several examples discussed during the studios pointed to another challenge: valuable or sensitive knowledge can now be generated without direct access to the underlying data. Data governance has traditionally focused on collection, ownership, publication, and access. AI-enabled inference complicates that model. A navigation app in Taiwan, for example, reportedly displayed accurate red-light countdowns by aggregating vehicle stop patterns and GPS traces, even though the official traffic-light data had not been publicly released. The company inferred the rhythms of the city’s infrastructure without any access to the official signal data.
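To make the idea of inference without access concrete, here is a minimal sketch of how a signal's cycle length could in principle be recovered from vehicle movement alone. It is not the method used by the app mentioned above, which has not been described in detail. The function name, the observations, and the 90-second cycle are all hypothetical, and the approach (folding timestamps onto candidate periods and picking the one where they line up best) is just one simple way to do it.

```python
import cmath
import math


def estimate_cycle_seconds(move_off_times, candidate_periods):
    """Return the candidate period (in seconds) that best aligns the timestamps."""

    def alignment(period):
        # Place each timestamp on a circle whose circumference is `period`.
        # Timestamps that recur at that period share a phase, so their mean
        # vector has a length close to 1; unrelated periods score lower.
        vectors = [cmath.exp(2j * math.pi * t / period) for t in move_off_times]
        return abs(sum(vectors) / len(vectors))

    return max(candidate_periods, key=alignment)


# Hypothetical observations: the seconds at which queued vehicles start moving
# at one intersection, simulated from a 90-second cycle with small timing noise.
jitter = [0.5, -1.0, 1.5, 0.0, -0.5, 1.0, -1.5, 0.5, 0.0, -1.0]
move_off_times = [12 + 90 * k + jitter[k] for k in range(10)]

# Search whole-second cycle lengths between 30 and 180 seconds.
estimated_cycle = estimate_cycle_seconds(move_off_times, range(30, 181))
print(estimated_cycle)
```

Nothing in this sketch touches the signal controller's own data. The only inputs are traces of the kind a navigation service already collects, which is why governance built around dataset access alone struggles to reach this type of knowledge.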

This may require governance to expand beyond control of datasets to include who is affected by inferred knowledge, who can audit or challenge it, and what recourse exists when it produces real-world consequences. Personal, anonymized, behavioral, environmental, and community-level data may generate sensitive downstream inferences when combined at scale, even if individual datasets appear harmless in isolation. AI-driven inference will require governance tools focused on provenance, contestability, recourse, reciprocity, and equitable value-sharing alongside traditional approaches to ownership and access.

Signal 4: Data Infrastructure Is Becoming Harder to Sustain

The discussions also highlighted growing concerns about the future of data infrastructure. AI-driven scraping, bot traffic, larger datasets, aging systems, and rising maintenance demands are placing increasing pressure on institutions responsible for maintaining public data resources. Repositories and portals designed for human browsing are being strained by machine-scale demand, while automated traffic can consume additional hosting, security, and bandwidth capacity without producing equivalent public value.

Maintaining data access is increasingly a resilience and sustainability challenge. Cyberattacks, funding cuts, market consolidation, power grid instability, weak capacity, and geopolitical tensions all threaten the long-term availability of public-interest data. At the same time, the cost of openness continues to rise as organizations must maintain, secure, document, authenticate, and preserve data over time. Efforts to improve resilience through backup systems, distributed preservation, and cloud infrastructure carry their own financial and environmental costs, raising difficult questions about how open data ecosystems can be sustained.

Signal 5: Data Governance Is Fragmenting Across Institutions and Jurisdictions

A recurring concern from both studios was the growing fragmentation of the data governance landscape. Within countries, different levels of government and agencies often pursue separate approaches to data access, privacy, AI, cybersecurity, and digital infrastructure. Internationally, national regulations, regional frameworks, UN processes, development initiatives, and private-sector standards are evolving simultaneously, often with different assumptions, timelines, and definitions of risk. The result is a governance environment that is increasingly difficult to navigate.

At the same time, new mechanisms are emerging to bridge these divides. Digital economy agreements and digital trade agreements are creating ways to support cross-border data exchange and operational trust without requiring full regulatory alignment. Yet fragmentation remains a major implementation challenge, particularly for institutions with limited technical, legal, or financial capacity. Participants also highlighted a growing gap between policy design and operational reality, with many organizations lacking the stewardship capacity, multidisciplinary expertise, and technical translation functions needed to turn governance principles into practice.

Signal 6: Sovereignty and Security Are Driving a Turn Toward Strategic Control

Participants spent considerable time discussing how geopolitical competition, security concerns, and dependence on foreign digital infrastructure are reshaping data governance. Concerns about adversarial access, cloud dependence, infrastructure exposure, and control over strategically valuable data resources are increasingly replacing assumptions that cross-border data flows are inherently beneficial. Underlying these discussions was a concern about the weaponization of digital dependencies, as reliance on foreign cloud infrastructure, platforms, AI models, semiconductors, and software ecosystems becomes a source of geopolitical leverage.

These dynamics are contributing to a shift toward approaches described as "open but local" and "interoperable but separate." Controlled-access systems, data localization requirements, and sovereign cloud and AI strategies are gaining traction as governments seek to reduce external dependence and retain greater control over data assets. In many parts of the Global South, these concerns are also linked to fears of digital colonialism, where local data and knowledge generate value elsewhere. Yet sovereignty remains easier to pursue politically than operationally. Many governments continue to depend heavily on foreign infrastructure and technical services even while seeking greater autonomy, creating difficult trade-offs between openness, interoperability, development priorities, economic dependence, and national control. Participants also noted that sovereignty can reflect a legitimate effort to strengthen local capacity and reduce vulnerability, but it can also become a reactive response that prioritizes restriction before addressing deeper questions of governance, capability, incentives, and public value.

The discussions also suggested that data sharing is increasingly contingent on control. Governments, communities, workers, and data holders are more willing to share when they can define conditions, monitor use, and rely on credible safeguards against misuse. Visibility, recourse, and benefit-sharing are becoming important foundations for trust in a machine-mediated data ecosystem.

Signal 7: Data Sharing Models Need Stronger Incentives and Benefit-Sharing Mechanisms

Participants repeatedly pointed to a gap between the promise and reality of data sharing, and to a persistent failure to resolve the political economy of reuse. Despite significant public investment, data collaboratives, data spaces, marketplaces, trusts, exchanges, and other sharing models have so far struggled to scale. Weak incentives, unclear value propositions, legal uncertainty, technical complexity, high transaction costs, and uneven trust continue to limit participation. Organizations that control valuable data may see little reason to share when risks are immediate and benefits are uncertain or captured elsewhere.

A related challenge is the absence of widely accepted approaches for valuing and sharing the benefits generated from data. There is little agreement on how to measure, compensate, govern, or distribute the value of non-personal, public-interest, and community data. As a result, and as outlined earlier, institutions often struggle to distinguish responsible reuse from extraction, while communities and data holders see little return from the value their data helps create. Uncertainty around reciprocity, benefit-sharing, and public value makes voluntary sharing less attractive and contributes to pressure to revisit existing data sharing regulations.

The discussions also highlighted the potential role of new intermediaries, including data stewards, technical mediators, and multidisciplinary trust brokers that can help negotiate access conditions, build trust, identify systemic risks, and connect data supply to public demand. These roles will need clear mandates, sustainable resources, enforceable governance, and credible mechanisms for distributing value and managing risk if they are to avoid the same barriers that have limited earlier sharing models.

Conclusion

Throughout the two studios, participants described a data policy landscape undergoing profound change. Generative AI, machine-scale data reuse, geopolitical competition, infrastructure pressures, and growing concerns about extraction are challenging many of the assumptions that shaped open data policy over the past two decades. Questions of access, trust, stewardship, sovereignty, infrastructure, and public value are becoming increasingly intertwined.

The seven signals outlined above suggest that the future of data governance will require new forms of stewardship, stronger interoperability mechanisms, more sustainable infrastructure, clearer benefit-sharing arrangements, and governance models capable of operating in increasingly complex and machine-mediated data ecosystems. While openness remains a foundational principle, the conditions under which data is shared, reused, and governed are being redefined.

The discussions also highlighted the importance of anticipatory governance. Forecasting specific technological breakthroughs may be difficult, but the pressures reshaping data ecosystems are already visible. Governments, institutions, and communities can prepare by strengthening their ability to identify emerging shifts, understand their implications, and adapt before risks, dependencies, or missed opportunities become embedded.

Ultimately, many of the challenges raised across the studios converge on a common question: how can societies continue to unlock the value of data while ensuring that its benefits are shared, its risks are governed, and its use remains aligned with the public interest? The next phase of data policy will play a critical role in shaping that future.
