Blog Post
Emerging Funding and Business Models for Data Commons: A Comparison
Posted on 29th of October 2025 by Stefaan Verhulst, Hannah Chafetz, Andrew Zahuranec
As data becomes an essential input for innovation and AI training, questions of how to sustain its equitable and responsible reuse have gained urgency. Beyond technical infrastructure and governance frameworks, the long-term viability of data commons depends on the institutional and financial models that underpin them.
Over the last year, The Open Data Policy Lab collected over 60 examples of data commons that support public-interest AI. We analyzed different aspects of these commons, including their governance structures, funding models, and the actors involved. We found that these initiatives rely on an array of funding models, ranging from philanthropic grants and government partnerships to donations and membership programs.
In what follows, we present 10 of the data commons we gathered. For each example, we briefly summarize the main focus of the commons, the geographic region, the funding model, and how data is accessed. We selected these examples based on the breadth of funding models being used. They are intended to be illustrative, not comprehensive, and are listed in alphabetical order. The latter part of this blog outlines several patterns from these examples and reflections on how the data commons landscape is evolving.
Data Commons Examples
Initiative | Country | Funding model | Data access model |
The All of Us Research Program is an initiative from the National Institutes of Health (NIH), based on the work of the NIH Precision Medicine Initiative Working Group of the Advisory Committee to the Director. The initiative aims to crowdsource health data from 1 million people across the United States to improve needs-based healthcare and precision medicine. | United States | Funded by the U.S. government through the 21st Century Cures Act, which set out 10 years of funding beginning in 2016 (a total of $1.5 billion); managed by the NIH | Tiered access using a cloud-based platform (Researcher Workbench) |
AIDA Data Hub is an open data repository in Sweden for medical research led by the Linköping University Center for Medical Image Science and Visualization (CMIV). The platform provides the infrastructure for researchers to share their own data and research using FAIR principles and DOIs. | Sweden | Membership program that provides access to the Base Service Data Science Platform; members can purchase additional services, including compute and storage; certain groups can apply for discounts | Member access through a secure platform |
The India AI Platform aims to accelerate access to non-personal data for AI researchers and startups. The platform is hosted by the Ministry of Electronics and Information Technology and Digital India. | India | Government funded and hosted by the Ministry of Electronics and Information Technology | Maintains a repository of datasets hosted on external websites and platforms |
GainForest.Earth is a decentralized science nonprofit that archives global biodiversity and environmental data. The initiative facilitates community-owned data commons for biodiversity, aiming to enable local and Indigenous communities to collect, manage, and share environmental data. | Switzerland | Donations from organizations and individuals through a decentralized payment system using blockchain technology; partners with philanthropic foundations and other organizations | Available through the GainForest application |
INSIGHT is a repository of 35 million eye images that is made available for research purposes only. By providing image data, it allows researchers to use advanced analytics and AI on anonymized patient records. This initiative is led by Moorfields Eye Hospital NHS Foundation Trust in collaboration with University Hospitals Birmingham NHS Foundation Trust. | United Kingdom | Funded as a digital innovation hub within the UK Research and Innovation Industrial Strategy Challenge Fund (2019), in collaboration with several organizations and with support from Health Data Research UK; in 2022, Moorfields Eye Hospital became the lead partner, and it is supported by the NHS and continues to collaborate with several partners; researcher data access applications are priced based on their use (including potential commercial value) | Access through a trusted research environment (Secure Research Environment) |
The Medical Imaging & Data Resource Center (MIDRC) facilitates and curates a data commons of medical imaging data, patient demographics and outcomes, and other clinical data. The MIDRC Data Commons, a subset of the MIDRC data, provides researchers with AI-ready data that can be accessed via an online portal. | United States | Follows a multi-institutional collaborative model; funded by the National Institute of Biomedical Imaging and Bioengineering, hosted at the University of Chicago, and co-led by the American College of Radiology, the Radiological Society of North America, and the American Association of Physicists in Medicine; received a grant from the Chan Zuckerberg Initiative to expand globally | After making an account, data can be accessed through the Gen3 client application |
MLCommons is an AI benchmarking organization that provides open datasets for machine learning research and testing. Its objective is to support engineers in building AI technologies for the public good. | United States | Membership fees (waived for academics, individuals, and small startups); founded with several corporate members | Datasets housed on GitHub or available for download through the website; access to specific datasets is limited to members |
Mozilla’s Common Voice is a crowdsourced open voice dataset that can be used to train AI-driven voice applications. The initiative aims to broaden access to voice data for non-English languages and other groups typically underrepresented in voice datasets. The initiative is led by the Mozilla Foundation. | International | Hosted by the Mozilla Foundation; funded through philanthropic grants and donations; partners include NVIDIA (2022), the Gates Foundation, and GIZ (2023); meets with funders on a quarterly basis | Available through an external platform (Mozilla Data Collective) |
OpenStreetMap (OSM) is a free, editable map of the world, collaboratively created by a global community of nearly 5 million registered users. The project aims to provide free and accessible geospatial data for everyone, including individuals, organizations, and governments. Supported by the OSM Foundation, this initiative fosters the growth, development, and distribution of open geospatial data. | International | Donations and membership fees to the OSM Foundation; the OSM Foundation has tiered membership fees that can be waived for active contributors, and members help set the direction of the OSM Foundation; the Foundation also generates revenue from its annual State of the Map conference, which is supported by donations | Available through the OpenStreetMap database |
POSMO is a data cooperative that collects and manages mobility data, which includes GPS and sensor-based movement information about how people travel through cities. This data is used in urban planning and transportation research. Contributors share their mobility data, which is governed collectively rather than by a single company. *Note: Translated using an online translator | Switzerland | Seed funding from the Migros Pioneer Fund (2021); data cooperative membership model (members purchase shares of the cooperative); works with several partner organizations, such as the city of Zurich, to deliver projects | Available through the POSMO App |
Patterns
The examples above reveal that while most initiatives aspire to enable equitable data reuse, their underlying business and governance models vary significantly across contexts, reflecting differences in policy priorities, sectoral focus, and maturity of data ecosystems.
Three broad patterns emerge:
Government-led Models
Initiatives such as All of Us, AIDA Data Hub, MIDRC, and India AI Platform exemplify state-backed infrastructures, where governments act as both funders and conveners. These models benefit from long-term funding commitments. For instance, All of Us was enabled by a 10-year funding commitment under the 21st Century Cures Act. They often combine public investment with partnerships across academia and industry, positioning the commons as a public good to be leveraged for research and innovation. Sustainability here depends on continued political and fiscal commitment rather than direct revenue streams.
Philanthropy and Hybrid Cooperation Models
Efforts like Mozilla’s Common Voice and GainForest.Earth illustrate philanthropic and multi-donor ecosystems where open data creation and stewardship are sustained by donations and grants. Such commons blend donor capital with community participation, aligning with the indirect benefit model identified in the literature: the value lies in mission fulfillment and ecosystem engagement rather than monetization.
Membership and Cooperative Models
OpenStreetMap, POSMO, and MLCommons demonstrate membership-based and cooperative financing, where users or contributors (including governments) become shareholders or members, effectively co-owning the commons. These models internalize governance and sustainability through collective ownership, echoing the freemium and razor-blade business archetypes where access is open but added services, tools, or governance privileges require contribution or fees.
Across these categories, the government’s role shifts from funder (All of Us) to facilitator (Mozilla) to partner (POSMO). This spectrum underscores that financial models alone do not define sustainability: legitimacy, trust, and shared value creation mechanisms are equally critical.
Reflections and Looking Forward
Building on the above, and complementing it with other research and past experience, several cross-cutting reflections and predictions emerge on how data commons may develop robust, equitable, and sustainable business models:
1. From Cost Centers to Value Platforms
Traditional open data initiatives often operated under the assumption that releasing data was a compliance cost or public good with minimal return. The emerging paradigm reframes data commons as platforms for value creation, where investments yield operational efficiencies, innovation spillovers, and measurable social returns. This requires moving from legal obligation to operational necessity—embedding openness and reuse in core processes rather than as afterthoughts.
2. Plural Revenue Streams and “Layered Openness”
Following the traditional freemium–premium–indirect benefit typology, commons can adopt layered access models:
Freemium tiers (basic public access, with value-added services such as data quality assurance or curation).
Premium tiers (tailored APIs, enriched datasets, analytical dashboards).
Indirect models (where open data supports core business functions, e.g., transportation or health platforms leveraging open mobility data).
This mirrors the razor-and-blades logic—open access as entry, with monetization through complementary services.
3. Cooperative and Mutualized Models
The cooperative mechanisms in projects like POSMO echo a broader movement toward data cooperatives and membership-based stewardship, where users and contributors collectively govern data resources. These models blur the line between producers and consumers of data and create incentives for contribution, trust, and shared value capture—an operationalization of “data as a commons.”
4. Public–Private–Philanthropic Hybrids
Hybrid financing—where governments provide infrastructure, philanthropy underwrites experimentation, and private actors co-invest through services or technology partnerships—is increasingly viable. Such tri-sectoral approaches ensure both legitimacy and flexibility while reducing dependency on single revenue sources.
5. Sustainability Through Service Design
Sustainable commons shift from “data release” to data services. Data is costly to produce but cheap to reproduce; hence, viable models rely on scaling reuse, not access fees. Commons can provide support, visualization, integration, and analytics services, monetizing added value rather than access itself.
6. Governance as a Differentiator
In a landscape where technical barriers are declining, governance becomes the new competitive advantage. Trusted intermediaries, transparent access processes, and demonstrable compliance with ethical and legal standards enable commons to attract users and funders. When openness increases, differentiation shifts from data access to governance and algorithmic quality.
7. AI-Ready and Trust-Based Commons
Finally, as AI transforms data value chains, data commons will increasingly function as AI training and testing ecosystems. Business models will depend on their capacity to ensure data quality, representativeness, and explainability. Initiatives that align openness with responsible AI development—through privacy-preserving infrastructures, social license mechanisms, and domain-specific curation—will define the next generation of commons.
Conclusion
The evolution from open data portals to fully fledged data commons ecosystems represents both an economic and institutional shift: from data as a public artifact to data as a shared asset. The emerging business models underpinning these commons, whether government-backed, cooperative, or hybrid, may signal a maturation of the data economy toward shared data stewardship, layered value creation, and mission-oriented sustainability.
However, it is important to note that the above comparison is not comprehensive, and many other funding models may exist. Our analysis summarizes several funding models that data commons initiatives are using; it does not comment on the effectiveness of these models or on whether they are sustainable over time. The information above was gathered through secondary research and was limited to what the initiatives make public on their websites. Additionally, we selected the examples from our repository, most of which are published in English. As such, we likely did not include funding models typically used in non-English-speaking regions.
To build on this work, future research may focus on the following questions:
1. Catalytic and Enabling Roles
What roles do governments, development agencies, and philanthropic funders play in initiating and de-risking data commons?
How do policy frameworks, public investment, and regulatory incentives shape the emergence of sustainable data commons ecosystems?
2. Funding Models and Sustainability
Which funding and revenue models (e.g., government-backed, cooperative, public–private, subscription-based, or blended finance) have proven most effective in scaling and sustaining data commons—and under what conditions?
How do these models balance economic viability, openness, and public value over time?
What mechanisms exist to reinvest value generated by commons (e.g., through data dividends, shared infrastructure funds, or tiered access)?
3. Institutional and Governance Dynamics
How do governance structures (e.g., cooperative boards, data trusts, federated management) interact with funding models to influence accountability and long-term resilience?
Which institutional arrangements best mitigate collective action failures, such as underinvestment or the “tragedy of the commons”?
4. Equity, Inclusion, and Global Diversity
How are data commons in low- and middle-income countries (LMICs) and non-English-speaking regions being funded and sustained?
What contextual factors (e.g., infrastructure, donor dependence, regulatory environments) affect their financial and institutional models?
How can funding mechanisms promote inclusive participation, especially from marginalized communities and local data producers?
5. Learning and Future Directions
What metrics and indicators can assess the effectiveness and sustainability of data commons funding models?
How might emerging technologies (e.g., blockchain, decentralized finance, smart contracts) enable new forms of collective financing and stewardship?
***
Have any questions or an interest in collaborating? Reach out to us at [email protected].