Event
Averting a Data Winter: Reflections from the Third Annual State of Open Data Policy Summit
Posted on 3 July 2024 by Andrew Zahuranec, Stefaan Verhulst, Adrienne Schmoeker
The State of Open Data Policy Summit is an annual conference hosted by the Open Data Policy Lab (a collaboration between The GovLab and Microsoft) to explore how developments in policy and technology affect how governments, companies, and civil society provide access to data for public interest re-use. In prior years, we have looked at ways institutions have pursued purpose-driven re-use and moved from principles to practice.
On 18 June 2024, the Open Data Policy Lab hosted its Third Annual State of Open Data Policy Summit to look at the open data landscape and assess the Open Data Movement’s progress. With leading data experts from the public and private sector, the discussion sought to answer a single overarching question: “Are we entering a Data Winter or a New Wave of Open Data?”
In this blog, we summarize those discussions, highlighting each of the individual sessions, their participants, and the major takeaways.
Introduction to the Summit
Speaker: Stefaan Verhulst, Chief Research & Development Officer of The GovLab
Stefaan Verhulst set the scene for the discussion, reflecting on the current state of open data and access to data for re-use.
He highlighted that the world today was, as Charles Dickens once wrote, “the best of times, the worst of times.” While there is no shortage of examples of open data’s value, Stefaan noted how climate datasets and social media datasets had become increasingly inaccessible. Generative AI has fostered further restrictions by creating “AI anxiety.”
The GovLab’s own research appeared to reinforce this trend. Its survey of data stewards revealed that many data stewards had mixed feelings about their own country’s openness. The State of Open Data Policy Repository had seen few major additions over the last year.
However, Stefaan emphasized that there is potential to revive openness in the conception of a “Fourth Wave of Open Data,” as detailed in a recent report. In this conception, open data and generative AI are used in tandem to support each other, with open data used to train and fine-tune models and generative AI used to democratize access to open datasets and facilitate better analysis and exploration.
See the full remarks here.
Opening Remarks
Speaker: Silvana Fumega (Global Data Barometer)
Silvana noted that 15 years ago, the open data movement was motivated by a single belief: “If we made government data open, people would naturally come, use it, and generate both social and individual value.” However, reality has proven more complex. It is not a given that “if you build it, they will come.”
The result has been a more nuanced conversation around privacy and data protection, human rights, and data governance. This is particularly important for the current AI age. She noted that, “as advances in AI continue, the relationship between data and AI governance becomes increasingly important as well. Data is the foundation for developing AI technologies, and ethical AI governance relies on solid data governance practices.”
Pulling from her experiences with the Global Data Barometer, Silvana offered a few reflections. First, the rate of increased access had slowed compared to just a few years ago. Second, gaps in skills, training, and infrastructure remained significant barriers to extracting value from data. Third, the rapid advances in AI highlighted the need for robust data capabilities because AI relies on vast amounts of high-quality data to function appropriately.
She closed with a call for more data-sharing models and clear agreements to balance the power between data controllers and their subjects, such as data commons. She also noted the power of partnership and collaboration in fostering responsible, effective, and sustainable use of data. In her words, “We not only need to collaborate with the stakeholders in the data ecosystem from NSOs to Chief Data Officer, from CSOs to data journalists but also to those in charge of regulating, implementing and developing AI-related tools to address some of the challenges that Stefaan mentioned previously.”
See the full remarks here.
Private-Sector Panel: Does Generative AI Move Us Toward A Fourth Wave of Open Data?
Speakers: Stefaan Verhulst (The GovLab); Michael Tjalve (Microsoft Philanthropies); Kriss Deiglmeier (Splunk); Amra Dorjbayar (Wobby.AI); Francisco Celeiro (Telefónica)
In the first panel, focused on the private sector, the participants discussed a variety of topics, including trends in the availability of data, the incentives for private-sector data sharing, the impact of generative AI on how companies perceive data openness, and the impetus for harnessing open data and AI tools to solve public problems.
On the issue of trends, Kriss Deiglmeier (Splunk) spoke on her experiences bridging the data divide to ensure that marginalized communities benefit from data and technology. In her estimation, progress was “mixed.” While there have been many good examples of the power of data, the notion that “data is power” has not always been productive, as it has encouraged some to silo their assets. “As we continue to advocate, data has to be structured. It can’t be siloed,” she said. She further emphasized the need to identify specific incentives for private-sector data sharing because one cannot rely on philanthropy alone. “We need to start thinking of data as organizational effectiveness, and to think of it as similar to how we think of other things. We would never invest in an organization that didn’t have financials.”
Francisco Celeiro (Telefónica) noted how his company had been working on data-driven initiatives for over a decade. Over this period, it has looked to build capacity and foster “collaborative work” because “collaborative work will be much more productive and effective.” Speaking specifically on the incentives that companies have to share data and use it for good, he argued that “the external use of data can bring additional value to the datasets. This value can be capitalized on the B2B side, helping other businesses, but also on data for social good."
Michael Tjalve (Microsoft Philanthropies) discussed the ways that large language models (LLMs) have “changed the game” and spoke on the excitement for new AI-driven tools and systems. However, he emphasized, there was a major gap: language support. While LLMs in English and other major languages have been improving, models in other languages lack sufficient high-quality data for training. He argued that “by working with the open data movement, we can make sure the right tools and resources are available to meet those objectives” and specifically highlighted how data commons and simple tools could be a key driver in this work by making digital public goods accessible for all. In his words, “having the appropriate kind of tooling to make data easier to consume will go a long way. Generative AI certainly has a role to play here, but often it’s much simpler tools that are needed to make data more easily shareable for dataset creators, more easily consumable for dataset users, and if desired more suitable for model training. Making that transition easier once you have the data will have a multiplying impact.”
Amra Dorjbayar (Wobby) discussed how he had started Wobby, a start-up that seeks to provide data-driven insights. He mentioned that Wobby was based on his experiences as a journalist and his realization that “accurate information was often needed very fast but technical knowledge was low.” Wobby tries to fill that gap, but there were challenges. Data protectionism, fueled by a lack of trust in data and how to share it, could undermine the openness of open data and limit the accuracy of tools.
The group subsequently reflected on their main points and inputs from the audience. When asked about one thing they would change about the current data ecosystem, the participants spoke on the need to promote real data privacy, the importance of articulating a “value proposition” around data, the need for comparable LLM quality across languages, and the value of a “single point of access to all kinds of data in a data commons or data marketplace.”
See the full discussion here.
Public-Sector Panel: How Might We Use Public Policy to Avert a Data Winter?
Speakers: Adrienne Schmoeker (The GovLab), Federico Segui (National Statistics Institute of Uruguay), Philip Thigo (Thunderbird School of Global Management), Jiří Pilař (European Commission), Natalia Carfi (Open Data Charter)
In the second panel, focused on public sector perspectives, participants discussed a variety of topics including the role of AI in shaping the open data movement, the developments around data governance and AI in the European Union, the challenges in ensuring AI is built in an inclusive way that reflects global interests, and the role of national statistical organizations in ensuring their data is used responsibly and effectively.
Natalia Carfi (Open Data Charter) opened the discussion by noting the importance of high-value datasets that can meaningfully inform systems. She noted that many datasets feeding into algorithms come from the Global North and are not necessarily representative of global needs. The Open Data Charter has sought to understand how it can add value to the discussion, as she was concerned about the ability of everyday citizens to “actually understand the impact of the datasets being fed into AI” and the overall lack of transparency on “how the algorithms are being developed and their potential biases.” She emphasized the need to advocate for governments to open better-quality datasets that can actually be fed into these systems in a global, transparent fashion to make AI more representative of the Global South.
Jiří Pilař (European Commission) spoke briefly on developments happening in Europe around AI and data governance. He noted that many European Union member states are overwhelmed because “there’s simply too much data,” and they often need guidance on where to invest their limited resources in opening up public sector data. The European Commission has been working with member states to build their capacity, but they need to ensure that they “have done their homework” in terms of making their own data accessible and usable. He also noted that, in terms of making use of limited resources, “if you want to promote reuse, we need to make sure that the data is really reusable under very friendly conditions.” The European Commission has been designating what it calls high-value datasets: datasets that will be available for free, accessible via APIs and in bulk in a machine-readable format. In his view, “There are signs of winter, areas where there is declining access [...] but with other new possibilities [the EU is] trying to preserve the capacity to get valuable benefits from the data [...] We are trying to focus on the data that can be made open.”
Philip Thigo (Thunderbird School of Global Management) spoke on inequities in AI development and ensuring that the Global Majority’s perspectives were well-represented. He said that his goal was to ensure that AI can be governed for the betterment of all of humanity, and that this responsibility rested on companies as well as countries. In addition to centering the values of the UN Charter in conversations of AI and open data, he recommended a few core principles. He suggested that AI be governed inclusively and in the global interest because it was often unclear who systems were accountable to. He also suggested that AI governance be built in lockstep with data governance and that organizations work to ensure the benefits of AI are shared across stakeholders rather than accruing to a single organization. On the issue of openness, he said, “it’s part policy but also about ensuring there’s a deliberateness, whether it’s investing in inclusivity or developing principles.”
Federico (National Statistics Institute of Uruguay) provided his reflections on the obligations of statistical agencies, arguing that they were “fundamental pillars of democracy who ensure democratic accountability and transparency.” In his view, national statistical organizations need to provide documentation of data and engage with the public to ensure that data is understandable and the implications are clear. While their role was not in setting policy or intervening on political challenges, they could support in other ways. "NSOs cannot participate in the AI debate but they can enrich it with quality data,” he said. “They cannot warn about every possible use of data but we can promote the open use of official statistics and create informed users."
The discussion concluded with participants brainstorming ways institutions might ensure that humanity remains central in discussions around artificial intelligence. Several of the participants spoke about harnessing existing spaces and regulations. Others talked about looking at these issues holistically, arguing that car safety could not focus solely on seatbelts but must consider “the entire car.”
See the full discussion here.
Closing Reflections
Speaker: Stefaan Verhulst, Chief Research & Development Officer of The GovLab
The Summit concluded with Stefaan (The GovLab) providing his own overarching reflections on the discussion. He had several takeaways.
First, Stefaan highlighted the discussions on high-quality, high-value data and working to ensure that it is available to address current opportunities and challenges. This topic has taken on special importance amid growing awareness of artificial intelligence and its needs, but questions remain on who determines the value of data and which kinds of issues take precedence.
The second area Stefaan highlighted was on the notion of common services, particularly data commons. He noted that, “if we actually don't start thinking about common infrastructures, common services, and common standards, how do we actually make sure that we have the kind of data we need and prevent fragmentation?” Data commons could be critical in a resource-constrained environment.
Third, Stefaan highlighted discussions on inclusion and the role of institutions to ensure that there is equity in data. Governments and others have a critical role to play in ensuring diverse, representative perspectives are available. This connected to a need for data governance and to consider the role of human rights.
Finally, Stefaan highlighted “the need for taking not just an open data, but a data ecosystem approach.” In other terms, Stefaan emphasized the need for data stewards to think critically about the kind of data ecosystem we need to promote the public good and the kind of collaboration we want to make possible. On this note, he closed the session.
See the full remarks here.
***
The Third Annual State of Open Data Policy Summit offered participants an opportunity to hear from experts across the world and across sectors about the challenges presented by the current data ecosystem and the ways we might avoid a “data winter.” Their reflections offer a handful of ways forward that open data advocates can use in their own work.
We hope you found these conversations as productive as we did. In addition to reading the reflections above or watching the video recordings, we encourage you to follow the Open Data Policy Lab as it pursues these topics. Sign up for updates on our activities here and learn more about the fourth wave of open data here.