Summer of Open Data
Incentives for Data Reuse, Frameworks for Collaboration, and Centering Data Responsibility
Posted on 16th of September 2020 by Andrew Young
The Summer of Open Data is a three-month project spearheaded by the Open Data Policy Lab (an initiative of The GovLab with support from Microsoft) in partnership with the Digital Trade & Data Governance Hub, Open Data Institute, the Open Data Charter, and BrightHive. Each week, we speak with data experts in local and regional governments, national statistical agencies, international bodies, and private companies to advance our understanding of how to establish a vision of open data focused on collaboration, responsibility, and purpose.
Moderated by The GovLab’s Co-Founder and Chief Research and Development Officer Stefaan Verhulst, the cross-cutting panel featured:
- Patrick McGarry, Head of Strategic Partnerships, data.world;
- John Wilbanks, Sage Bionetworks Chief Commons Officer;
- Swee Leng Harris, Luminate Principal for Data & Digital Rights; and
- Nuria Oliver, European Laboratory for Learning and Intelligent Systems (ELLIS) Board Member.
In a 45-minute conversation, Stefaan and the panelists discussed a variety of issues, including strategies for forging cross-sector data collaboration, the pandemic’s influence on the data space, and mechanisms for advancing responsible data reuse in the social interest.
The full conversation, as well as a brief overview of highlights, is below:
COVID-19 and Incentives for Data Collaboration and Reuse
The COVID-19 pandemic has instigated an unprecedented level of interest in and experimentation with data collaboration and reuse.
The panel began with Nuria Oliver reflecting on how COVID-19 has incentivized collaboration around previously siloed data to address various dimensions of the pandemic.
Nuria drew on her experience working in the Valencian region of Spain.
She recalled that, “At the beginning of the pandemic, the President of the Valencian Region in Spain was supportive of the idea of creating a team of experts […] composed of 20+ scientists from universities and research centers to leverage data to have a better understanding of the situation […] and evaluate the impact of the public policies implemented during the pandemic.”
This included work on issues like human mobility modeling, epidemiological models, and capturing data through citizen science and anonymized survey reporting via the Covid19Impact Survey. Already, the survey has revealed information related to social contact behavior during confinement, personal economic impact, labor situation, and health status.
Patrick agreed, noting that, “The interesting thing with COVID in particular, is that we’ve started to see an exploration around ways of using data that we just haven’t seen broad scale adoption of before.”
As the consequences of the pandemic continue to accumulate, and social uprisings continue across the United States, he argued, “It’s incredibly important that we get the answers right and in order to get the answers right, we need to get the data right.”
Frameworks for Collaboration
The discussion then turned to mechanisms for accelerating and de-risking data collaboration, with a focus on building and socializing collaboration frameworks.
John Wilbanks pointed to lessons learned from the open source software movement as inspiration for current efforts at Sage Bionetworks around open science and patient engagement.
He argued that a key benefit of creating collaboration frameworks is their ability to allow stakeholders to focus on their substantive work, rather than the operations of data collaboration and reuse.
“If you do not have a framework, you suddenly have a bunch of geneticists trying to develop a framework.”
Much of his team’s work at Sage is aimed at supporting coordination among scientific communities. In its work with Alzheimer’s laboratories, for example, the team asked, “How do we serve that community and get out of their way so they can do their work?”
John lamented how previous waves of open data championed the existence or availability of datasets, rather than the work that data enabled.
“I think one of the mistakes of the last decade has been around the posting of data as opposed to celebrating the number of users of data.”
He argued that subsidizing computing capacity could be a means for centering data use rather than availability, especially in fields where the datasets in question are prohibitively large and complex.
Centering Data Responsibility
Next, Swee Leng Harris reflected on issues of responsibility, ethics, and governance in data reuse, including those raised by COVID-19 contact tracing and the recent A-Levels controversy in the United Kingdom.
“One of the really interesting things that the pandemic has produced in the UK in particular is an increased consciousness of data and data systems,” she said. “For example, in relation to contact tracing apps, there’s been an increased awareness and increased consciousness about how data is going to be used, who will have it, and how it will be protected.”
“Context matters. The purpose matters. The nature of the data matters.”
Nuria agreed, noting that, “during the pandemic we are being flooded with data every day. Data has become this really important concept in people’s lives […] it’s raised a huge amount of awareness for all citizens that data matters, that high-quality data matters, that decisions will be better if they are based on evidence.”
Priorities for the Future
The discussion ended with a focus on future priorities for the open data movement.
Patrick highlighted the importance of interoperability as a means for advancing the field, especially for governments.
“Interoperability would be a huge one because it addresses a lot of these concerns in a subtle way […] Often governments are really hamstrung around issues of budgets and other things,” and moving beyond data systems that only work in isolation would help them to not only “meet the letter of the law” in terms of open data release, but also to meet the spirit of enabling purpose-driven data reuse.
Both John and Swee Leng focused on social aspects of open data. John called for “massively expanding human rights protections in the US so that some of the current problems, like genetic discrimination, would become illegal.”
Swee Leng advocated for “a greater focus on social context,” including the social implications of data provenance. “We need to understand the context in which data was produced and the potential harms that might result from unhelpful or inadvertent use.”
Finally, Nuria said she hopes to see a transformation of the public administration.
She indicated that, “We really need to move toward allowing people to base their decisions on knowledge and evidence. Together with that, I would add a big change on data sharing and transparency with people and their citizens.”
The Summer of Open Data will conclude its conversation series next week. Led by BrightHive’s Matt Gee, this final conversation will include:
- Daniel Jarratt, Infinite Campus Head of Learning Science Technologies;
- Vanessa Brown, National Student Clearinghouse Managing Director of Strategic Initiatives; and
- Felix Shapiro, Commonwealth of Virginia Workforce Policy Analyst.
Video of this panel will be released next Wednesday, September 23, 2020.
Until then, we welcome your input into the Third Wave of Open Data. Feel free to visit us at opendatapolicylab.org or participate in the conversation by tweeting with the hashtags #SummerOfOpenData and #3rdWaveOpenData.