Summer of Open Data
Focus on Public Communication, Legal Mandates, and Data Ethics
Posted on 9th of September 2020 by Andrew Young, Andrew Zahuranec
The Summer of Open Data is a three-month project spearheaded by the Open Data Policy Lab (an initiative of The GovLab with support from Microsoft) in partnership with the Digital Trade & Data Governance Hub, Open Data Institute, the Open Data Charter, and BrightHive. Each week, we speak with data experts in local and regional governments, national statistical agencies, international bodies, and private companies to advance our understanding of how to establish a vision of open data focused on collaboration, responsibility, and purpose.
Moderated by The GovLab’s Co-Founder and Chief Research and Development Officer Stefaan Verhulst, the cross-cutting panel featured:
- Christian Troncoso, BSA | The Software Alliance Senior Director of Policy;
- Zachary Feder, New York City Open Data Program Manager; and
- Natalia Domagala, Head of Data Ethics, Government Digital Service, Cabinet Office, United Kingdom.
In a 45-minute conversation, Stefaan and the panelists spoke on a variety of issues, including the evolution of the open data movement, the importance of legal mandates for directing energy, and the need for transparency and responsible use of data amid the ongoing pandemic.
The full conversation, as well as a brief overview of highlights, is below:
New York Open Data
New York City is considered a leader in open data nationally and internationally. With over 2,000 datasets published by a network of more than 90 “Open Data Coordinators” spread around the City, NYC Open Data has sought to provide city leaders and everyday New Yorkers with the resources they need to make their community better.
Consequently, the panel opened with reflections from Zachary Feder on how New York’s open data program formed and evolved from its early days, when open data drew its inspiration from freedom of information laws, to today. He noted three important changes in the city’s relationship to open data.
First, Feder explained, New York had sought to improve the way in which it identifies the different sources of open data, looking beyond where people are explicitly requesting the data and instead looking for different sources that might provide that same information. Second, understanding it had a broad audience, the city sought to make data more understandable to non-experts. Third, New York sought to help others use its data by communicating the context in which the city collected it.
“Data absent that context is not very helpful,” said Zachary Feder. “One of the main things that government staff need to do [when publishing data] is translate the information they know in their daily lives interacting with, producing with, and making decisions based on that data to the general public.”
He highlighted the importance of the city’s Open Data Law, which clarifies requirements and helps to set priorities for city government employees.
State and Federal Experiences
While these lessons resonated with the other panelists, they also provided a chance for contrast. As Christian Troncoso noted, experiences with open data in the United States were much more mixed on a state and federal level.
“People may be surprised to learn that a lot of cities and municipal governments are the ones really leading in this space,” said Christian. “It is really striking how far behind a lot of states are. Although most states at this point have some form of open data portal, fewer than 20 states have any sort of policy on the books, whether it’s an executive order or legislation.”
“It really gets at the issue of prioritization. These states that have open data portals but have no policy underlying those portals tend to be pretty malnourished, not very useable, and not very responsive to the community of users who would otherwise engage with the data being made available,” he added.
This failure to develop state and federal assets challenged people and organizations that rely on data. From a corporate perspective, Christian argued that open data was valuable for businesses seeking to develop useful products and services, including the use of open data to train artificial intelligence instances.
Responsible Data Use in the United Kingdom
In the United Kingdom, transparency and openness were at the center of data ethics. However, as Natalia Domagala stressed, there were other ethical concerns that data scientists and policymakers sought to be aware of, especially as they pertained to personal data.
This awareness would be reflected in the government’s upcoming refresh of the Data Ethics Framework, first developed in 2016, which would charge data scientists to be more aware of the wider implications of their work and help them assess and mitigate any ethical concerns.
“How can you demonstrate that the data that you’re using has been de-identified to the greatest degree possible? Because problems might arise when the dataset you’re working on or the dataset that you’ve released can be matched with other datasets and that will make individuals easily identifiable,” Natalia said.
Natalia continued by arguing that it was essential to embed ethical principles in the practices of data scientists. Some of this work depended on ensuring data scientists knew of the principles, but another part required public servants to have the skills to use data ethically.
Amid the ongoing pandemic, she noted the public was increasingly cognizant of and concerned about the ethical use of data. This put a renewed obligation on government actors to promote responsible data use in all their operations because public health strategies depend on public trust.
“People have more awareness of the issues and that gives them more agency. They are more ready to hold us accountable,” said Natalia. “I do strongly believe and hope that this will change the way we work, no matter whether we are in the public or private sector. Data ethics is not optional. It’s an absolute necessity to operate because if people won’t trust us, they won’t let us innovate.”
Zachary echoed Natalia’s points, noting that the COVID-19 pandemic had resulted in record traffic to New York’s open data portal.
Priorities for the Future
As with previous panels, Stefaan closed the panel by asking the participants what they considered their biggest priority for the future of the open data movement.
For Christian, the priority is addressing impediments that might stand in the way of organizations that would benefit from collaboration around shared data.
“It’s unrealistic to expect corporations to open their vaults and make all of their data available, but when there are circumstances where there is a mutual benefit to making data available, we want to make sure that the right policy framework is in place.”
Despite growing recognition of the collective benefits of open data, a recent MIT survey found that 64% of business executives are reluctant to fully embrace open data as a result of regulatory uncertainty. To overcome such barriers, Christian suggested that competition and privacy regulators could be empowered to establish an expedited review process to approve proposed data sharing arrangements.
For Zachary and Natalia, priorities came down to clarifying the value of data. Only by explaining uses could advocates promote open data’s value long term.
“In addition to using the data [we need to] share the use cases, to have a product that comes out of it, which both fosters more interest in the program […] but also communicates to the public that’s how the data is being used [to] help assuage some concerns on what the impact might be,” Zachary said.
“I’d like to echo what Zachary said about impact stories and case studies on data ethics and open data. That’s absolutely essential for understanding why we need those concepts,” said Natalia. “Also simple explanations of the long-term value of open data and data ethics measures that resonate with policymakers and decision-makers who might not necessarily fully understand what we are talking about.”