New Publication

Data Sandboxes: Managing the Open Data Spectrum

A new white paper to support responsible and innovative data collaboration practices through the design, adoption, and governance of data sandboxes.

Posted on 18th of October 2023 by Uma Kalkar, Sampriti Saxena, Stefaan Verhulst

Today, The Open Data Policy Lab at The GovLab published a new primer, “Data Sandboxes: Managing the Open Data Spectrum”, written by Uma Kalkar, Sampriti Saxena, and Stefaan Verhulst. The white paper is designed to support responsible and innovative data collaboration practices through the design, adoption, and governance of data sandboxes.

Opening up data offers opportunities to enhance governance, elevate public and private services, empower individuals, and bolster public well-being. However, achieving the delicate balance between open data access and the responsible use of sensitive and valuable information presents complex challenges. Data sandboxes are an emerging approach to balancing these needs.

In this white paper, The GovLab seeks to answer the following questions surrounding data sandboxes: What are data sandboxes? How can data sandboxes empower decision-makers to unlock the potential of open data while maintaining the necessary safeguards for data privacy and security? Can data sandboxes help decision-makers overcome barriers to data access and promote purposeful, informed data (re-)use?

The six characteristics of a data sandbox. Image by The GovLab.

After evaluating a series of case studies, we identified the following key findings:

  • Data sandboxes present six unique characteristics that make them a strong tool for facilitating open data and data re-use. They are: controlled; secure; multi-sectoral and collaborative; high computing environments; temporal in nature; and adaptable and scalable.
  • Data sandboxes can be used for: pre-engagement assessment, data mesh enablement, rapid prototyping, familiarization, quality and privacy assurance, experimentation and ideation, white labeling and minimization, and maturing data insights.
  • There are many benefits to implementing data sandboxes. We found ten value propositions, such as: decreasing risk in accessing more sensitive data; enhancing data capacity; and fostering greater experimentation and innovation, to name a few.
  • When looking to implement a data sandbox, decision-makers should consider how they will attract and obtain high-quality, relevant data, keep the data fresh for accurate re-use, manage risks of data (re-)use, and translate and scale up sandbox solutions in real markets.
  • Advances in the use of the Internet of Things and Privacy Enhancing Technologies (PETs) could help improve the creation, preparation, analysis, and security of data in a data sandbox. The development of these technologies, in parallel with European legislative measures such as the Digital Markets Act, the Data Act, and the Data Governance Act, can improve the way data is unlocked in a data sandbox, building trust and encouraging data (re-)use initiatives.

The 10 value propositions of data sandboxes for participants. Image from The GovLab.

Further, we offer a series of operational considerations and principles to take into account when designing a data sandbox, as well as a series of processes and practices that can operationalize these governance requirements. The operational methodology applies a ‘scrum’ framework to help achieve a flexible and agile sandbox, and includes the following considerations:

  1. Appointing a Data Sandbox Facilitator, who will lead the sandbox and team, facilitate implementation, and ensure smooth communication among stakeholders. This person should have expertise in both the sandbox domain and data management. We refer to these individuals as ‘bilinguals.’
  2. Defining the purpose and problem by clearly articulating the objectives of the data sandbox project (for example, is it a space to test the quality of data, prototype new data products, or improve existing data solutions?) and identifying the specific problem(s) the project aims to address. This can also involve creating a list of prioritized milestones.
  3. Identifying additional roles and positions to ensure all necessary skills and expertise are represented. It can also be helpful to establish the roles and responsibilities of the various team members and stakeholders at this time. This can include system administrators, data stewards, and technical project managers who can speak to the good data practices, funding and sustainability needs, and overall management of the data sandbox.
  4. Establishing data and technological infrastructure requirements, including the minimum specifications (min-specs) of the data and technological resources needed to support the data sandbox project.
  5. Establishing the role of a data steward to define the standards of quality for data and continuously monitor, audit, and help enhance data in the sandbox to uphold those standards. A data steward is also responsible for summarizing the results of projects in the sandbox and communicating them to relevant stakeholders.
  6. Developing a methodology for categorizing the sandbox’s data assets in collaboration with industry players to map key datasets. This can be especially important when using PETs such as synthetic data to create bespoke datasets for sensitive use cases.
  7. Providing strong metadata about the data available in the sandbox to help potential users understand the sandbox’s purpose and its alignment with their needs.
  8. Defining data access and use parameters that clearly outline the eligibility requirements and process of joining the sandbox. This can include creating a system for soliciting, vetting, selecting, and communicating with sandbox users to stay aligned to core values.
  9. Building dispute resolution processes before they are needed to proactively handle disagreements or legal/regulatory issues that may arise in the sandbox.
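
To make considerations 7 and 8 more concrete, here is a minimal sketch of how a sandbox team might record dataset metadata and vet access requests. It is an illustration only and is not taken from the white paper: the class names, fields, sensitivity categories, sector labels, and dates are all hypothetical assumptions, and a real sandbox would define its own eligibility rules.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for one dataset in the sandbox (consideration 7)."""
    name: str
    description: str
    sensitivity: str  # assumed categories, e.g. "open", "restricted", "synthetic"
    last_updated: date


@dataclass
class AccessRequest:
    """Hypothetical application from a prospective sandbox user (consideration 8)."""
    organisation: str
    sector: str
    purpose: str
    requested_until: date


@dataclass
class SandboxPolicy:
    """Assumed access rules: a set of eligible sectors and a fixed closing date."""
    eligible_sectors: set[str]
    closes_on: date  # reflects the time-limited nature of a sandbox
    datasets: list[DatasetMetadata] = field(default_factory=list)

    def vet(self, request: AccessRequest) -> tuple[bool, str]:
        """Return (approved, reason) for an access request."""
        if request.sector not in self.eligible_sectors:
            return False, "sector not eligible"
        if not request.purpose.strip():
            return False, "no stated purpose"
        if request.requested_until > self.closes_on:
            return False, "access requested beyond sandbox closing date"
        return True, "approved"


# Illustrative usage with made-up values.
policy = SandboxPolicy(
    eligible_sectors={"public", "academic", "private"},
    closes_on=date(2024, 6, 30),
)
request = AccessRequest(
    organisation="Example Lab",
    sector="academic",
    purpose="prototype a mobility dashboard",
    requested_until=date(2024, 3, 31),
)
approved, reason = policy.vet(request)
print(approved, reason)
```

Returning a reason alongside the decision is a deliberate choice in this sketch: it gives the facilitator something to communicate back to applicants, which supports the vetting and communication process described in consideration 8.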

We conclude that data sandboxes can offer a secure and controlled environment for data access and exploration, providing a low-risk, high-reward way of trialing data sharing and (re-)use. This model of data collaboration can help promote data openness among new players, improving the availability of and access to data to address public problems.

Read the full report (PDF) HERE.

Stay up-to-date on the latest developments of this work by signing up for the Data Stewards Network Newsletter.

Learn more about the Open Data Policy Lab by visiting our website: https://opendatapolicylab.org/.
