Monitoring the Re-Use and Impact of Non-Traditional Data
Posted on 3rd of September 2025 by Adam Zable, Stefaan Verhulst
Non-Traditional Data (NTD) — data digitally captured, mediated, or observed through instruments such as satellites, social media, mobility apps, and wastewater testing — holds immense potential when re-used responsibly for purposes beyond those for which it was originally collected. If combined with traditional sources and guided by strong governance, NTD can generate entirely new forms of public value — what we call the Third Wave of Open Data.
Yet, there is often little awareness of how these datasets are currently being applied, and even less visibility on the lessons learned. That is why we curate and monitor, on a quarterly basis, emerging developments that provide better insight into the value and risks of NTD.
In previous updates, we focused on how NTD has been applied across domains like financial inclusion, public health, socioeconomic analysis, urban mobility, governance, labor dynamics, and digital behavior, helping to surface hidden inequities, improve decision-making, and build more responsive systems.
In this update, we have curated recent advances where researchers and practitioners are using NTD to close monitoring gaps in climate resilience, track migration flows more effectively, support health surveillance, and strengthen urban planning. Their work demonstrates how satellite imagery can provide missing data, how crowdsourced information can enhance equity and resilience, and how AI can extract insights from underused streams.
Below we highlight recent cases, organized by public purpose and type of data. We conclude with reflections on the broader patterns and governance lessons emerging across these cases. Taken together, they illustrate both the expanding potential of NTD applications and the collaborative frameworks required to translate data innovation into real-world impact.
Categories
Public Health & Disease Surveillance
Environment, Climate & Biodiversity
Urban Systems, Mobility & Planning
Migration
Food Security & Markets
Information Flows for Risk and Policy
Public Health & Disease Surveillance
Loyalty Card Data
Suhag, Alisha, Romana Burgess & Anya Skatova. “Shopping Data for Population Health Surveillance: Opportunities, Challenges, and Future Directions.” Journal of Medical Internet Research 27 (2025): e75720. https://doi.org/10.2196/75720
Focus: Explores how supermarket loyalty card data—collected for marketing—can also serve as a resource for tracking health behaviors and informing public health policy.
Role of Non-Traditional Data: Analyzes commercial transaction records on food, alcohol, tobacco, and over-the-counter medicines. These high-frequency data provide early warning signals, such as spikes in cold medicine sales during outbreaks, and can be linked to socioeconomic trends not visible in surveys.
Why This Matters: Loyalty card data are already widely collected and relatively inexpensive to analyze. With proper governance, they could become a powerful complement to official health surveillance, helping to identify health inequalities, monitor responses to interventions like sugar taxes, and provide near real-time feedback for policymakers.
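To make the "early warning signal" idea concrete, here is a minimal sketch of how a spike in weekly sales might be flagged against a rolling baseline. It is an illustration only, not the method used in the paper: the function name, the eight-week baseline, the threshold of three standard deviations, and the sales figures are all assumptions invented for this example.

```python
from statistics import mean, stdev


def flag_spikes(weekly_sales, baseline_weeks=8, threshold=3.0):
    """Return indices of weeks whose sales sit more than `threshold`
    standard deviations above the mean of the preceding baseline window."""
    flagged = []
    for i in range(baseline_weeks, len(weekly_sales)):
        baseline = weekly_sales[i - baseline_weeks:i]
        mu = mean(baseline)
        sd = stdev(baseline)
        if sd == 0:
            # A flat baseline gives no scale for comparison; skip the week.
            continue
        z = (weekly_sales[i] - mu) / sd
        if z > threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly unit sales of cold medicine (invented numbers).
sales = [100, 104, 98, 101, 99, 103, 97, 102, 100, 160, 101]
spikes = flag_spikes(sales)
print(spikes)
```

A real surveillance pipeline would also need to account for seasonality, promotions, and store coverage before treating a flagged week as a health signal.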
Wearable / Sensor Data
Erturk, Eray, Fahad Kamran, Salar Abbaspourazad, Sean Jewell, Harsh Sharma, Yujie Li, Sinead Williamson, Nicholas J. Foti & Joseph Futoma. “Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions.” Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), PMLR 267 (2025). https://arxiv.org/abs/2507.00191
Focus: Develops a large-scale foundation model trained on behavioral data from wearables (rather than raw sensor signals) to improve health state detection. Using over 2.5 billion hours of data from 162,000 participants in the Apple Heart and Movement Study, the model was evaluated on 57 health-related tasks ranging from static outcomes (e.g., chronic conditions, medication use) to dynamic states (e.g., sleep quality, pregnancy, infections).
Role of Non-Traditional Data: Relies on higher-level behavioral metrics derived from Apple Watch and iPhone data—such as activity, gait, cardiovascular fitness, and sleep—captured at daily and weekly timescales. These derived measures are more closely aligned with meaningful health states than raw sensor streams, and were systematically modeled to handle irregular sampling and missingness.
Why This Matters: Results show that behavioral wearables data can outperform raw biosignals (like PPG) in tasks where lifestyle and activity patterns drive outcomes, such as sleep, pregnancy, and injury detection. Combining behavioral and sensor-based foundation models delivered the strongest overall performance, demonstrating the complementary value of these data types. The study highlights the promise of responsibly leveraging everyday wearable data to enable more proactive, non-invasive, and personalized health monitoring.
Participatory / Crowdsourced Data
Kelley, Kathleen, Nicolò Gozzi, Mattia Mazzoli & Daniela Paolotti. “Exploring influenza vaccination determinants through digital participatory surveillance.” BMC Public Health 25, Article 1345 (2025). https://doi.org/10.1186/s12889-025-22496-8
Focus: Uses over a decade of data from Influweb, the Italian arm of the Influenzanet participatory surveillance network, to study social, demographic, and behavioral factors associated with influenza vaccination uptake between 2011 and 2021.
Role of Non-Traditional Data: Relies on longitudinal self-reported vaccination status and health behavior data contributed by thousands of volunteers online. This digital participatory surveillance system captures individual- and household-level details (e.g., transport use, household composition, medication for chronic illness) not available in traditional vaccination datasets, while enabling timely analysis across flu seasons.
Why This Matters: Findings show that people living with minors or relying on public transportation were less likely to vaccinate despite higher exposure risk, while older adults, those with chronic conditions, and more educated individuals had higher uptake. Vaccination also rose during the COVID-19 pandemic. The study demonstrates how participatory surveillance can complement official records by adding real-time, contextual information, helping target communication and outreach strategies for groups at risk of under-vaccination.
Dungan, Rachel. “The Role of Crowdsourcing as Researchers Respond to Loss of Public Health Data.” AcademyHealth Blog, May 12, 2025. https://academyhealth.org/blog/2025-05/role-crowdsourcing-researchers-respond-loss-public-health-data
Focus: Discusses how researchers are turning to crowdsourced data to fill gaps as U.S. public health surveillance systems face restrictions and rollbacks. Highlights both the opportunities and the limits of these approaches.
Role of Non-Traditional Data: Profiles projects such as Flu Near You (self-reported symptoms), Project Tycho (digitized historical disease data), Zoe’s COVID Symptom Study, and smartphone-based monitoring of infrastructure stress. These examples show how bottom-up contributions can supplement official data.
Why This Matters: Government-collected data remains essential, but crowdsourcing offers speed, inclusiveness, and local relevance. As access to official data narrows, combining grassroots efforts with authoritative systems may help maintain transparency, strengthen trust, and ensure health policy reflects diverse population needs.
Satellite + Environmental Data
Wang, Yuanlong, Pengqi Wang, Changchang Yin & Ping Zhang. “SatHealth: A Multimodal Public Health Dataset with Satellite-based Environmental Factors.” arXiv, June 16 2025.
Focus: Introduces SatHealth, a large-scale dataset built to explore how environmental conditions shape health outcomes. The dataset links satellite-based measures of the physical environment with health insurance claims and social determinants of health. It is intended to support a wide range of research, from modeling disease risks to planning preventive interventions.
Role of Non-Traditional Data: Uses environmental information derived from satellites—including estimates of air quality, land surface temperature, and urban density—that are not available in traditional health records. These satellite-derived indicators are combined with administrative health data and socioeconomic factors to create a multimodal dataset.
Why This Matters: Health outcomes are shaped not only by medical care but also by the environments in which people live. SatHealth provides researchers and policymakers with a resource to analyze these links at scale, enabling richer disease risk models, more targeted public health planning, and evidence that can guide policy and prevention strategies.
Wastewater Data
Wolfe, Marlene K., Amanda L. Bidwell, Stephen P. Hilton & Alexandria B. Boehm. “Wastewater surveillance for avian influenza: national patterns of detection and relationship with reported outbreaks and infections.” medRxiv, May 7, 2025. https://doi.org/10.1101/2025.05.06.25327100
Focus: Looks at how wastewater monitoring can serve as an early warning system for avian influenza (H5N1) outbreaks across the United States.
Role of Non-Traditional Data: Analyzes more than 18,000 wastewater samples collected from 147 treatment plants in 40 U.S. states. These samples were tested for fragments of avian influenza virus using a highly sensitive lab method called droplet digital PCR, which can detect and measure small amounts of genetic material. The results were then compared with official outbreak records from the U.S. Department of Agriculture and the Centers for Disease Control and Prevention, covering poultry, wild birds, cattle, and human cases.
Why This Matters: The study found strong links between wastewater detections and official outbreak reports, suggesting that this kind of monitoring could complement existing animal and human surveillance systems. It shows that wastewater is not just useful for COVID-19 but can also track zoonotic threats before they spread widely.
Wood, Anthony J., Jessica Enright, Aeron R. Sanchez, Ewan Colman & Rowland R. Kao. “Improving wastewater-based epidemiology through strategic placement of samplers.” arXiv, June 17, 2025. https://arxiv.org/abs/2506.14331
Focus: Explores how the positioning of wastewater samplers within sewer networks affects the ability to detect disease outbreaks early and at a local level.
Role of Non-Traditional Data: Builds on wastewater-based epidemiology, which uses traces of pathogens in sewage as a measure of community infection levels. Through simulations and data from Scotland’s COVID-19 pandemic response, the authors model how different network structures and population distributions shape the signals captured.
Why This Matters: Wastewater monitoring has proven valuable for tracking outbreaks, but this study shows its effectiveness depends on where samplers are placed. Thoughtful placement can make surveillance more precise and timely, supporting quicker responses to infectious disease threats such as polio or influenza.
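One way to see why placement matters is to treat the sewer network as a tree in which each node drains into a single downstream node. A sampler at a given node then sees sewage from everyone upstream of it. The sketch below is a simplified illustration under that assumption and is not the authors' simulation model. The node names, populations, and the function `upstream_population` are hypothetical.

```python
def upstream_population(parent, population):
    """Compute how many residents' sewage passes through each node.

    parent:     {node: downstream node, or None for the network outlet}
    population: {node: residents connecting to the sewer at that node}
    Assumes the network is a tree (no loops).
    """
    totals = dict.fromkeys(parent, 0)
    for node, residents in population.items():
        current = node
        # Walk downstream to the outlet, crediting each node on the way.
        while current is not None:
            totals[current] += residents
            current = parent[current]
    return totals


# Hypothetical network: two branches (A and B) feeding a treatment plant.
parent = {"plant": None, "A": "plant", "B": "plant", "A1": "A", "A2": "A"}
population = {"plant": 0, "A": 500, "B": 2000, "A1": 1500, "A2": 1000}

coverage = upstream_population(parent, population)
print(coverage)
```

In this toy network a sampler at the plant covers all 5,000 residents but cannot say which branch a signal came from, while a sampler at A1 covers only 1,500 people but localizes the signal much more tightly.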
Coulliette-Salmond, Angela, Florence Whitehill, Amanda K. Lyons, Bethelhem Abera, Colin Adler, Maroya Spalding Walters, Magdalena Medrzycki, Christine Ganim Kyros, Mariya Campbell, Michael Y. Lin, Rachel S. Poretsky, Adam Horton, Jennifer Weidhaas, James VanDerslice, L. Scott Benson, Erin M. Driver, Rolf U. Halden, Kerry A. Hamilton & Margaret Williams. “Considerations for healthcare wastewater surveillance of targeted antimicrobial-resistant organisms.” medRxiv, June 28, 2025. https://doi.org/10.1101/2025.06.27.25330422
Focus: Examines feasibility, methods, and operational considerations for implementing wastewater surveillance in healthcare facilities to detect antimicrobial-resistant pathogens such as carbapenemase-producing organisms and Candida auris.
Role of Non-Traditional Data: Tests wastewater samples from long-term care and post-acute healthcare facilities for antimicrobial-resistant pathogens such as carbapenem-resistant bacteria and Candida auris. The wastewater findings were paired with on-site screening of patients and assessments of facility infrastructure, allowing the researchers to evaluate whether facility-level sewage monitoring could provide early warning of dangerous pathogens before they spread more widely.
Why This Matters: Outlines lessons learned from pilots in 16 facilities across five states, including feasibility rates, sampling challenges, and the importance of paired wastewater and clinical data, offering practical guidance for scaling surveillance to support infection control.
Environment, Climate & Biodiversity
Satellite & Remote Sensing
Di Giuseppe, Francesca, Joe McNorton, Anna Lombardi, & Fredrik Wetterhall. “Global data-driven prediction of fire activity.” Nature Communications 16, 2918 (2025), April 1, 2025. https://www.nature.com/articles/s41467-025-58097-7
Focus: Develops and tests a machine learning system that predicts daily wildfire activity at a kilometer scale. The model combines weather, vegetation, and ignition data to generate forecasts that are more precise than traditional fire risk indices.
Role of Non-Traditional Data: Relies heavily on Earth-observing satellites. These include VIIRS (Visible Infrared Imaging Radiometer Suite) and NASA’s MODIS (Moderate Resolution Imaging Spectroradiometer) to detect active fires; Copernicus Atmosphere Monitoring Service to infer fuel abundance; and the ESA-CCI (European Space Agency Climate Change Initiative) for biomass data. These remote-sensing inputs are combined with models of fuel conditions and ignition proxies such as population density, road networks, and lightning strikes.
Why This Matters: Wildfire forecasting has often depended on broad regional indices that miss local variation. This new system seeks to produce earlier and more localized warnings by combining multiple satellite streams and contextual data. That can give communities and agencies better tools to prepare for and respond to fire risks across very different ecosystems.
Misra, Amit, Kevin White, Simone Fobi Nsutezo, William Straka III & Juan Lavista. “Mapping global floods with 10 years of satellite radar data.” Nature Communications 16, 5762 (2025), July 1, 2025. https://doi.org/10.1038/s41467-025-60973-1
Focus: Presents a deep learning model that generates global flood maps from a decade of Earth observation data (2014–2024). The project produces one of the most consistent global flood extent datasets to date, tested in real-world cases such as cropland risk assessments in Ethiopia and Kenya.
Role of Non-Traditional Data: Relies on Sentinel-1 Synthetic Aperture Radar (SAR), a European satellite that can “see” through clouds and operate day or night, which is critical for flood monitoring. The authors also tested integrating additional geospatial datasets, namely soil moisture, elevation, and land cover, to filter out false detections.
Why This Matters: Flooding is one of the most damaging climate risks, but reliable data are often missing. This work provides an open global flood dataset and codebase that can support disaster response, agricultural risk assessment, and long-term resilience planning in vulnerable regions.
Kar, Arunima. “India Is Using AI and Satellites to Map Urban Heat Vulnerability Down to the Building Level.” Wired, June 23, 2025. https://www.wired.com/story/india-is-using-ai-and-satellites-to-map-urban-heat-vulnerability-down-to-the-building-level/
Focus: Reports on initiatives in India, including work by researchers and city planners in Delhi and other cities, to generate hyperlocal maps of heat exposure and vulnerability. These maps identify which buildings and neighborhoods are at highest risk during extreme heat events, guiding local action plans.
Role of Non-Traditional Data: Combines satellite-derived land surface temperature, GIS layers, and AI-based classification of roof types to estimate indoor heat exposure. These data are further integrated with socioeconomic and infrastructural information to produce ward- and neighborhood-scale risk maps.
Why This Matters: Heat exposure can vary drastically even within a single block, yet most planning tools only operate at a city or regional scale. Building-level mapping allows governments to target interventions like cooling shelters, adjusted work hours, or greening programs where they are most urgently needed.
Nasios, Ioannis. “AI-driven multi-source data fusion for algal bloom severity classification in small inland water bodies: Leveraging Sentinel-2, DEM, and NOAA climate data.” arXiv, May 2, 2025. https://arxiv.org/abs/2505.03808
Focus: Proposes a machine learning framework to assess the severity of harmful algal blooms (HABs) in U.S. lakes and reservoirs. These blooms threaten drinking water, ecosystems, and human health, but are difficult to monitor with traditional sampling alone.
Role of Non-Traditional Data: Integrates four sources: Sentinel-2 satellite imagery; Landsat-8 imagery; NOAA’s High-Resolution Rapid Refresh climate model, which provides hourly atmospheric data at 3-kilometer resolution; and the Copernicus Digital Elevation Model, a 30-meter global dataset built from satellite radar interferometry. These combined inputs capture water conditions, local climate, and surface morphology to support scalable monitoring of algal blooms.
Why This Matters: Field sampling alone cannot keep pace with blooms across many small inland water bodies. This work shows how open environmental data can be combined through AI to create cost-effective, near-real-time early warning systems.
Gueterbock, Finn, Raul Santos-Rodriguez, & Jeffrey N. Clark. “Improving Local Air Quality Predictions Using Transfer Learning on Satellite Data and Graph Neural Networks.” Workshop paper at Tackling Climate Change with Machine Learning, ICLR 2025, April 23, 2025. https://arxiv.org/abs/2505.05479
Focus: Proposes a new way to estimate nitrogen dioxide (NO₂) pollution in cities that lack dense monitoring networks. The team used a type of machine learning called a graph neural network (GraphSAGE), which can model relationships across locations in a city. They first trained the model on London, where there are many monitoring stations, and then adapted it to Bristol, where data is sparse. The method effectively acts like a system of “virtual sensors,” improving prediction accuracy compared to standard approaches.
Role of Non-Traditional Data: Combines satellite observations from the Sentinel-5P satellite, which tracks air pollutants including NO₂ and aerosol levels, with ERA5 climate reanalysis data (a global dataset that reconstructs past weather and climate) and data from a limited number of ground-based sensors. By bringing together satellite and climate data with the few existing measurements, the system produces high-resolution air quality estimates for areas where fixed sensors are lacking.
Why This Matters: Air pollution remains one of the leading global health risks, yet monitoring networks are expensive and often concentrated in wealthier or larger cities. This study shows how advanced modeling, paired with satellite and climate data, can help fill those gaps at relatively low cost. Such approaches could support public health planning and environmental policy, especially in smaller or lower-income cities working toward cleaner air and climate resilience.
Community / Crowdsourced Data
Risley, Sarah C., Melissa L. Britsch, Joshua S. Stoll & Heather M. Leslie. “Mapping local knowledge supports science and stewardship.” Ambio, April 26, 2025. https://doi.org/10.1007/s13280-025-02170-4
Focus: Explores how local ecological knowledge can shape scientific research and conservation in Maine’s coastal shellfishing communities.
Role of Non-Traditional Data: Uses participatory mapping and interviews with harvesters and residents to capture observations of shellfish abundance, diversity, and habitat change. These community-sourced data informed new hypotheses and guided scientific site selection.
Why This Matters: Community members often hold detailed, place-based knowledge that is invisible in conventional datasets. By integrating that knowledge with formal research, the study created stronger science, more relevant monitoring, and greater community trust—offering a model for collaborative stewardship of marine resources.
Crowell, Maddy. “Mapping the Unmapped: Can a crowdsourced map of the world help save millions of people from climate disaster?” Grist, June 11, 2025. https://grist.org/solutions/can-a-crowdsourced-map-of-the-world-help-save-millions-of-people-from-climate-disaster/
Focus: Profiles the Humanitarian OpenStreetMap Team (HOT) and its local partners in St. Lucia, where volunteers are creating detailed maps of vulnerable communities that were previously invisible to official planning and disaster response.
Role of Non-Traditional Data: Uses crowdsourced geographic data added to OpenStreetMap, enhanced by satellite imagery, drone surveys, and on-the-ground mapping by young volunteers. The result is fine-grained data on streets, homes, and infrastructure that commercial maps often ignore.
Why This Matters: Accurate maps are critical for emergency planning, especially in areas threatened by hurricanes, floods, or other climate-driven disasters. By empowering local communities to generate their own datasets, HOT ensures that first responders and aid agencies can act quickly and effectively when crises strike.
Mason, Brittany M., Thomas Mesaglio, Jackson Barratt Heitmann, Mark Chandler, Shawan Chowdhury, Simon B. Z. Gorta, Florencia Grattarola, Quentin Groom, Colleen Hitchcock, Levi Hoskins, Samantha K. Lowe, Marina Marquis, Nadja Pernat, Vaughn Shirey, Shukherdorj Baasanmunkh & Corey T. Callaghan. “iNaturalist accelerates biodiversity research.” BioScience, July 28, 2025. https://doi.org/10.1093/biosci/biaf104
Focus: Documents how iNaturalist, a global citizen science platform for biodiversity observations, has become one of the most widely used data sources in ecology research.
Role of Non-Traditional Data: Builds on more than 200 million geotagged photos and audio records submitted by 3.3 million citizen scientists. Once validated, these records feed into biodiversity databases used by researchers worldwide.
Why This Matters: Traditional ecological monitoring is costly and limited in scope. iNaturalist has enabled thousands of peer-reviewed studies, from tracking invasive species to mapping climate impacts. It demonstrates how digital participation at scale can transform ecological science and strengthen our understanding of biodiversity change.
Urban Systems, Mobility & Planning
Mobility Data
Hatfield, Charles Reuven Starobin, Anna Kustar, Marcel Reinmuth, Constant Cap, Agraw Ali Beshir, Jacqueline M. Klopp, Alexander Zipf, James Rising, & Thet Hein Tun. “Lessons in Traffic: Nairobi’s School Term Congestion and Equity Challenges.” African Transport Studies, August 1, 2025. https://www.sciencedirect.com/science/article/pii/S2950196225000225
Focus: Analyzes how Nairobi’s traffic congestion changes during the school year compared to holiday periods, and what this reveals about equity and city planning.
Role of Non-Traditional Data: Draws on anonymized GPS-based road speed data from Uber Movement, covering almost all major roads in the city, to compare traffic flows across different times.
Why This Matters: The study shows congestion gets significantly worse during school terms—especially on secondary and tertiary roads and in wealthier neighborhoods. This suggests that school locations, transport options, and land use policies play a major role in traffic inequities, with direct implications for student wellbeing and urban planning decisions.
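The core comparison can be sketched in a few lines: group road speed observations by road class and by period, then express the term-time mean speed as a percentage change from the holiday mean. This is a minimal illustration, not the study's actual analysis. The record layout, the function `speed_change_by_class`, and the speeds are invented for the example and do not reflect the Uber Movement data format.

```python
from collections import defaultdict


def speed_change_by_class(records):
    """Percent change in mean speed, school term relative to holiday.

    records: iterable of (road_class, period, speed_kmh) tuples,
             where period is either "term" or "holiday".
    Returns {road_class: percent change}; negative means slower in term.
    """
    sums = defaultdict(lambda: {"term": [0.0, 0], "holiday": [0.0, 0]})
    for road_class, period, speed in records:
        acc = sums[road_class][period]
        acc[0] += speed  # running total of speeds
        acc[1] += 1      # number of observations

    changes = {}
    for road_class, by_period in sums.items():
        t_sum, t_n = by_period["term"]
        h_sum, h_n = by_period["holiday"]
        if t_n == 0 or h_n == 0:
            continue  # need both periods to compare
        term_mean = t_sum / t_n
        holiday_mean = h_sum / h_n
        changes[road_class] = 100.0 * (term_mean - holiday_mean) / holiday_mean
    return changes


# Hypothetical observations (invented numbers).
records = [
    ("primary", "holiday", 40.0), ("primary", "term", 38.0),
    ("secondary", "holiday", 30.0), ("secondary", "term", 24.0),
    ("secondary", "holiday", 34.0), ("secondary", "term", 24.0),
]
changes = speed_change_by_class(records)
print(changes)
```

With these invented numbers, secondary roads slow by 25 percent in term time against 5 percent for primary roads, the kind of gap between road classes that the study reports.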
Zhang, Jinmeng, Bi Yu Chen, Chenxi Fu, Zehao Yuan & Donggen Wang. “Uncovering horizontal and vertical inequities of individual accessibility using mobile phone data.” Transportation Research Part D: Transport and Environment 143, June 2025. https://doi.org/10.1016/j.trd.2025.104755
Focus: Analyzes accessibility inequities in Shenzhen, China, by looking at how 2.12 million residents reach jobs, shops, and services. The study distinguishes between horizontal inequities (across individuals) and vertical inequities (those faced by vulnerable groups).
Role of Non-Traditional Data: Uses large-scale mobile phone tracking data, including call detail records, to capture how people actually move through the city, creating person-level accessibility measures instead of relying on static, area-based assumptions.
Why This Matters: Traditional planning often masks who is underserved and where. Mobile data provides a clearer picture of unequal access to opportunities and services, giving policymakers tools to design more equitable transport and land-use policies.
AI Analysis of Diverse Data
Wang, Qingyi, Yuebing Liang, Yunhan Zheng, Kaiyuan Xu, Jinhua Zhao & Shenhao Wang. “Generative AI for Urban Planning: Synthesizing Satellite Imagery via Diffusion Models.” arXiv, May 13, 2025. https://arxiv.org/abs/2505.08833
Focus: Introduces a generative AI system that produces realistic, satellite-style images of cities based on open data about land use, infrastructure, and the environment. The tool was tested in Chicago, Dallas, and Los Angeles and evaluated by both experts and the public.
Role of Non-Traditional Data: Draws on OpenStreetMap contributions and freely available satellite imagery to train the model. These sources anchor the AI-generated images in real-world conditions.
Why This Matters: City planning is often technical and difficult for the public to engage with. This approach shows how open data combined with AI can create intuitive, “what-if” visualizations that lower barriers to participation and make planning debates more accessible.
Ratti, Carlo. “We used AI to analyse three cities. It’s true: we now walk more quickly and socialise less.” The Guardian, August 18, 2025. https://www.theguardian.com/commentisfree/2025/aug/18/ai-walk-more-quickly-socialise-less-public-spaces
Focus: Compares 1970s film footage of public spaces in New York, Boston, and Philadelphia with recent video, using AI to measure how social behavior has changed. Findings show people now walk faster, spend less time lingering, and interact less often in public.
Role of Non-Traditional Data: Applies computer vision to digitized archival footage originally collected by urbanist William “Holly” Whyte, enabling fine-grained analysis of crowd movement and social interaction at a scale not possible with manual coding.
Why This Matters: Public space is central to civic life, but its social fabric appears to be thinning. AI provides a new way to track how our use of streets, parks, and plazas evolves, offering evidence to guide urban design interventions that encourage interaction, inclusivity, and resilience.
Pawelke, Andreas, Basma Albanna & Damiano Cerrone. “Unlock Your City’s Hidden Solutions: An AI-Powered Approach to Urban Transformation.” Medium, May 5, 2025. https://dppd.medium.com/unlock-your-citys-hidden-solutions-f69577fd0076
Focus: Introduces the DPPD method, which looks for “positive deviants”—communities that succeed under difficult conditions—and uses these examples to inspire broader urban improvements.
Role of Non-Traditional Data: Combines satellite imagery, mobile data, social media signals, and official statistics with AI analysis to detect neighborhoods that are outperforming expectations. Generative AI is then used to visualize possible interventions, which are refined through local engagement.
Why This Matters: Instead of only focusing on failures, this approach highlights hidden successes already working within cities. Scaling these lessons can save money, reduce risk, and build trust, while giving communities a stronger role in shaping urban transformation.
Migration
Social Media / Platform Data
Rodríguez-Sánchez, Alejandra, & Jasper Tjaden. “Estimating Irregular Migrant Stocks Using Social Media Data and Machine Learning.” MIrreM Briefing Paper, April 7, 2025. https://doi.org/10.5281/zenodo.14808984
Focus: Proposes a new way to estimate the size and characteristics of irregular migrant populations, blending official migration data with insights from social media platforms.
Role of Non-Traditional Data: Uses Facebook’s advertising audience data—specifically people who indicate they’ve “lived abroad”—as a stand-in for migrant presence. These data are paired with UN migration statistics and refined using machine learning (XGBoost) to estimate the share of migrants without legal status.
Why This Matters: Reliable numbers on irregular migration are hard to come by, and official figures are often incomplete or inconsistent. This approach produces estimates for 40 major destination countries that align with known benchmarks and provides a replicable method for governments and researchers to better understand an issue that is frequently politicized but poorly measured.
Kingsbury, Kathleen. “To Understand Global Migration, You Have to See It First.” The New York Times, April 17, 2025. https://www.nytimes.com/interactive/2025/04/17/opinion/global-migration-facebook-data.html#
Focus: Introduces the first global, month-by-month estimates of permanent migration flows (2019–2022) using anonymized location data from Meta, covering more than 180 countries. The dataset tracks long-term moves rather than tourism or business travel.
Role of Non-Traditional Data: Draws on the location history of billions of Meta users to estimate cross-border migration patterns, offering consistent, near-real-time information where official statistics are delayed or absent.
Why This Matters: Migration is one of the most powerful forces reshaping societies, yet reliable data is scarce. This new resource reveals trends such as the surge of Latin Americans moving to the U.S., Ukrainians displaced by war, and labor flows in Asia and the Gulf. Public release of the data provides a shared evidence base for debate and policy.
Food Security & Markets
Crowdsourced / Open Data
Adewopo, Julius, Bo Andree, Zacharey Carmichael, Steve Penson & Kamwoo Lee. “Real-time prices, real results: comparing crowdsourcing, AI, and traditional data collection.” World Bank Data Blog, April 30, 2025. https://blogs.worldbank.org/en/opendata/real-time-prices--real-results--comparing-crowdsourcing--ai--and
Focus: Reports on a study in northern Nigeria that tested three different ways to monitor food prices: traditional surveys, citizen submissions via mobile app, and AI-driven estimates.
Role of Non-Traditional Data: Draws on more than 100,000 price entries submitted by citizens across 150 markets and combines them with AI imputation methods to fill gaps and extend coverage. These results were validated against enumerator-led surveys.
Why This Matters: Food prices shift quickly and are critical for both households and governments. This work shows that crowdsourced and AI-generated data can complement official surveys, offering faster and more flexible tools for monitoring. Nigeria’s statistics bureau has already begun integrating these methods into a national dashboard.
Benassai-Dalmau, Robert, Vasiliki Voukelatou, Rossano Schifanella, Stefania Fiandrino, Daniela Paolotti & Kyriaki Kalimeri. “Unequal Journeys to Food Markets: Continental-Scale Evidence from Open Data in Africa.” arXiv, May 12, 2025. https://arxiv.org/abs/2505.07913
Focus: Provides the first continent-wide assessment of how easily people in Africa can reach food markets, measuring travel times, market reach, and accessibility gaps across rural and urban areas.
Role of Non-Traditional Data: Draws on volunteered OpenStreetMap data, World Food Programme market datasets, WorldPop demographic estimates, and Meta’s satellite-derived wealth index. Together these sources allow for consistent, cross-country comparisons where government survey data are scarce or incomplete.
Why This Matters: Access to markets is closely tied to food security and economic opportunity. This study reveals stark inequalities between rural and urban areas and links these gaps to wealth levels. The framework offers policymakers a scalable tool to identify underserved regions and plan interventions that can reduce food insecurity.
Information Flows for Risk and Policy
Social Media Data
Salley, Christin, Nathan Fox & Alyssa Schubert. “Bridging the gap in flood risk communication: a comparative study of community and organizational social media posts using natural language processing.” Frontiers in Communication, May 9, 2025. https://doi.org/10.3389/fcomm.2025.1553746
Focus: Compares how communities and organizations communicated on social media during U.S. flood events in July–August 2022, looking at timing, content, sentiment, and emotional tone to assess whether official messaging aligned with public concerns.
Role of Non-Traditional Data: Examines about 118,000 Twitter/X posts geolocated to nine flood-affected states. Using natural language processing techniques—including entity recognition, topic modeling, and sentiment and emotion analysis—the study characterizes real-time exchanges between residents and institutions.
Why This Matters: The study reveals gaps between what people needed during flooding and what agencies were communicating. The authors outline five recommendations for improving two-way risk communication, underscoring how social media analysis can highlight blind spots and help officials tailor responses in fast-moving disasters.
Public Web Data
Schütz, Moritz, Lukas Kriesch & Sebastian Losacker. “Mapping local government priorities: a web-mining approach for regional research.” Regional Science Policy & Practice, August 7, 2025. https://doi.org/10.1016/j.rspp.2025.100240
Focus: Develops a way to track and compare local government priorities in Germany by analyzing thousands of municipal websites. The study identifies more than 200 topics, including climate protection, urban planning, and business development.
Role of Non-Traditional Data: Uses web scraping and natural language processing to turn unstructured website text into structured indicators of policy priorities. This creates a new dataset not possible through surveys or interviews alone.
Why This Matters: Local governments shape everyday life, but their priorities are often hard to measure. This study shows how digital communications can be repurposed into research data, opening up new possibilities for transparency and comparative analysis of governance strategies.
Reflections
This round of use cases underscores both the promise and the limits of non-traditional data. A few themes stand out:
Applied examples exist, but research dominates. Some cases show data tied directly to policy or service delivery — like wastewater monitoring linked to CDC records. Most, however, remain academic studies or pilots. The balance suggests that translation into action and decisions is still the exception.
NTD often fills gaps rather than replaces official data. From migration flows to local price tracking, the value lies in covering blind spots where official statistics are missing or too slow. The strongest efforts combine NTD with administrative, survey, or clinical records to make insights more robust.
Certain data types are becoming go-to tools for recurring problems. Patterns are emerging around which non-traditional data sources best fit particular analytical tasks.
Satellite data are repeatedly used for environmental monitoring.
Wastewater sampling is used for pathogen surveillance.
Facebook/Meta location data are being tapped to estimate migration flows, filling a gap official statistics struggle with.
Participatory and crowdsourced mapping efforts are proving important for making unmapped communities visible for disaster response.
While many of these projects remain research-driven, the consistency of these pairings suggests a growing playbook of “fit-for-purpose” matches that practitioners can draw on.
Community contributions are most impactful where local value is visible. iNaturalist’s role in ecology research, participatory shellfish mapping, youth-led disaster mapping, and price reporting in Nigerian markets point to a pattern: when communities see direct value, data collection can be sustainable and scalable.
AI methodologies, such as machine learning and natural language processing (NLP), are playing a growing role in extracting insights from structured data and, especially, from unstructured sources such as public web data.
Taken together, these examples reinforce a lesson from past editions: non-traditional data adds the most value when it is integrated with other sources, situated in local context, designed with communities, and governed with clear rules. The opportunity now is to build the simple, repeatable pathways (validation, stewardship, standardization) that let promising work become durable practice.