Unlocking Public Value with Non-Traditional Data: Recent Use Cases and Emerging Trends
Posted on 9th of April 2025 by Stefaan Verhulst, Adam Zable
Non-Traditional Data (NTD)—digitally captured, mediated, or observed data such as mobile phone records, online transactions, or satellite imagery—is reshaping how we identify, understand, and respond to public interest challenges. As part of the Third Wave of Open Data, these often privately held datasets are being responsibly re-used through new governance models and cross-sector collaboration to generate public value at scale.
In our previous post, we shared emerging case studies across health, urban planning, the environment, and more. Several months later, the momentum has not only continued but diversified. New projects reaffirm NTD’s potential—especially when linked with traditional data, embedded in interdisciplinary research, and deployed in ways that are privacy-aware and impact-focused.
This update profiles recent initiatives that push the boundaries of what NTD can do. Together, they highlight the evolving domains where this type of data is helping to surface hidden inequities, improve decision-making, and build more responsive systems:
Financial Inclusion
Public Health and Well-Being
Socioeconomic Analysis
Transportation and Urban Mobility
Data Systems and Governance
Economic and Labor Dynamics
Digital Behavior and Communication
FINANCIAL INCLUSION
Jacobson, Jon. “How grocery shopping data is unlocking financial inclusion.” World Economic Forum. March 27, 2025. https://www.weforum.org/stories/2025/03/how-grocery-shopping-data-is-unlocking-financial-inclusion/
Focus: Using grocery shopping behavior as a proxy for creditworthiness to expand financial inclusion.
Role of Non-Traditional Data: Analyzed grocery transaction data using privacy-preserving technologies, combined with banking information, to assess individuals lacking formal credit histories. AI models identified behavioral signals—such as spending regularity and product choices—indicative of creditworthiness.
Impact: In South Africa, the initiative evaluated 8 million individuals previously deemed unscorable, with 3.2 million qualifying for credit. One bank projected a 29% revenue increase. The model is now considered a scalable blueprint for global financial inclusion.
PUBLIC HEALTH AND WELL-BEING
Menichetti, Giulia, Babak Ravandi & Albert-László Barabási. “Whole Foods vs. Walmart: New research reveals hidden realities of ultra-processed foods in stores.” Northeastern Global News, January 14, 2025. https://news.northeastern.edu/2025/01/14/ultraprocessed-foods-grocery-stores/
Focus: Assessing the prevalence and variation of ultra-processed foods across major U.S. grocery retailers.
Role of Non-Traditional Data: Researchers collected online ingredient lists from Target, Whole Foods, and Walmart and used machine learning to rate each product’s level of processing. The results were compiled into a public database (GroceryDB) and made searchable through the Truefood platform.
Impact: The study revealed the widespread presence of ultra-processed foods across major retailers and highlighted how food processing levels vary by store and product category. The findings offer new tools for consumers to make informed choices and for retailers to evaluate and improve the nutritional quality of their offerings.
Zhang, Miao, Salman Rahman, Vishwali Mhasawade & Rumi Chunara. "Utilizing big data without domain knowledge impacts public health decision-making." Proceedings of the National Academy of Sciences, September 17, 2024. https://doi.org/10.1073/pnas.2402387121
Focus: Evaluated how relying on street-level imagery (Google Street View) and AI to assess neighborhood walkability can mislead public health interventions if behavioral factors are not considered.
Role of Non-Traditional Data: Analyzed 2 million Google Street View images using computer vision to identify sidewalks and crosswalks, linking these features to obesity and diabetes rates in New York City.
Impact: Found that physical inactivity is a key mediator—addressing behavioral factors yielded greater health benefits than modifying the built environment alone. The study cautions that using big data without understanding causal pathways can lead to flawed policy decisions.
Khan, Shehroz S., Pratik K. Mishra, Bing Ye, Smit Patel, Kristine Newman, Alex Mihailidis, and Andrea Iaboni. "A Novel Multi-modal Sensor Dataset and Benchmark to Detect Agitation in People Living with Dementia in a Residential Care Setting." ACM Transactions on Computing for Healthcare, 2025. https://doi.org/10.1145/3720550
Focus: Introduced a benchmark dataset to support automated, real-time detection of agitation in people with dementia, using physiological signals collected in a residential care setting.
Role of Non-Traditional Data: Captured over 600 days of multimodal sensor data—including acceleration, blood volume pulse, electrodermal activity, and temperature—from 20 residents. Agitation events were annotated using nurse reports and validated through video footage.
Impact: Provides a high-quality, publicly available dataset to accelerate research on AI-driven, privacy-preserving monitoring systems in dementia care. Supports development of early warning tools that could improve staff response, reduce injuries, and enhance care quality in institutional environments.
Cummins, Jack A., Daniel J. Gottlieb, Tamar Sofer, and Danielle A. Wallace. "Applying Natural Language Processing Techniques to Map Trends in Insomnia Treatment Terms on the r/Insomnia Subreddit: Infodemiology Study." Journal of Medical Internet Research 27, 2025. https://doi.org/10.2196/58902
Focus: Analyzed online discussions to track how insomnia treatments—both prescription and over-the-counter—are discussed and perceived by the public.
Role of Non-Traditional Data: Applied natural language processing to 340,130 Reddit comments from r/insomnia (2008–2022), extracting sentiment and frequency trends for 35 treatment terms including medications, cognitive behavioral therapy for insomnia (CBT-I), and herbal remedies.
Impact: Revealed evolving user attitudes and uptake of treatments like CBT-I and melatonin. Captured real-world behavior and sentiment often missing from clinical datasets, providing a complementary lens for public health monitoring of sleep-related interventions.
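To make "frequency trends" concrete, here is a minimal Python sketch of how the share of comments mentioning a treatment term could be tracked by year. It is an illustration only, not the authors' pipeline: the sample comments, the two-term list, and the function name term_share_by_year are all hypothetical, and the study's sentiment analysis is not reproduced.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample: (year, comment text) pairs standing in for forum comments.
comments = [
    (2019, "Melatonin did nothing for me, but CBT-I helped."),
    (2019, "Has anyone tried melatonin?"),
    (2021, "My doctor suggested CBT-I before any medication."),
    (2021, "CBT-I plus a little melatonin works for me."),
]

# Hypothetical term list; the study tracked 35 terms, not these two.
terms = ["melatonin", "cbt-i"]


def term_share_by_year(comments, terms):
    """Return {year: {term: share of that year's comments mentioning the term}}."""
    totals = Counter()            # number of comments per year
    hits = defaultdict(Counter)   # per-year count of comments mentioning each term
    # Match whole terms only, case-insensitively.
    patterns = {
        t: re.compile(r"(?<!\w)" + re.escape(t) + r"(?!\w)", re.IGNORECASE)
        for t in terms
    }
    for year, text in comments:
        totals[year] += 1
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[year][term] += 1
    return {y: {t: hits[y][t] / totals[y] for t in terms} for y in sorted(totals)}


shares = term_share_by_year(comments, terms)
for year, by_term in shares.items():
    print(year, by_term)
```

Normalizing by the number of comments per year, rather than reporting raw counts, keeps growth in forum activity from being mistaken for growing interest in a treatment.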
Hswen, Yulin, Qiuyuan Qin, Pressley Smith, et al. "Sentiments of Individuals with Interstitial Cystitis/Bladder Pain Syndrome Toward Pentosan Polysulfate Sodium: Infodemiology Study." JMIR Formative Research 9, 2025. https://doi.org/10.2196/54209
Focus: Explored patient sentiment and experiences related to the drug pentosan polysulfate sodium (PPS) for treating interstitial cystitis/bladder pain syndrome (IC/BPS).
Role of Non-Traditional Data: Analyzed 354 posts from the Inspire online health forum using topic modeling and sentiment analysis to extract themes and emotional tone.
Impact: Found a balanced, neutral sentiment toward PPS, with users weighing side effects against benefits. Demonstrated how online support communities offer nuanced insights into patient perspectives, informing shared decision-making and supplementing clinical trial evidence.
Cesnakova, Lucia, Benjamin Vandendriessche, and Jennifer Goldsack. "Advancing the Use of Sensor-Based Digital Health Technologies (sDHTs) for Mental Health Research and Clinical Practice." Wellcome Open Research, February 2025. https://datacc.dimesociety.org/mental-health/#report
Focus: Developed a strategic roadmap to support the broader adoption of sensor-based digital health technologies (sDHTs) in mental health care, particularly for conditions like depression, anxiety, and psychosis.
Role of Non-Traditional Data: Synthesized evidence and expert insights from around the world on the use of passively collected sensor data—such as sleep, movement, heart rate, and smartphone interaction patterns—to detect behavioral and physiological markers of mental health.
Impact: Outlined actionable recommendations to validate, standardize, and scale sDHTs for use in both research and clinical settings. Aims to enable more personalized, accessible, and continuous mental health support through real-time, data-driven monitoring.
Rajendran, Rajesh Kanna, Kokila S. S., Helen K. Joy, et al. "Leveraging Social Media and Natural Language Processing for Early Detection of Depressive Disorders." In Demystifying the Role of Natural Language Processing (NLP) in Mental Health, IGI Global, 2025. https://doi.org/10.4018/979-8-3693-4203-9.ch011
Patro, Pramoda, Thulasi Bikku, Harinadh Karimikond, et al. "Analyzing Social Media Status to Predict Depression by NLP and ML." In Demystifying the Role of Natural Language Processing (NLP) in Mental Health, IGI Global, 2025. https://doi.org/10.4018/979-8-3693-4203-9.ch003
Focus: These companion chapters investigate how social media and natural language processing (NLP) can be used to detect early signs of depression based on users’ online behavior and expression.
Role of Non-Traditional Data: Analyzed thousands of text posts from platforms such as Twitter, Facebook, Reddit, and Instagram. Used NLP techniques—like sentiment analysis, topic modeling, and lemmatization—and trained machine learning models to classify depressive content.
Impact: Achieved high accuracy (often over 90%) in identifying depressive symptoms. Validated social media as a cost-effective, scalable source for mental health screening. Emphasized the need for multilingual, explainable AI to support equitable and responsible deployment in clinical and public health contexts.
St-Onge, Guillaume, Jessica T. Davis, Laurent Hébert-Dufresne, et al. "Pandemic monitoring with global aircraft-based wastewater surveillance networks." Nature Medicine 31 (2025): 788–796. https://www.nature.com/articles/s41591-025-03501-4
Focus: Assessed the potential of aircraft-based wastewater surveillance networks to detect emerging pathogens—such as novel SARS-CoV-2 variants—by modeling how disease signals travel through global air routes.
Role of Non-Traditional Data: Combined viral RNA detection in aircraft wastewater with airline movement data and epidemiological modeling. Simulated global pathogen spread and detection across strategically selected airport sentinel sites.
Impact: Showed that a network of just 10–20 airports could identify outbreaks weeks earlier than existing surveillance systems, including inferring outbreak origins and transmissibility. Validated aircraft wastewater as a scalable, cost-effective early warning tool for global biosurveillance.
SOCIOECONOMIC ANALYSIS
Naushirvanov, T., Elejalde, E., Kalimeri, K., Omodei, E., Karsai, M. & Ferres, L. “Evacuation patterns and socioeconomic stratification in the context of wildfires in Chile.” arXiv. October 8, 2024. https://arxiv.org/abs/2410.06017
Focus: Analyzed how socioeconomic status influences evacuation patterns during wildfires in Valparaíso, Chile.
Role of Non-Traditional Data: Used high-resolution mobile phone records from Telefónica to track individual movements before, during, and after evacuation events. Researchers inferred home locations and socioeconomic profiles based on census zones to assess disparities.
Impact: Lower-income individuals evacuated later, remained away longer, and traveled shorter distances, highlighting structural inequalities in disaster response. The study also validated mobile phone data against Meta’s crisis datasets, suggesting potential for combining both to enhance real-time disaster planning.
Xu, Fengli, Qi Wang, Esteban Moro, et al. "Using human mobility data to quantify experienced urban inequalities." Nature Human Behaviour, 2025. https://doi.org/10.1038/s41562-024-02079-0
Focus: Introduced a new framework for measuring urban inequality through mobility patterns, emphasizing disparities in social interactions, facility access, and adaptability to shocks like pandemics or natural disasters.
Role of Non-Traditional Data: Synthesized large-scale, passively sensed human mobility data from smartphones, ride-hailing, shared bike, and social media platforms to model dynamic person–place networks over time.
Impact: Provided a scalable, data-driven method to trace lived urban inequalities beyond residential segregation. Enables policymakers to identify vulnerable populations and design equitable urban interventions responsive to real-world mobility behavior.
Bailey, Michael, Drew M. Johnston, Theresa Kuchler, Ayush Kumar, and Johannes Stroebel. "Cross-Gender Social Ties Around the World." NBER Working Paper No. 33480, February 2025. https://www.nber.org/papers/w33480
Focus: Measured global patterns of cross-gender friendships on Facebook to explore social integration and attitudes toward gender equality across nearly 200 countries.
Role of Non-Traditional Data: Analyzed 1.38 trillion Facebook friendship ties from 1.8 billion users, using tie strength and gender metadata to develop subnational indicators of cross-gender connectedness.
Impact: Found that regions with more frequent cross-gender friendships tend to show stronger support for women’s rights in education, employment, and politics. Introduces a scalable, privacy-preserving proxy for measuring gender-based social segregation and its links to cultural and institutional norms.
Lavista Ferres, Juan M. "What 40 Million Devices Can Teach Us About Digital Literacy in America." Microsoft AI for Good Lab, February 15, 2025. https://www.linkedin.com/pulse/what-40-million-devices-can-teach-us-digital-literacy-lavista-ferres-50pyc
Focus: Mapped digital literacy across 28,000 U.S. ZIP codes using behavioral data from 40 million Windows devices, revealing how Americans engage with technology beyond broadband access.
Role of Non-Traditional Data: Used anonymized device-level software usage patterns to build two indices, which were released publicly: one measuring general digital engagement, the other focused on content creation and computational skills. Provided a detailed behavioral view of digital literacy absent from traditional surveys.
Impact: Revealed significant digital skill gaps tied to income and education, even in areas with strong broadband coverage. Exposed overlooked disparities within cities and helped inform more targeted digital inclusion strategies.
TRANSPORTATION AND URBAN PLANNING
Urbano, Valeria Maria, Marika Arena & Giovanni Azzone. “Big data for decision-making in public transport management: A comparison of different data sources.” Research in Transportation Business & Management, March 2025. https://doi.org/10.1016/j.rtbm.2025.101298
Focus: Evaluated how big data sources can support planning, operations, and performance measurement in public transport.
Role of Non-Traditional Data: Researchers compared smart card, mobile phone, and automatic vehicle location (AVL) data from Lombardy’s transport systems, assessing each source’s quality and suitability using a framework structured across decision-making domains.
Impact: The study found that each data source offers distinct strengths and limitations, with mobile phone data supporting long-term planning, smart cards enabling detailed user segmentation, and AVL data enhancing real-time service monitoring. The findings underscore the value of data integration and help guide operators in selecting appropriate datasets for specific transit decisions.
Guo, Hao, Weiyu Zhang, Junjie Yang, et al. “Data driven discovery of human mobility models.” arXiv, January 10, 2025. https://arxiv.org/abs/2501.05684
Focus: Automatically discovering mathematical models of human mobility to improve urban planning, transportation systems, and epidemic response.
Role of Non-Traditional Data: Researchers applied symbolic regression to large-scale mobility datasets—including mobile phone data and census commuting flows—from China, the UK, and the US. The algorithm generated interpretable mobility models directly from observed data without relying on pre-defined formulas.
Impact: The method identified both established and novel mobility models, revealing spatial heterogeneity and improving prediction accuracy. The findings can inform targeted infrastructure design, public transit planning, and regional disease control strategies by offering a more precise, data-driven understanding of how people move.
Yabe, Takahiro, Bernardo García Bulle Bueno, Morgan R. Frank, Alex Pentland & Esteban Moro. "Behaviour-based dependency networks between places shape urban economic resilience." Nature Human Behaviour 9 (2025): 496–506. https://www.nature.com/articles/s41562-024-02072-7
Related Visualization: Invisible Urban Dependencies: A Story Map by Northeastern University. https://www.socialurban.net/dependency/#
Focus: Explored how human mobility patterns create hidden interdependencies between urban locations, influencing how economic shocks—such as pandemics—cascade through cities.
Role of Non-Traditional Data: Used anonymized GPS traces from over 1 million devices across five U.S. cities, combined with point-of-interest (POI) data from SafeGraph. Built networks based on co-visitation patterns to model how foot traffic to one location affects broader urban activity.
Impact: Revealed that behavior-based dependency networks predict economic ripple effects with over 40% greater accuracy than distance-based models. Provided policymakers with tools to simulate how closures (e.g., schools, offices, transit hubs) can trigger cascading impacts—enabling more resilient, system-aware urban planning and infrastructure strategies.
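As a rough illustration of what a co-visitation network looks like in practice, the Python sketch below links two places whenever the same device visits both, and weights each direction by the share of one place's visitors who also visit the other. This is a simplified assumption for exposition, not the paper's dependency measure: the visit data, place names, and the function dependency_weights are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical visits: device id -> set of places that device visited.
visits = {
    "d1": {"office", "cafe"},
    "d2": {"office", "cafe", "gym"},
    "d3": {"school", "cafe"},
}


def dependency_weights(visits):
    """Return {(a, b): share of a's visitors who also visit b}."""
    # Invert the mapping: place -> set of devices seen there.
    visitors = defaultdict(set)
    for device, places in visits.items():
        for place in places:
            visitors[place].add(device)

    weights = {}
    for a, b in combinations(sorted(visitors), 2):
        shared = len(visitors[a] & visitors[b])
        if shared:
            # The weights are directional: a small place can depend heavily
            # on a large one without the reverse being true.
            weights[(a, b)] = shared / len(visitors[a])
            weights[(b, a)] = shared / len(visitors[b])
    return weights


weights = dependency_weights(visits)
for (a, b), w in sorted(weights.items()):
    print(f"{a} -> {b}: {w:.2f}")
```

In this toy data every gym visitor also visits the office, while only half of office visitors go to the gym. That asymmetry is the kind of relationship a purely distance-based model cannot express.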
DATA SYSTEMS AND GOVERNANCE
Esko, Siim & Janet McLaren. A Roadmap to Accessing Mobile Network Data for Statistics. Global Partnership for Sustainable Development Data. March 2025. https://www.data4sdgs.org/resources/roadmap-accessing-mobile-network-data-statistics
Focus: Provided guidance for national statistical systems on systematically accessing and integrating mobile network data into official statistics.
Role of Non-Traditional Data: Outlined how mobile phone metadata, such as location traces and call records, can be repurposed to generate or enhance national indicators on population dynamics, mobility, migration, tourism, and disaster response. Synthesized global case studies and presented actionable steps for overcoming legal, technical, and institutional barriers to telecom data access.
Impact: Equips national statistics offices with a practical framework to form data partnerships with mobile operators. Has already supported mobile data initiatives in over 30 countries, boosting the quality, timeliness, and granularity of official statistics—particularly in low- and middle-income settings—and advancing responsible data governance.
Goldfarb, Danielle. "Digital Data and Advanced AI for Richer Global Intelligence." Centre for International Governance Innovation (CIGI). March 2025. https://www.cigionline.org/publications/digital-data-and-advanced-ai-for-richer-global-intelligence/
Focus: Examined how digital data and AI—particularly large language models—can complement or replace traditional public metrics in contexts of delay, manipulation, or absence.
Role of Non-Traditional Data: Surveyed over a dozen global projects using digital traces—such as satellite imagery, online prices, ship sensors, mobile phone metadata, and job postings—paired with AI to generate real-time insights on inflation, inequality, pandemic spread, and supply chains.
Impact: Shows how combining non-traditional data with advanced analytics can close key information gaps in global governance. Calls for governments to treat these tools as public infrastructure—accessible, accountable, and designed with the public interest at their core.
ECONOMIC AND LABOR DYNAMICS
Gimbel, Martha & Ernie Tedeschi. "The Economic Data You Need to Make Decisions Through Volatility." Harvard Business Review, March 17, 2025. https://hbr.org/2025/03/the-economic-data-you-need-to-make-decisions-through-volatility
Focus: Provided strategic guidance on managing economic uncertainty by incorporating faster, more adaptive data sources alongside official statistics.
Role of Non-Traditional Data: Highlighted the growing relevance of alternative indicators—such as real-time job postings, government spending data, and other high-frequency economic signals—for tracking volatility and identifying inflection points.
Impact: Promoted the use of agile, non-traditional data to enhance decision-making in policy and business during periods of rapid change. Emphasized complementarity with official statistics rather than replacement, reinforcing the value of a mixed-data ecosystem.
Lee, Kwok Kin. "Leveraging LinkedIn Data to Understand the Green and Digital Transformations in the Labor Market and the Future of Work." LinkedIn Jobs and Development, March 28, 2025. https://datapartnership.org/updates/leveraging-linkedin-data-to-understand-the-green-and-digital-transformations-in-the-labor-market-and-the-future-of-work/
Focus: Explored how digitalization and decarbonization are reshaping labor markets worldwide. Analyzed real-time trends in job postings and skill demands to support more adaptive workforce development and policy design.
Role of Non-Traditional Data: Used LinkedIn’s labor market data—spanning job postings, hiring trends, and the prevalence of digital and green skills.
Impact: International organizations including the World Bank and IMF leveraged these insights to compare patterns across regions and sectors, helping policymakers anticipate geographic and sectoral disparities in the future of work. Informed targeted reskilling initiatives and identified early signals of AI and green job growth.
DIGITAL BEHAVIOR AND COMMUNICATION
Desiderio, Antonio, Anna Mancini, Giulio Cimini & Riccardo Di Clemente. "Highly engaging events reveal semantic and temporal compression in online community discourse." PNAS Nexus 4, no. 3, March 2025. https://doi.org/10.1093/pnasnexus/pgaf056
Focus: Explored how offline events reshape online conversation dynamics, revealing that high-attention events trigger faster, more emotionally charged, and linguistically repetitive discussions across Reddit political and sports communities.
Role of Non-Traditional Data: Analyzed over 60 million Reddit posts and comments from 2020–2021. Applied temporal and semantic metrics to measure how discourse structure and user behavior respond to real-world events.
Impact: Found that intense offline events consistently compress online communication—accelerating interaction speed and narrowing linguistic diversity. Highlighted implications for content moderation, misinformation risks, and understanding digital collective behavior during crises or major news moments.
REFLECTIONS
The use cases above illustrate the potential and variety of non-traditional data in addressing public interest challenges, from improving dementia care to modeling economic ripple effects through urban foot traffic. A few key patterns stand out from this latest set of examples:
Health remains a leading domain, especially for behavioral and mental health insights. Sensor data, social media posts, and even aircraft wastewater are being repurposed to detect symptoms, surface patient sentiment, or expand the evidence base beyond clinical settings—often in near real time.
Mobility data is gaining prominence as a tool to understand inequality in cities. In particular, it has been leveraged to help track how people interact with infrastructure, adapt during crises, and move through cities in new ways.
Platform data is increasingly being used as a proxy for lived realities. Facebook friendship ties, LinkedIn job trends, and Reddit conversations can provide insights into gender norms, labor market shifts, and evolving health behaviors, turning “digital footprints” into social indicators.
Governance and infrastructure questions are surfacing more explicitly. As seen in the mobile data roadmap for national statistics and calls to treat AI-enhanced metrics as public infrastructure, there’s growing recognition that NTD re-use requires not just technical solutions, but sustainable, responsible governance.
NTD is often complementary, rather than substitutive, to traditional data. Many of the strongest use cases integrate NTD with traditional sources (e.g., census, clinical, or administrative data) to produce more robust, context-rich insights. This hybrid approach can improve coverage, legitimacy, and policy relevance, especially in under-resourced or rapidly evolving settings.
Yet despite the momentum, much of this work remains fragmented or experimental. Moving from pilots to systems change will require key shifts, including:
Legitimacy: Strengthen methodological approaches to address bias and representation, so that NTD earns broader trust among policymakers and researchers.
Social License: Work to ensure responsible reuse by developing participatory methods—tools that involve communities in setting research priorities and interpreting findings—rather than extracting data from them.
Data Stewards: Invest in the human infrastructure needed to connect data initiatives with public value—individuals and teams with the mandate and expertise to identify data needs, broker partnerships, set up governance frameworks, and embed responsible use.