Artificial Intelligence for Enhancing Data Quality, Standardization & Integration
Challenge
Informed policy decisions require high-quality data, but many data sources are inconsistent, incomplete, or difficult to use.
Data sources must go through an array of assessment, processing, and standardization steps before they can be used easily and accurately for analysis. These activities are necessary even for structured survey data, and they are more extensive still for nontraditional data sources such as administrative records, geospatial data, or sensor data.
The Advisory Committee on Data for Evidence Building (ACDEB) and the 2022 CHIPS and Science Act provide authority and recommendations for developing a future National Secure Data Service (NSDS), which will support evidence-based decision-making by improving access to and usability of federal, state, and local government data assets. This demonstration project is part of the NSDS roadmap. The National Secure Data Service Demonstration (NSDS-D) envisions a future shared service that can streamline data preparation activities for the federal statistical system and promote the development of high-quality data.
Solution
NORC is exploring innovative applications of artificial intelligence to reduce the resources required to create high-quality data.
NORC’s solution begins with identifying the most promising areas for AI to streamline data preparation activities. We draw on interviews with federal statistical experts and subject matter experts in data quality, privacy, and ethics, as well as the literature and our own experience preparing high-quality data sources. Our assessment will consider the types of data that require preparation, the challenges those data sources present, existing tools, and relevant ethical or privacy concerns.
Results
Our toolkit will increase the quality and scope of data available to build evidence and support decision-making.
After identifying high-priority use cases for automation to support data standardization, integration, and quality, we are building, documenting, and packaging a toolkit that addresses these needs. Our toolkit will improve the accessibility and quality of data sources, especially nontraditional sources such as administrative records or geospatial data, which hold particular potential to shed new light on important policy questions and evidence gaps.
Project Leads
- Emily R. Wiegand, Senior Data Scientist (Project Director)
- Beth Fisher, Senior Research Director I (Senior Staff)
- Zachary Seeskin, Senior Statistician (Senior Staff)
- Mehmet Celepkolu, Senior Data Scientist (Senior Staff)
- Nola du Toit, Senior Research Methodologist & Data Visualization Specialist (Senior Staff)