The National Academies offer expert advice for federal statistical agencies and others who create, support, and use our nation’s data infrastructure.
Data infrastructure refers to systems that hold, link, and support access and analysis of data. Our nation’s data infrastructure underpins our national statistics, making it a vital resource for informing decision making and public policy.
Statistics is the science of collecting, analyzing, and interpreting data. The products of data analysis are also referred to as statistics.
Structured data is information in a standardized format that follows a pre-defined model. Examples include postal addresses, Social Security numbers, and survey data.
Nonstructured (or unstructured) data is information that is not organized according to a pre-defined model. Examples include audio recordings and PDFs.
Survey data is information collected scientifically, typically using standardized questionnaires and from a sample that can be considered to represent a broader population. Surveys are a primary source of data for producing national statistics; the national census is one example.
Administrative data is information collected by a government body as part of routine operations (but not explicitly for statistical purposes). Examples include tax records and vital records.
Private-sector data is information collected or generated by commercial and other nongovernmental entities. Examples include social media activity and store scanner data.
Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good
Surveys have traditionally been the cornerstone of national statistical data. Federal surveys collect information on the economy, health, crime, agriculture, education, and many other facets of life in America.
Bottom line:
Experts see a continuing role for surveys in national statistics, but the landscape has changed.
Administrative data—collected by government entities as part of their routine operations—exist at local, state, and federal levels. Examples of administrative data include tax records, vital records, criminal justice records, and information from Medicaid, Medicare, and nutritional assistance programs.
Bottom line:
Although they are not collected specifically for statistical purposes, administrative data can be useful for enhancing national statistical data and processes when thoughtfully integrated and properly vetted for fitness for use.
Private-sector data are collected or generated by commercial entities. Examples include data obtained from websites and social media platforms, transactional and store scanner data, and data collected by companies such as insurers and service providers.
Bottom line:
Statistical agencies can potentially partner with commercial entities to use data generated in the private sector, but special care is needed to mitigate the drawbacks.
Measuring Alternative Work Arrangements for Research and Policy
As agencies work to modernize data collection processes, it will be important to study the validity and reliability of new data sources and methods. Frameworks and coordination are needed to assess data quality and determine when alternative data sources are acceptable for use in producing national statistics.
Integrating multiple sources of survey, administrative, and private-sector data can enhance the content, timeliness, and granularity of national statistical data. However, such integration is a significant undertaking and requires a systematic approach.
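As a rough illustration of what integration involves at the smallest scale, the sketch below links a handful of survey responses to administrative records through a shared identifier and reports how many responses found a match. The records, the field names (`person_id`, `employed`, `income`), and the function `link_sources` are all hypothetical. Real linkage work typically has to cope with missing or inconsistent identifiers and with strict access controls, none of which is modeled here.

```python
def link_sources(survey_rows, admin_rows, key="person_id"):
    """Attach administrative income to each survey row that shares the key.

    Every survey row is kept; rows without an administrative match are
    flagged so that coverage gaps stay visible instead of being dropped.
    """
    admin_by_key = {row[key]: row for row in admin_rows}
    linked = []
    for row in survey_rows:
        match = admin_by_key.get(row[key])
        merged = dict(row)
        merged["admin_income"] = match["income"] if match else None
        merged["linked"] = match is not None
        linked.append(merged)
    return linked


# Hypothetical toy data, for illustration only.
survey = [
    {"person_id": 1, "employed": True},
    {"person_id": 2, "employed": False},
    {"person_id": 3, "employed": True},
]
admin = [
    {"person_id": 1, "income": 52000},
    {"person_id": 3, "income": 38000},
]

linked = link_sources(survey, admin)
match_rate = sum(r["linked"] for r in linked) / len(linked)
```

Keeping unmatched rows and tracking the match rate is one simple way to check whether a linked file still represents the population the survey was designed to cover.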
Effective partnerships are foundational to facilitating access to various data sources and gaining the context necessary to assess their quality and applicability for national statistics. In addition, transparency and effective communication are vital to ensuring the responsible collection, sharing, and use of data.
Equity is an essential consideration for any data system. Gaps in coverage, disparities in quality, and misrepresentation can lead to inaccurate data and information. Further, people should not be harmed through the collection and dissemination of their data. A variety of approaches can enhance equity with regard to what types of data are collected, from whom, and how data are accessed, interpreted, and used.
Privacy protection is a fundamental requirement for the nation’s data infrastructure. The usefulness of the information in the infrastructure depends upon the willingness of persons and entities to allow their data to be included. They must trust that their data will be used only for statistical purposes, and that they will not be harmed by anyone accessing their information. Data holders have a variety of technical tools and policy approaches that can be used in combination for effective management of risks of disclosure.
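The text above does not name specific tools. One widely used technique, shown here purely as an example, is suppressing small cells in a published table so that counts based on only a few people are not released. The sketch below is a minimal version of that idea. The threshold of 5, the function `suppressed_counts`, and the sample records are assumptions made for illustration, not a rule or method drawn from any agency or report.

```python
from collections import Counter


def suppressed_counts(records, key, threshold=5):
    """Count records per category, withholding counts below the threshold.

    A withheld cell is returned as None rather than as its true value.
    The default threshold is an illustrative assumption.
    """
    counts = Counter(record[key] for record in records)
    return {
        category: (n if n >= threshold else None)
        for category, n in counts.items()
    }


# Hypothetical records: 12 people in county A, 3 in county B.
records = [{"county": "A"}] * 12 + [{"county": "B"}] * 3
table = suppressed_counts(records, "county")
```

Here the count for county B is withheld because only three records fall in that cell. Suppression of this kind is usually not sufficient on its own, since a published total can allow a withheld cell to be recovered by subtraction. That is one reason technical tools are combined with each other and with policy controls.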
What does the data infrastructure of the future look like? What steps are needed to effectively transition our nation’s statistical systems to meet the challenges of a changing world? Our reports outline concrete steps toward closing gaps and supporting the country’s data needs.