Soil Survey Data Integrity

Data Integrity Improvement Effort

Soil survey tabular data were originally stored in hard copy manuscripts. In the 1970s, the data began to be stored in digital (i.e., computer) format. By 1985, all of the data had been incorporated into the State Soil Survey Database (SSSD). In 1994, the NASIS database was released, replacing the older SSSD and incorporating its data. Since then, the NASIS database has evolved to include more tables, columns, calculations, and interpretations. The long history of soil survey tabular data and the more recent changes to the NASIS database have resulted in data-population inconsistencies. These inconsistencies have always been acknowledged, and multiple attempts have been made to address the problem. The goal has always been to ensure that soil survey data meet the needs of conservation planning and Farm Bill programs.

The soil survey data integrity effort is the latest attempt to clean up the NASIS database. This effort is by far the most comprehensive approach to date. The effort reviewed all past data population standards and other requirements to develop a revised “minimum data population standard.” This new standard includes about 1/3 more data elements than the previous standard. This updated minimum standard lists the NASIS data elements that typically must be populated and committed to the Soil Data Warehouse to support conservation planning and Farm Bill programs.

The data integrity effort updates existing population standards by providing a revised “minimum standard” and a “full standard” for soil survey data that are committed to the Soil Data Warehouse. The minimum standard contains those core data elements that support certain current national programs. The full standard expands upon the minimum standard to include data elements that support other national needs, modeling efforts, and some State level issues. A report listing the NASIS data elements that make up the revised minimum data population standard is posted to the NASIS Data Integrity Report Site. The report is named “Minimum Standard (Legend, Mapunit, Data Mapunit).” The full standard is under development at the time of this release (2019) and will be provided when available.

The data integrity effort also provides NASIS check reports that identify potential errors, omissions, and inconsistencies in current NASIS data for those data elements in the minimum standard. The reports also provide queries for loading suspicious data into NASIS so that problems can be corrected. These reports provide the best means available for assessing data quality and will assist with proper data population. The data integrity reports are applicable to national use and are an essential part of ensuring that data comply with the current national minimum standard.

All existing map units, data mapunits, and components that are posted to the Soil Data Warehouse for publication through Web Soil Survey should comply with the revised minimum standard. Data that met previous minimum standards should be reassessed to ensure compliance with the revised minimum standard. Any new map units created through soil survey update projects or some initial soil surveys should meet the full standard.

The data integrity concept was developed by a subteam of the Soil and Plant Science Division Database Focus Team in 2018. The subteam included State and SPSD personnel, including selected State soil scientists, assistant State soil scientists, resource soil scientists, National Technology Support Center staff, National Soil Survey Center staff, senior regional soil scientists, soil data quality specialists, and MLRA soil survey office soil scientists. They represented a geographic cross-section of the United States.
 

NASIS Data Integrity Reports, Types and Function

The data integrity reports are accessed from the Data Integrity Report Site in NASIS. The number of data check reports that are posted will increase as new reports are developed.

Minimum Standard Report

The report “Minimum Standard (Legend, Mapunit, Data Mapunit)” lists the NASIS data elements that make up the revised minimum standard and includes the rationale for data element selection. In the report, the NASIS table and data elements are listed in the left two columns and are sequenced in the order they should be populated and used for running calculations and validations. This sequence, however, is not as critical when searching for data errors.

General data-population guidance is listed in the “Data Population Guidance” column as the guidance becomes available. The “Population Sequence” column lists for each data element a number that denotes the proper population sequence. The “ACRONYM” column lists a letter or group of letters corresponding to the NASIS table in which the data element resides. For example, “H” indicates the data element resides in the Horizon table and “CSM” indicates Component Soil Moisture. The last three columns list the data integrity reports that are available for evaluating the integrity of the data for the aligned data element. Only those reports that are marked “Ready to Use” are listed. As new reports are completed and marked ready to use, they will be listed in the minimum standard report.

This minimum standard provides the minimum level of data needed to meet certain national conservation program requirements; however, populating data beyond this standard is highly recommended where data are available.

Data Integrity Reports

Data integrity reports assist with proper data population. They evaluate the data in those data elements included in the minimum standard. They locate potential data population errors, omissions, and inconsistencies based on standards and guidance in the “National Soil Survey Handbook” and “Soil Survey Manual.”

Development of the data integrity reports is a dynamic process. The number of reports in the Data Integrity folder will change, and some reports may be modified to meet new program needs. A limited number of reports are available from the Data Integrity Report Site as of the release of the revised minimum standard (November 2019). Other reports will be added by the database subteam as the reports are completed and vetted.

Populated data are evaluated using three types of checks: (1) empty table check (when appropriate), (2) NULL fields check, and (3) data quality check. For efficiency, each report evaluates only one data element or a few interrelated data elements. This limit allows the user to focus on a small number of issues at a time.
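The three types of checks can be illustrated with a minimal Python sketch. It is not the NASIS report logic: the row layout (a list of dictionaries), the column name comppct_r, and the rule that the RV component percents of a data mapunit should total 100 are assumptions made for illustration only.

```python
from typing import Dict, List


def check_component_percent(components: List[Dict]) -> List[str]:
    """Apply the three check types to the component rows of one data mapunit.

    Hypothetical row layout: {"compname": str, "comppct_r": number or None}.
    """
    # (1) Empty table check: there are no component rows at all.
    if not components:
        return ["empty table: no component rows"]

    findings = []

    # (2) NULL fields check: the data element is not populated.
    for row in components:
        if row.get("comppct_r") is None:
            findings.append(
                "NULL field: comppct_r is not populated for " + str(row.get("compname"))
            )

    # (3) Data quality check: only meaningful once every value is populated.
    if not findings:
        total = sum(row["comppct_r"] for row in components)
        if total != 100:
            findings.append(
                "data quality: RV component percents sum to %s, not 100" % total
            )

    return findings
```

Keeping the function focused on a single data element mirrors the approach of the reports, in which each one covers only one data element or a few interrelated ones.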

The report names are a concatenation of a population sequence number, a NASIS Table ACRONYM, and a descriptor indicating the function of the report; for example, “03C-Miscellaneous Area Names Not Approved.” A lowercase letter may follow the Table ACRONYM to control how the reports sort in the Data Integrity folder. For example, “05Ca-RV Component Percent is NULL or Zero” is listed before “05Cb-RV Component Percent NOT Equal 100.”
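The naming convention can be sketched as a small parser. The regular expression below is an assumption that covers only the common form described above (sequence number, uppercase table acronym, optional lowercase sort letter, hyphen, descriptor). Names that deviate from that form are returned as None rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class ReportName(NamedTuple):
    sequence: int      # population sequence number
    acronym: str       # NASIS table acronym, e.g. "C" or "HTG"
    sort_letter: str   # optional lowercase letter that controls sorting
    descriptor: str    # text describing the function of the report


# Assumed pattern: digits, uppercase acronym, optional lowercase letter, hyphen, descriptor.
_PATTERN = re.compile(r"^(\d+)([A-Z]+)([a-z]?)-(.+)$")


def parse_report_name(name: str) -> Optional[ReportName]:
    """Split a report name into its parts, or return None if it does not fit."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    sequence, acronym, sort_letter, descriptor = match.groups()
    return ReportName(int(sequence), acronym, sort_letter, descriptor)
```

For example, parse_report_name("05Ca-RV Component Percent is NULL or Zero") yields sequence 5, acronym "C", and sort letter "a". Because the sort letter follows the acronym, a plain alphabetical sort of the names already places "05Ca" before "05Cb".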

The reports are designed to run on the local or national database. Running on the local database allows a user to tailor their selected set to assess individual soil survey office data, projects for a fiscal year, or individual projects. Running reports against the national database saves time by using report parameters to filter report output without having to load a selected set. Reports can be run offline against the national database for extra-large datasets.

The report description provides instructions on how to run each report. Report parameters allow the reports to run on smaller areas, such as an individual soil survey area (SSA), or larger areas, such as a State or soil survey region. The reports automatically include official data for major components, but parameters provide flexibility to also select unofficial data or minor components. All component kinds (series, taxadjunct, taxon above family, variants (obsolete), and miscellaneous areas) are automatically included in the horizon-level reports, but the parameters provide flexibility to exclude one or more of these in the report output. Parameters also provide the flexibility to query selected combinations of geographic applicability and map unit status.

The report output contains a title that corresponds to the report name; an explanation about what the report evaluates with general guidance for fixing data issues; a parameter choice list used to filter data; and an extensive table listing errors in the left column plus other columns listing the SSA, map units, and data mapunits in which the error resides and the data mapunit and mapunit ownership.

Text (including script and IDs of data containing errors) is generated at the end of the report. The text can be pasted into a NASIS query for building a selected set containing the data rows with errors (those listed in the table of the report). The report identifies potential errors and lists the ownership of those data. This information allows assignments to be made to the appropriate staff for correction. Also, the report output can be pasted into a spreadsheet for custom sorting and filtering.
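As an alternative to a spreadsheet, pasted report output could be grouped by owner with a short script so that assignments can be made. The sketch below assumes the pasted table is tab-separated with a header row, and the column name dmu_owner is hypothetical. The actual column layout of a report may differ.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def group_by_owner(pasted: str, owner_column: str = "dmu_owner") -> Dict[str, List[dict]]:
    """Group pasted, tab-separated report rows by the value in the owner column."""
    reader = csv.DictReader(io.StringIO(pasted), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[owner_column]].append(row)
    return dict(groups)


# Made-up sample rows for illustration only.
sample = (
    "error\tssa\tdmu_owner\n"
    "comppct_r is NULL\tXX001\tOffice A\n"
    "comppct_r is NULL\tXX002\tOffice B\n"
    "comppct_r is zero\tXX001\tOffice A\n"
)
for owner, rows in group_by_owner(sample).items():
    print(owner, len(rows))
```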

Sequence for Running Reports

Generally, the reports in the Data Integrity Report Site are sorted in the sequence in which they should be used for populating data. However, users should focus on using the following 24 priority reports before running any other data integrity reports. These priority reports address fundamental issues that should be addressed first because many other data checks depend on the data examined by these. For example, “01cp-35HTa-Obsolete Lieutex” is used for correcting obsolete terms used in lieu of texture. It is critical to address the obsolete terms before performing other texture-related checks.

  • 01aM-Map Units with Data Mapunits in Different NASIS Site
  • 01Ca-Component Table is Missing
  • 01Cb-Horizon Table is Missing
  • 01COR-RV DMU NOT Indicated, Is Missing, or Has Multiple RVs
  • 01cp-35HTa-Obsolete Lieutex
  • 02Cb-Compname Contains Local Phase or is NULL
  • 02Cc-Major Compname NOT Consistent with Mapunit Name
  • 03C-Component Taxon Kind Equals Series and No OSD
  • 03C-Miscellaneous Area Names Not Approved
  • 03Ca-Component taxon kind is NULL
  • 04Ca-DMU with No Major Component
  • 04Cb-Majorcompflag NOT Indicated or is Minor Comp
  • 05Ca-RV Component Percent is NULL or Zero
  • 05Cb-RV Component Percent NOT Equal 100
  • 05Cc-Identical Major RV Component Percents in Same DMU
  • 27H-Horizons w/Incorrect/NULL Top/Bottom RV Depth/Thickness
  • 35HTb-Texture Class and Lieutex in Same Row
  • 35HTc-Multiple Tex Classes/Lieutex Terms and NOT Stratified
  • 35HTd-RV Texture Class not Consistent with RV Particle Size
  • 37HTGa-Texture Group Table is Missing
  • 37HTGb-RV Texture NOT Indicated or Multiple RVs Indicated
  • 37HTGc-Texture Group NOT Calculated or is NULL
  • 37HTGd-Stratified Single Textures
  • 37HTGe-RV Surface Texture NOT Equal Mapunit Texture Phase

Meeting the Minimum Standard

Any legend, mapunit, or data mapunit data populated in NASIS for the minimum standard data elements and published through Web Soil Survey should be assessed.

An efficient method for locating potential data population issues for the data elements included in the minimum standard is to cycle through all the data integrity reports for a specific area, such as a soil survey region, an MLRA soil survey office, or a soil survey area. Map units and data mapunits meet the minimum standard after all potential issues identified by the data integrity reports have been assessed and all data population problems have been resolved.

An effective approach to meet the minimum standard is to consistently apply the following organized process.

  1. Run a data integrity report for a specific area;
     
  2. Assess the report output and fix any data population problems;
     
  3. Rerun the same report for the same area to verify the issues were resolved;
     
  4. Repeat steps 2 and 3 until the report returns no errors or all issues are resolved; and
     
  5. Repeat steps 1 through 4 for the next sequenced data integrity report until all have been run for the selected area.
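The five steps above can be sketched as a loop. This is an illustration of the process only: run_report and fix_issues are hypothetical callables standing in for running a report in NASIS and for the manual assessment and correction work, and max_passes is an added safeguard that is not part of the original procedure.

```python
from typing import Callable, Dict, Iterable, List


def work_through_reports(
    report_names: Iterable[str],
    run_report: Callable[[str, str], List[str]],
    fix_issues: Callable[[str, List[str]], None],
    area: str,
    max_passes: int = 10,
) -> Dict[str, List[str]]:
    """Run each report in sequence for one area; return any issues left unresolved."""
    unresolved = {}
    for name in report_names:                 # step 5: move to the next sequenced report
        issues = run_report(name, area)       # step 1: run the report for the area
        passes = 0
        while issues and passes < max_passes: # step 4: repeat until no errors remain
            fix_issues(name, issues)          # step 2: assess output and fix problems
            issues = run_report(name, area)   # step 3: rerun to verify the fixes
            passes += 1
        if issues:
            unresolved[name] = issues
    return unresolved
```

Anything returned in the result would still need attention, whether as an actual error to correct or as a flagged item that has been assessed and judged not to be an error.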

The minimum standard provides the minimum level of data population to meet certain national requirements; however, populating data beyond this standard (or to the full standard) is recommended where data are available. Data that were already populated for data elements that are not listed in the minimum standard should not be removed from the database.
 

Data Population Guidance to Meet the Minimum Standard

Although the primary goal is to resolve all data population errors, omissions, and inconsistencies, care should be taken when assigning new values. Resolutions should be based on existing soil information, such as benchmark soils data, pedon descriptions, lab data, established guides for estimation of soil properties and qualities, and relevant tacit knowledge. Published soil surveys are another resource to help resolve data population issues. Historical information, however, should be considered carefully. Science-based, data-driven information should be used to replace NULL (where appropriate) and incorrect values.

Users of the data integrity reports should understand that the reports list potential errors or inconsistencies. Although most of the potential problems listed in a report are actual errors or inconsistencies, it is possible, but not probable, that a flagged problem may not be an error. Careful thought and analysis are needed to determine if the problems listed in the reports indeed need to be addressed. Guidance is provided in the leading paragraph of each report.
 

Using Data Integrity Reports to Develop Projects

The data integrity reports identify the work that is needed so that data comply with the revised minimum standard. The work can be managed in NASIS using TABULAR EDIT or MLRA project types. The data integrity reports can therefore be used as the basis for development of these projects. Some projects will be short-term. Others may be longer-term, future projects, depending on the actions that are needed to address the data issues.

Simple errors that can be addressed as they are discovered can be managed in NASIS using TABULAR EDIT projects. An example of such an error is an RV texture class that does not align with the RV sand, silt, and clay percentages. Other potential problems may require in-depth research, field-based data collection, or both. In these cases, the data integrity reports can be used as a basis for development of longer-term future projects, such as updating map unit names or outdated component names. These longer-term future projects can be managed as either TABULAR EDIT or MLRA project types.
 

How to Report Problems with the Data Integrity Reports

Questions, suggestions, and errors related to the data integrity reports can be submitted to the Database Focus Team. The primary contacts are Jeff.Thomas2@usda.gov and Kyle.Stephens@usda.gov.