Support for EPA’s Chemical Data Reporting Public Database

Project Brief

The Challenge

The Chemical Data Reporting (CDR) rule under the Toxic Substances Control Act requires manufacturers to provide the U.S. Environmental Protection Agency (EPA) with information on the production and use of chemicals in commerce. The CDR data reported in 2020 included more than a million data records and 350 data elements. EPA needed to extract, compile, process, and sanitize these data from its CDR system but was running into data capacity and quality concerns due to the size and complexity of the data set. In addition, CDR data processing involves careful safeguarding and sanitization of confidential business information (CBI), which requires an understanding of the underlying chemical manufacturing data fields. EPA turned to ERG for support.

ERG's Solution

ERG developed a custom Excel XML schema that enabled extraction of additional, more accurate data compared with EPA’s existing compiled 2020 CDR data set. We then used this XML schema to extract, process, and sanitize all CDR data reported in 2020. The data processing effort required extensive knowledge of the Excel XML schema tool and Microsoft Access, as well as subject matter expertise in CDR information, reporting requirements, data structure, and CBI claims. To protect the confidentiality of CBI data, ERG developed a specialized database that applies EPA’s sanitization rules to the 2020 CDR data set using the CBI claims and company-level data. Using this approach, we extracted, processed, and sanitized the 2020 CDR data and provided EPA with a CDR database that the agency could make publicly available, all in less than 4½ months from the project start. We also developed supporting materials (including the Database Dictionary) and communication information for EPA’s website.


U.S. Environmental Protection Agency