Saturday, October 22, 2016

Filling the Gaps in Environmental Science with Big Data

By Christina Burchette


There's no doubt about it: we're living in a data-driven age. Organizations of all kinds depend on data for decision making and problem solving, analyzing trends, understanding their customers, and doing research. EPA is certainly no exception.


At EPA, we have a large computational science effort that focuses on predicting exposure and toxicity for the thousands of chemicals present in the environment. By combining high-throughput testing methods with computational approaches, these “big data” projects aim to improve our understanding of what these exposures mean for public health and the environment.


By continually exploring new datasets and combining existing ones, we can improve our ability to predict impacts in areas where we have little information or data. This also helps scientists identify vulnerabilities and data gaps that could benefit from additional attention or protection. Together, these capabilities allow EPA to better predict and respond to environmental and health impacts.


That's why we're pleased to announce that EPA has joined the National Consortium for Data Science (NCDS), a collaboration of leaders in various fields who work together to encourage data science research and identify data science challenges. As a member of the NCDS, EPA will have the opportunity to collaborate with other leaders in toxicity research and to incorporate cutting-edge approaches in data science to build upon what we already know.


EPA already has several research projects being developed that both generate and use “big data.”



  • The Stream-Catchment (StreamCat) dataset is an extensive collection of landscape metrics for 2.6 million streams and associated catchments and watersheds within the continental United States. EPA uses these data to model reference conditions against which future assessments will be compared to determine important changes in stream and watershed condition.

  • EnviroAtlas provides interactive tools and large datasets for exploring the benefits people receive from nature, or “ecosystem goods and services.”  The available data covers the continental United States, and fine-scale data is available for selected communities.

  • Web-based Interspecies Correlation Estimation (Web-ICE) is a user-friendly internet platform that uses several datasets to let investigators estimate the acute toxicity of a chemical to a species from the known toxicity of that chemical to a surrogate species. This matters because toxicity data are unavailable or limited for most species within an ecosystem. Information on acute toxicity to multiple species within an ecosystem is important for assessing the risks to individuals, populations, and communities.

  • The Environmental Quality Index (EQI) is a county-level dataset covering all 50 states that includes an index of environmental quality based on criteria in five domains: air, water, land, built environment, and sociodemographic. The EQI allows investigators to conduct association studies between environmental quality and specific health outcomes, such as the rate of preterm birth. These results help communities make decisions about effective public health interventions and can also direct further research to specific areas of concern.
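
For readers curious how an interspecies estimate like the one Web-ICE provides might work in principle, here is a minimal sketch. It assumes a simple straight-line relationship between the log10 toxicity values of a surrogate species and a target species, fitted by ordinary least squares. That modeling choice is an assumption made for illustration and is not taken from this post; the function names and the toxicity values are hypothetical, and this is not Web-ICE's actual code or data.

```python
import math


def fit_log_linear(surrogate_values, target_values):
    """Least-squares fit of log10(target) against log10(surrogate).

    Returns (slope, intercept) of the fitted line in log10 space.
    """
    xs = [math.log10(v) for v in surrogate_values]
    ys = [math.log10(v) for v in target_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_toxicity(surrogate_value, slope, intercept):
    """Estimate target-species toxicity from a surrogate-species value."""
    return 10 ** (intercept + slope * math.log10(surrogate_value))


# Hypothetical, made-up toxicity values for chemicals tested on both
# species. They are NOT real measurements.
surrogate_lc50 = [1.0, 10.0, 100.0, 1000.0]
target_lc50 = [2.0, 20.0, 200.0, 2000.0]

slope, intercept = fit_log_linear(surrogate_lc50, target_lc50)

# Estimate for a new chemical known only for the surrogate species.
estimate = estimate_toxicity(50.0, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimate:.1f}")
```

In this toy example the target species is exactly half as sensitive as the surrogate for every chemical, so the fitted line has a slope of 1 and the estimate for a surrogate value of 50 is 100. Real toxicity data are far noisier, and a real tool would also report the uncertainty around each estimate.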


Our partnership with NCDS is an opportunity to take on data science with the best minds and best technology possible so that we can continue to fill in data gaps. The more we know, the closer we will be to solving key problems related to air and water quality, human health, ecosystem sustainability, and more.


About the Author: Christina Burchette is an Oak Ridge Associated Universities contractor and writer for the science communication team in EPA's Office of Research and Development.





