The Environmental Data Discovery and Transformation System

This user guide is based on the re-implementation of the EnDDaT user interface: https://cida.usgs.gov/enddat/

Overview

The Environmental Data Discovery and Transformation (EnDDaT) service is a tool used to discover data from a variety of data sources, aggregate and process the data, and perform basic transformations. The end result is that environmental data from multiple sources is compiled into a single table. This user guide steps through the process of specifying an area of interest, choosing data, processing data, and obtaining data. It also describes the transforms and processing available for the data.


Data Discovery

EnDDaT's data discovery starts with an area of interest. You can specify your area of interest in one of two ways: 1) click a point on the map, or enter the latitude and longitude of a point, and specify a search radius, or 2) draw a box. In either mode, you can upload a zipped shapefile for spatial reference so you know how particular sites relate to your study zone.

The image below shows a zip file containing four shapefile components that are required to show the data on a map (1). The bounding box draw and edit tools (2) are on the left side of the map just below the legend. When the edit tools are activated, as shown by 2, the corners can be dragged to change the size or shape of the box. The whole box can also be moved by clicking and dragging the white dot in the middle of the box (3). Once you are happy with your point and radius or your bounding box, you are ready to move on to Choose Data.
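To make the point-and-radius mode more concrete, the following is a minimal sketch of how one might check which sites fall within a search radius of a chosen point. It is not EnDDaT's own implementation: the haversine formula, the function name haversine_km, the sample site coordinates, and the use of kilometers for the radius are all assumptions made for illustration.

```r
# Great-circle distance in kilometers between points given in decimal degrees.
# Hypothetical helper; assumes a spherical Earth with mean radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  r_earth <- 6371
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r_earth * asin(sqrt(pmin(1, a)))
}

# Made-up sites and search point, for illustration only.
sites <- data.frame(siteNo = c("A", "B"),
                    lat = c(43.05, 44.50),
                    lon = c(-89.40, -88.00),
                    stringsAsFactors = FALSE)
center <- c(lat = 43.07, lon = -89.40)
radius_km <- 25  # assumed unit; the guide does not state the radius unit

# Keep only the sites whose distance from the point is within the radius.
sites$dist_km <- haversine_km(center[["lat"]], center[["lon"]],
                              sites$lat, sites$lon)
in_radius <- sites[sites$dist_km <= radius_km, ]
print(in_radius)
```

A bounding box is simpler still: a site is inside when its latitude and longitude each fall between the box's minimum and maximum values.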

Choose Data

EnDDaT provides two methods for selecting data. The first is suited to looking closely at what is available from particular sites, and the second is suited to getting all the data available for a given variable in your area of interest. They are called: 1) "Select from available variables one site at a time" and 2) "Select variables for many sites by variable type".

Select from available variables one site at a time

In this choose data mode, you specify data sources then discover the available variables for the sites which the chosen data sources offer. As shown in the image below, clicking on the Data Sources field (1) shows the data source options. The date filter (2) ensures that the sites shown have some data in the date range desired. After a data source is chosen and the optional date range is set, sites are then displayed on the map (3). When you click on a site, the variables available for the site are shown to the right of the map (4).


As shown in the image below, when a site is clicked on the map (1), and variables from that site are selected (2), those variables are added to the Selected Variables list for later use (3). Once all variables of interest are added to the Selected Variables list, you are ready to move on to the Process Data step.


Select variables for many sites by variable type

The second choose data method allows selection of all the variables of a specified type inside the area you've specified. In this case, you specify variables of interest and then have the option to deselect sites in the case that data from that site is not desired. As shown in the image below, variables of interest and an optional date filter are selected (1), sites can be clicked to deselect them (2), sites with a given variable are shown on the map based on the pull down menu on the right of the map (3), and all selected sites/variables are listed in the Selected Variables list (4). Once all variables of interest are selected, you are ready to move on to data processing.

Data Processing

EnDDaT's data processing service can apply temporal summaries, limit retrievals by date range or particular dates, and offer data formatting options.


In the image below, (1) shows that a 24 hour sum of the 1 hr total precipitation is requested for three precipitation sites and that the 168 hour (7 day) mean is also requested for all the variables selected. Water-year 2015 is selected in the date filter (2), and the output has been configured to use NaN as the missing value and to delimit the file with commas (3). Time series processing is described in the Statistical Processing section below.
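The settings in this example can be illustrated with a short R sketch. It assumes the date filter follows the USGS water-year convention (water year 2015 runs from October 1, 2014 through September 30, 2015) and shows how a comma-delimited result that uses NaN for missing values might be read. The function name water_year_range, the column names, and the sample rows are hypothetical and are not actual EnDDaT output.

```r
# Start and end dates of a water year, assuming the USGS convention:
# water year N runs from October 1 of year N-1 through September 30 of year N.
water_year_range <- function(wy) {
  list(start = as.Date(sprintf("%d-10-01", wy - 1)),
       end   = as.Date(sprintf("%d-09-30", wy)))
}
wy2015 <- water_year_range(2015)
print(wy2015)

# A 168 hour window is the same as a 7 day window.
hours_per_week <- 7 * 24

# Hypothetical comma-delimited output with NaN as the missing value.
sample_output <- paste(
  "time,precip_24h_sum",
  "2014-10-01 00:00,0.2",
  "2014-10-01 01:00,NaN",
  sep = "\n"
)
dat <- read.csv(text = sample_output, na.strings = "NaN",
                stringsAsFactors = FALSE)
print(dat)
```

Passing na.strings = "NaN" turns the missing-value marker into NA, so functions such as mean(..., na.rm = TRUE) skip those rows.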


Click the ‘Upload Times’ button to upload a text file with a list of timestamps to filter the output data. If a file is not uploaded, the entire time series will be retrieved. The timezone of these dates/times defaults to GMT if not specified. Acceptable formats:

7/5/2010 13:30
7/6/2010 09:30
7/7/2010 15:30
7/8/2010 13:30
7/9/2010 13:30

Or
07/05/2010 13:30
07/06/2010 09:30
07/07/2010 15:30
07/08/2010 13:30
07/09/2010 13:30

Or
07/05/2010 13:30 CDT
07/06/2010 09:30 CDT
07/07/2010 15:30 CDT
07/08/2010 13:30 CDT
07/09/2010 13:30 CDT
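A timestamp file in the second format shown above can be generated from R. The sketch below is one way to do it; the file name upload_times.txt and the variable names are hypothetical, and the times are treated as GMT because no timezone is written.

```r
# Timestamps of interest, held as GMT (UTC) date-times.
times <- as.POSIXct(c("2010-07-05 13:30",
                      "2010-07-06 09:30",
                      "2010-07-07 15:30"), tz = "UTC")

# Format as MM/DD/YYYY HH:MM, matching the zero-padded format shown above.
stamp_lines <- format(times, "%m/%d/%Y %H:%M", tz = "UTC")

# Write one timestamp per line; the file name is a placeholder.
out_file <- file.path(tempdir(), "upload_times.txt")
writeLines(stamp_lines, out_file)
print(readLines(out_file))
```

To declare a timezone explicitly, as in the third format, a zone abbreviation can be appended to each line, for example paste(stamp_lines, "CDT"). The times are then presumably interpreted in that zone rather than GMT.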

Obtaining Data

At this stage, the URL to retrieve the data can be shown, the data can be downloaded, and a file containing site metadata can be downloaded. If the data request for all sites/variables fits in one URL, as in the example shown for Select from available variables one site at a time, a single data processing URL is shown for retrieving all of the chosen data. If a single processing URL would be too long, the Get data and Download buttons become inactive and only the Show data processing url and Download site metadata buttons can be clicked.

The site metadata file, as shown in the image below, contains 9 columns: the data source, unique site number, site name, longitude, latitude, elevation, elevation units, list of selected variables for that site, and the EnDDaT services processing URL to get the variables listed. This file can be used in a script that will iterate over the EnDDaT service URLs to download data for each site.

Here is a snippet of R code that can be used to download data from the site metadata file.

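# Read the site metadata file, keeping text columns as character strings.
sitemetadata <- read.delim('./sitemetadata.tsv', stringsAsFactors = FALSE)
dlFiles <- c()  # names of the files that downloaded successfully
for (site in seq_along(sitemetadata$url)) {
  # Name each output file after its data source and site number.
  fileName <- paste0(sitemetadata$dataset[site], '_', sitemetadata$siteNo[site], '.tsv')
  message(paste('Downloading file', site, 'of', length(sitemetadata$url), '-', fileName))
  # download.file() returns 0 on success.
  dl <- download.file(sitemetadata$url[site], fileName)
  # Append to the list rather than overwriting it.
  if (dl == 0) dlFiles <- c(dlFiles, fileName)
}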

Statistical Processing

The following statistical processes are available: mean (μ), minimum, maximum, summation (Σ), difference (Δ), max difference (max Δ), and standard deviation (σ). Each requires the user to specify a time period (e.g. mean over 6 hours). The equations are as follows:
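One set of definitions consistent with the worked example that follows is given here, where x_1, ..., x_N are the N values that fall within the requested period, ordered from oldest to newest. These forms are inferred from the example values rather than taken from the service itself. In particular, the population form of the standard deviation (dividing by N) and the last-minus-first form of the difference are assumptions that happen to reproduce the table.

```latex
\begin{align*}
\mu &= \frac{1}{N} \sum_{i=1}^{N} x_i \\
\min &= \min_{1 \le i \le N} x_i \\
\max &= \max_{1 \le i \le N} x_i \\
\Sigma &= \sum_{i=1}^{N} x_i \\
\Delta &= x_N - x_1 \\
\max \Delta &= \max_{1 \le i \le N} x_i - \min_{1 \le i \le N} x_i \\
\sigma &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mu \right)^2}
\end{align*}
```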

A simple example is provided to demonstrate how N is found. The requested period is 60 minutes.

Time (minutes)  Value (x)  Mean (μ)  Min  Max  Summation (Σ)  Difference (Δ)  Max Diff (max Δ)  St.Dev (σ)
0               1
15              3
30              2
45              1          1.75      1    3    7              0               2                 0.829
60              1          1.75      1    3    7              -2              2                 0.829
75              2          1.5       1    2    6              0               1                 0.5
90              1          1.25      1    2    5              0               1                 0.433
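The rows of the example can be reproduced with a short R sketch. It assumes that, with values every 15 minutes and a 60 minute period, each statistic is computed over the N = 4 most recent values, that the difference is the newest value minus the oldest value in the window, and that the standard deviation is the population form. These assumptions are inferred from the numbers in the table, and the function name window_stats is hypothetical.

```r
times <- seq(0, 90, by = 15)     # minutes
x <- c(1, 3, 2, 1, 1, 2, 1)      # values from the example table
n <- 4                           # values per 60 minute window (inferred)

# Statistics for one window of values ordered oldest to newest.
window_stats <- function(w) {
  mu <- mean(w)
  c(mean = mu,
    min = min(w),
    max = max(w),
    sum = sum(w),
    diff = w[length(w)] - w[1],      # newest minus oldest
    max_diff = max(w) - min(w),
    sd = sqrt(mean((w - mu)^2)))     # population standard deviation
}

# Slide the window over the series, starting once n values are available.
rows <- lapply(n:length(x), function(i) window_stats(x[(i - n + 1):i]))
result <- data.frame(time = times[n:length(x)], do.call(rbind, rows))
print(round(result, 3))
```

Note that R's built-in sd() uses the sample form (dividing by N - 1) and would give 0.957 rather than 0.829 for the first window, which is why the sketch computes the standard deviation by hand.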