Ecopop Data
A full “run” of ecopop creation produces a set of files containing ecopop delineations and a suite of attributes and parameters that we have identified as relevant to the E3SM-Land Model. This documentation outlines the structure and contents of these data. Additional data may also be provided (for example, data related to streamflow gages and watersheds). Those tables are also described here, but they are not necessarily part of the core ecopop functionality.
Each “run” must be named, and that name is shown as {name} in the following documentation. Some filenames depend on the ids they represent; these ids are also shown in {} in the tree below. When the number of files in a directory varies, … is used.
Directory and file structure
The following tree shows how ecopop exports are structured on disk. All core files are contained in the parent directory (called {name}), while auxiliary/optional files are in subdirectories that may or may not be present depending on the run parameters. That is, all files in the {name} directory will always exist for any ecopop export, but other directories (e.g. streamflow, watersheds, forcings, etc.) might not.
{name} <-- parent directory
├── {name}_epus.gpkg
├── {name}_epus.tif
├── {name}_epus.shp
├── {name}_epu_classes.gpkg
├── {name}_epu_classes.tif
├── {name}_areagrid.tif
├── {name}_adjacency.csv
├── LICENCE.txt
├── streamflow
│   ├── {id_gage_0}.csv
│   ├── {id_gage_...}.csv
│   └── {id_gage_N}.csv
├── watersheds
│   ├── {name}_basins.gpkg
│   ├── {name}_epu_gages.csv
│   └── {name}_gages.gpkg
└── forcings
    ├── daily
    │   ├── {hpu_id_0}.csv
    │   ├── {hpu_id_...}.csv
    │   └── {hpu_id_N}.csv
    └── hourly
        ├── {hpu_id_0}.csv
        ├── {hpu_id_...}.csv
        └── {hpu_id_N}.csv
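Because the core files always exist while the subdirectories are optional, a loader may want to check an export before using it. The sketch below assumes that the run name equals the parent directory's name and that filenames match the tree above exactly; `inspect_export`, `CORE_TEMPLATES`, and `OPTIONAL_DIRS` are hypothetical names, not part of ecopop.

```python
from pathlib import Path

# Core files from the directory tree; {name} is the run name.
# Assumption: filenames follow the tree exactly (adjust the templates if an
# export spells them with "hpu" instead of "epu").
CORE_TEMPLATES = [
    "{name}_epus.gpkg",
    "{name}_epus.tif",
    "{name}_epus.shp",
    "{name}_epu_classes.gpkg",
    "{name}_epu_classes.tif",
    "{name}_areagrid.tif",
    "{name}_adjacency.csv",
    "LICENCE.txt",
]
# Auxiliary subdirectories that may or may not be present.
OPTIONAL_DIRS = ["streamflow", "watersheds", "forcings"]


def inspect_export(root: Path) -> dict:
    """Report missing core files and which optional subdirectories exist.

    The run name is taken to be the name of the parent directory (an assumption).
    """
    name = root.name
    missing = [
        t.format(name=name)
        for t in CORE_TEMPLATES
        if not (root / t.format(name=name)).is_file()
    ]
    present_dirs = [d for d in OPTIONAL_DIRS if (root / d).is_dir()]
    return {"missing_core": missing, "optional_dirs": present_dirs}
```

For example, `inspect_export(Path("my_run"))` (with `my_run` a hypothetical export directory) returns an empty `missing_core` list for a complete export, plus whichever auxiliary directories that run produced.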
Core files
The following files are present in every ecopop export. All ecopop processing is done in unprojected geographic coordinates (EPSG:4326), but care is taken to compute areas and distances correctly where needed. All georeferenced outputs are therefore also in the EPSG:4326 coordinate reference system.
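To illustrate why areas need extra care in EPSG:4326: on a sphere of radius R, a pixel spanning longitudes λ1 to λ2 and latitudes φ1 to φ2 (in radians) has area R²(λ2 - λ1)(sin φ2 - sin φ1), so pixels of equal size in degrees shrink toward the poles. This is a minimal spherical sketch; ecopop's own computation (for example, on an ellipsoid) may differ, and `pixel_area_km2` and `EARTH_RADIUS_KM` are hypothetical names.

```python
import math

# Mean Earth radius in km. Treating the Earth as a sphere is an assumption
# of this sketch, not necessarily what ecopop does.
EARTH_RADIUS_KM = 6371.0088


def pixel_area_km2(lat_top: float, lat_bottom: float, dlon_deg: float) -> float:
    """Area (km^2) of a lat/lon pixel between two latitudes, dlon_deg degrees wide."""
    phi1 = math.radians(lat_bottom)
    phi2 = math.radians(lat_top)
    dlam = math.radians(dlon_deg)
    # Spherical zone-slice area: R^2 * dlambda * (sin(phi2) - sin(phi1)).
    return EARTH_RADIUS_KM ** 2 * dlam * (math.sin(phi2) - math.sin(phi1))
```

For a hypothetical 1/120° pixel, the area centered on 60°N is half the area of the same pixel centered on the equator, because the ratio reduces to cos 60° = 0.5.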
filename | description |
---|---|
{name}_hpus.gpkg | GeoPackage containing the HPUs as polygons (or MultiPolygons) and some attributes. See the attribute table under Individual file contents for the attribute list. |
{name}_hpus.shp | ESRI shapefile of HPU boundaries; this is primarily used for upload to Google Earth Engine (GEE), as GEE does not support the .gpkg format. |
{name}_hpus.tif | GeoTIFF of ecopop units. Each pixel value is the hpu_id of the unit to which that pixel belongs, matching {name}_hpus.gpkg and {name}_hpus.shp. Provided as a convenience for cases where raster math is faster. |
{name}_hpu_classes.tif | GeoTIFF of ecopop classes. Each pixel value is the class (the “cluster”) to which that pixel belongs. The maximum number of classes is set as an ecopop creation parameter. This file is essentially a rasterization of {name}_hpu_classes.gpkg. |
{name}_hpu_classes.gpkg | GeoPackage of ecopop classes. Each polygon represents a connected cluster of pixels sharing the same class. The maximum number of classes is set as an ecopop creation parameter. This is a polygonization of the {name}_hpu_classes.tif file. |
{name}_areagrid.tif | GeoTIFF whose pixel values are the area of each pixel in km². It is used to compute actual ecopop unit areas, because in an unprojected coordinate system (EPSG:4326) pixel areas must be estimated separately to obtain meaningful units (km² rather than square degrees). |
{name}_adjacency.csv | Provides connectivity information among ecopop units. Contains two columns: hpu_id, the id of each ecopop unit in the dataset, and adjacency, which lists that unit’s neighbors as a comma-separated string of hpu_ids. |
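The areagrid and adjacency files are easiest to understand through how they might be used. The sketch below sums areagrid pixel areas per unit, which gives each ecopop unit's area from the unit raster, and parses the adjacency csv into a neighbor map. It assumes both rasters have already been read (with a raster library) into same-shaped 2-D grids, that hpu_id values are integers, and that the nodata value, if any, is known; `unit_areas` and `read_adjacency` are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict


def unit_areas(unit_ids, areagrid, nodata=None):
    """Sum areagrid pixel areas (km^2) per ecopop unit.

    unit_ids and areagrid are same-shaped 2-D grids (lists of rows), e.g. read
    from the units GeoTIFF and {name}_areagrid.tif. nodata is a hypothetical
    fill value whose pixels are skipped.
    """
    totals = defaultdict(float)
    for id_row, area_row in zip(unit_ids, areagrid):
        for uid, area in zip(id_row, area_row):
            if uid != nodata:
                totals[uid] += area
    return dict(totals)


def read_adjacency(text):
    """Parse adjacency csv text into {hpu_id: [neighbor hpu_ids]}.

    Assumes integer ids; the adjacency field is a quoted comma-separated string.
    """
    graph = {}
    for row in csv.DictReader(io.StringIO(text)):
        neighbors = [n.strip() for n in row["adjacency"].split(",") if n.strip()]
        graph[int(row["hpu_id"])] = [int(n) for n in neighbors]
    return graph
```

In practice the adjacency text would come from reading `{name}_adjacency.csv`; a unit with no neighbors yields an empty list.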
Auxiliary files
Auxiliary files are not considered part of the core ecopop functionality, and therefore these files may not be present for a general user.
The watersheds directory contains files about streamflow gages and their watersheds. These data were obtained from the Veins of the Earth (VotE) data platform, which is currently only available to LANL employees. Information about VotE can be found here, and LANL collaborators can access the private VotE repository upon request.
The streamflow directory contains one csv per streamflow gage; each filename corresponds to the id_gage provided by VotE (and found within the watersheds files). Like the watersheds data, these data were obtained from the VotE data platform.
The forcings directory contains meteorologic and other time-series data either required for or useful in running E3SM-Land models on ecopop units, provided at daily and hourly resolution in the daily and hourly subdirectories. The data were sampled from the ERA5-Land Hourly dataset on Google Earth Engine (GEE). Some postprocessing (for example, converting units to more standard formats) is performed, so the band descriptions provided by the GEE Data Catalog might not be exactly accurate. Each csv file is named for the ecopop unit id it represents and contains that unit’s time series. See the forcings csv table under Individual file contents for more detailed descriptions of the contents of each file.
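Daily files are already provided, but the hourly files can be re-aggregated if a different daily convention is needed. This stdlib-only sketch groups hourly rows by UTC calendar day, summing total_precipitation and averaging temperature_2m. It assumes the column names in the forcings csv table and that each precipitation value is the amount within its time step; it may not reproduce exactly how the provided daily files were made, and `hourly_to_daily` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def hourly_to_daily(text):
    """Aggregate hourly forcings csv text to UTC calendar days.

    Returns {date: (total_precipitation sum in mm, mean temperature_2m in deg C)}.
    Summing precipitation assumes each hourly value is the amount within that
    time step, not a running accumulation.
    """
    precip = defaultdict(float)
    temps = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # The date column is UTC time in YYYY-MM-DD HH:MM:SS format.
        day = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S").date()
        precip[day] += float(row["total_precipitation"])
        temps[day].append(float(row["temperature_2m"]))
    return {d: (precip[d], sum(temps[d]) / len(temps[d])) for d in sorted(temps)}
```

Because rows are matched by column name, the function does not depend on the column order within the file.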
directory | filename | description |
---|---|---|
watersheds | {name}_basins.gpkg | watershed geometries and many attributes for gages with streamflow data; see detailed file description |
watersheds | {name}_gages.gpkg | streamgage locations and many attributes for gages with streamflow data; see detailed file description |
streamflow | {id_gage_n}.csv | daily time series of streamflow in cubic meters per second; contact the developers for information about the q_quality column if desired |
forcings (daily, hourly) | {hpu_id_n}.csv | time series of ERA5-Land-derived variables for each ecopop unit; see detailed file description |
Individual file contents
epus.gpkg
attribute | units | description | source |
---|---|---|---|
fid | | unique feature id created upon export; essentially meaningless | derived |
hpu_id | | unique ecopop id | derived |
pop_mean | n people per 0.01 km² | average population density across the HPU | WorldPop Estimated Residential Population per 100x100m Grid Square for 2020, generated by WorldPop and downloaded from GEE |
area_sum | km² | area of the HPU | derived |
hpu_class | | class to which this HPU belongs | derived |
centroid_x | degrees | longitude of the HPU centroid | derived |
centroid_y | degrees | latitude of the HPU centroid | derived |
fmax | | fmax parameter required by the E3SM-Land Model | MERIT-DEM + GEE + custom function |
elevation_mean | m.a.s.l. | average elevation across the HPU | |
elevation_std | m | standard deviation of elevation across the HPU | |
soil_depth_mean | m | average soil depth across the HPU | |
topo_slope_mean | degrees | average topographic slope across the HPU | |
soc_U-L_mean | dg/kg | average soil organic carbon across the HPU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
clay_U-L_mean | g/kg | average soil clay content across the HPU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
silt_U-L_mean | g/kg | average soil silt content across the HPU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
sand_U-L_mean | g/kg | average soil sand content across the HPU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
bdod_U-L_mean | cg/cm³ | average soil bulk density across the HPU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
lc_XXX | | fraction of the HPU covered by land cover type XXX for the year 2015 | MCD12Q1.006 MODIS Land Cover Type Yearly Global 500m for 2015 hosted on GEE |
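Because pop_mean is a density per 0.01 km² (one 100 m × 100 m WorldPop cell) and area_sum is in km², a rough resident count per unit is pop_mean × 100 × area_sum. The sketch reads those columns directly from the GeoPackage with Python's built-in sqlite3 module, since a GeoPackage is an SQLite database whose gpkg_contents table lists its layers. It assumes the file holds a single feature layer, and the estimate is only approximate because pop_mean is an average density; `estimated_population` is a hypothetical helper.

```python
import sqlite3


def estimated_population(gpkg_path):
    """Approximate residents per ecopop unit from the units GeoPackage.

    Assumes a single feature layer containing hpu_id, pop_mean, and area_sum.
    """
    con = sqlite3.connect(gpkg_path)
    try:
        # gpkg_contents lists the layers; pick the (assumed single) feature layer.
        (layer,) = con.execute(
            "SELECT table_name FROM gpkg_contents WHERE data_type = 'features'"
        ).fetchone()
        rows = con.execute(
            f'SELECT hpu_id, pop_mean, area_sum FROM "{layer}"'
        ).fetchall()
    finally:
        con.close()
    # pop_mean is people per 0.01 km^2, so people per km^2 = pop_mean * 100;
    # multiplying by area_sum (km^2) gives an approximate head count.
    return {hpu_id: pop * 100.0 * area for hpu_id, pop, area in rows}
```

Reading attributes this way avoids a geospatial dependency when geometries are not needed; columns such as soc_U-L_mean contain hyphens and would need double-quoting in SQL.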
basins.gpkg and gages.gpkg
attribute name | description |
---|---|
id_gage | unique VotE id; negative values indicate records for which two or more gages were merged (i.e. duplicates) |
source | source dataset: usgs, hydat, bandas, rarcticnet, cdr, etc. |
id_source | id provided by the source, stored as a string |
station_name | station name provided by the source dataset |
river_name_source | river name, copied directly from the source or sometimes inferred |
lat_source | latitude provided by the source |
lon_source | longitude provided by the source |
drainarea_km2_source | drainage area provided by the source, in km² |
elevation_m_source | gage altitude provided by the source, in m |
naturalish | indicator for pristine gages; methods vary among sources |
in_camels | text; denotes which, if any, of the CAMELS-like databases the gage appears in. Options are ‘usa’, ‘hysets’, ‘br’, ‘aug’, ‘cl’, ‘gb’, ‘aus’. Some USGS gages may appear in both ‘usa’ and ‘hysets’, in which case the field reads ‘usa,hysets’ |
start_date | date of the gage’s first streamflow record (YYYY-MM-DD) |
end_date | date of the gage’s last streamflow record (YYYY-MM-DD) |
span_years | total range of streamflow records in years, including missing data |
fraction_valid | fraction of time between start_date and end_date containing observations |
mapped_id_reach | id_reach within VotE of the reach on which the gage is located |
mapped_from_provided | boolean; if True, mapping was possible using the provided gage metadata and simple mapping rules |
mapped_dist_km | distance in km between the provided gage location and the mapped gage location |
mapped_method | int; indicates the method used to map the gage |
manmap_da_km2 | for manually mapped gages; the drainage area (km²) used when mapping |
manmap_geom | for manually mapped gages; the manually specified location of the gage |
manmap_method | method used in manual mapping |
manmap_id_reach | for manually mapped gages; the manually selected id_reach corresponding to the gage |
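The span_years and fraction_valid definitions can be made concrete for a daily record. This sketch recomputes them from the dates that have observations; counting days inclusively and using 365.25-day years are assumptions, so results may differ slightly from the stored values. The small `camels_memberships` helper splits the comma-separated in_camels field. Both functions are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional


def record_summary(obs_dates: Iterable[date]) -> dict:
    """Recompute start/end dates, span_years, and fraction_valid for a daily record.

    obs_dates: dates that have a streamflow observation. Inclusive day counting
    and 365.25-day years are assumptions of this sketch.
    """
    days = sorted(set(obs_dates))
    start, end = days[0], days[-1]
    n_total = (end - start).days + 1  # days from start to end, inclusive
    return {
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "span_years": n_total / 365.25,  # includes days with missing data
        "fraction_valid": len(days) / n_total,
    }


def camels_memberships(in_camels: Optional[str]) -> set:
    """Split the in_camels text field (e.g. 'usa,hysets') into a set of names."""
    return {s.strip() for s in (in_camels or "").split(",") if s.strip()}
```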
Forcings csvs ({epu_id_n}.csv)
variable (band) | units | description |
---|---|---|
date | | UTC time in YYYY-MM-DD HH:MM:SS format |
temperature_2m | deg C | Temperature of air at 2 m above the surface of land, sea or inland waters. 2 m temperature is calculated by interpolating between the lowest model level and the Earth’s surface, taking account of the atmospheric conditions. |
dewpoint_temperature_2m | deg C | Temperature to which the air, at 2 m above the surface of the Earth, would have to be cooled for saturation to occur. It is a measure of the humidity of the air; combined with temperature and pressure, it can be used to calculate the relative humidity. 2 m dew point temperature is calculated by interpolating between the lowest model level and the Earth’s surface, taking account of the atmospheric conditions. |
total_precipitation | mm | Spatially averaged total precipitation across the polygon within the time step indicated by the date column: the accumulated liquid and frozen water, including rain and snow, that falls to the Earth’s surface. It is the sum of large-scale precipitation (generated by large-scale weather patterns, such as troughs and cold fronts) and convective precipitation (generated by convection, which occurs when air at lower levels in the atmosphere is warmer and less dense than the air above, so it rises). Precipitation variables do not include fog, dew or precipitation that evaporates in the atmosphere before it reaches the surface of the Earth. |
u_component_of_wind_10m | m/s | Eastward component of the 10 m wind: the horizontal speed of air moving towards the east at a height of 10 m above the surface of the Earth. Care should be taken when comparing this variable with observations, because wind observations vary on small space and time scales and are affected by local terrain, vegetation and buildings that are represented only on average in the ECMWF Integrated Forecasting System. This variable can be combined with the V component of 10 m wind to give the speed and direction of the horizontal 10 m wind. |
v_component_of_wind_10m | m/s | Northward component of the 10 m wind: the horizontal speed of air moving towards the north at a height of 10 m above the surface of the Earth. Care should be taken when comparing this variable with observations, because wind observations vary on small space and time scales and are affected by local terrain, vegetation and buildings that are represented only on average in the ECMWF Integrated Forecasting System. This variable can be combined with the U component of 10 m wind to give the speed and direction of the horizontal 10 m wind. |
surface_pressure | kPa | Pressure (force per unit area) of the atmosphere on the surface of land, sea and inland water. It is a measure of the weight of all the air in a column vertically above a fixed point on the Earth’s surface. Surface pressure is often used in combination with temperature to calculate air density. Because pressure varies strongly with altitude, low and high pressure systems are difficult to see in surface pressure over mountainous areas, so mean sea level pressure, rather than surface pressure, is normally used for that purpose. |
surface_runoff | mm | Some water from rainfall, melting snow, or deep in the soil stays stored in the soil. Otherwise, the water drains away, either over the surface (surface runoff) or under the ground (sub-surface runoff); the sum of these two is called ‘runoff’. This variable is accumulated from the beginning of the forecast time to the end of the forecast step. Runoff is expressed as a depth of water: the depth the water would have if it were spread evenly over the grid box. Care should be taken when comparing model variables with observations, because observations are often local to a particular point rather than averaged over a grid square area. |
sub_surface_runoff | mm | Some water from rainfall, melting snow, or deep in the soil stays stored in the soil. Otherwise, the water drains away, either over the surface (surface runoff) or under the ground (sub-surface runoff); the sum of these two is called ‘runoff’. This variable is accumulated from the beginning of the forecast time to the end of the forecast step. Runoff is expressed as a depth of water: the depth the water would have if it were spread evenly over the grid box. |
surface_solar_radiation_downwards | W/m² | Amount of solar (shortwave) radiation reaching the surface of the Earth, comprising both direct and diffuse radiation. Radiation from the Sun is partly reflected back to space by clouds and particles in the atmosphere (aerosols), and some of it is absorbed; the rest is incident on the Earth’s surface (represented by this variable). To a reasonably good approximation, this variable is the model equivalent of what would be measured by a pyranometer (an instrument used for measuring solar radiation) at the surface. However, care should be taken when comparing model variables with observations, because observations are often local to a particular point in space and time, rather than representing averages over a model grid box and model time step. This variable is accumulated from the beginning of the forecast time to the end of the forecast step. |
surface_thermal_radiation_downwards | W/m² | Amount of thermal (longwave or terrestrial) radiation emitted by the atmosphere and clouds that reaches the Earth’s surface. The surface of the Earth emits thermal radiation, some of which is absorbed by the atmosphere and clouds; the atmosphere and clouds likewise emit thermal radiation in all directions, some of which reaches the surface (represented by this variable). |
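As the wind and dew point descriptions note, these variables can be combined into derived quantities. This sketch computes wind speed and meteorological direction (the direction the wind blows from, clockwise from north) from the u and v components, and relative humidity from temperature_2m and dewpoint_temperature_2m using the Magnus approximation with the common coefficients 6.112 hPa, 17.62 and 243.12 °C (this approximation does not need surface_pressure). The choice of approximation is an assumption, and all function and constant names are hypothetical.

```python
import math

# Magnus coefficients over water; one common choice, assumed here.
MAGNUS_A_HPA = 6.112
MAGNUS_B = 17.62
MAGNUS_C_DEGC = 243.12


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapor pressure (hPa) at temp_c (deg C)."""
    return MAGNUS_A_HPA * math.exp(MAGNUS_B * temp_c / (MAGNUS_C_DEGC + temp_c))


def relative_humidity(temp_c, dewpoint_c):
    """Relative humidity (%) from temperature_2m and dewpoint_temperature_2m (deg C)."""
    return 100.0 * (
        saturation_vapor_pressure_hpa(dewpoint_c)
        / saturation_vapor_pressure_hpa(temp_c)
    )


def wind_speed_direction(u, v):
    """Wind speed (m/s) and direction (degrees) the wind blows from, clockwise from north."""
    speed = math.hypot(u, v)
    # atan2(u, v) is the direction the wind blows toward; add 180 to get "from".
    direction = (180.0 + math.degrees(math.atan2(u, v))) % 360.0
    return speed, direction
```

For example, a purely eastward wind (u > 0, v = 0) blows from the west and gives a direction of 270°, and equal temperature and dew point give 100% relative humidity.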