Ecopop Data

A full “run” of ecopop creation results in a set of files that provide ecopop delineations and a suite of attributes and parameters we have identified as relevant to the E3SM-Land Model. This documentation outlines the structure and contents of these data. Note that additional data may be provided (for example, data related to streamflow gages and watersheds). Information about those tables is also provided here, but they are not necessarily part of the core ecopop functionality.

Each “run” must be named, and that name is shown as {name} in the following documentation. Some filenames depend on the ids they represent, and these are shown in {} in the tree below. When the number of files in a directory can vary, ... is used in place of the id (for example, {id_gage_...}.csv).

Directory and file structure

The following tree shows how ecopop exports are structured on disk. Note that all core files are contained in the parent directory (called {name}), while auxiliary/optional files will be in subdirectories that may or may not be present depending on the run parameters. That is, all files in the {name} directory will always exist for any ecopop export, but other directories (e.g. streamflow, watersheds, forcings, etc.) might not.

{name} <-- parent directory
├── {name}_epus.gpkg
├── {name}_epus.tif
├── {name}_epus.shp
├── {name}_epu_classes.gpkg
├── {name}_epu_classes.tif
├── {name}_areagrid.tif
├── {name}_adjacency.csv
├── LICENCE.txt
├── streamflow
│   ├── {id_gage_0}.csv
│   ├── {id_gage_...}.csv
│   └── {id_gage_N}.csv
├── watersheds
│   ├── {name}_basins.gpkg
│   ├── {name}_epu_gages.csv
│   └── {name}_gages.gpkg
└── forcings
    ├── daily
    │   ├── {hpu_id_0}.csv
    │   ├── {hpu_id_...}.csv
    │   └── {hpu_id_N}.csv
    └── hourly
        ├── {hpu_id_0}.csv
        ├── {hpu_id_...}.csv
        └── {hpu_id_N}.csv
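
Because the optional subdirectories may or may not exist, a consumer of an export will usually want to check for them before reading. The following is a minimal sketch of such a check, not part of ecopop itself; the function names `optional_dirs_present` and `list_csv_ids` are hypothetical, and the only assumption about layout is the tree shown above.

```python
from pathlib import Path

# Optional subdirectories named in the tree above.
OPTIONAL_DIRS = ("streamflow", "watersheds", "forcings")


def optional_dirs_present(export_root):
    """Return {directory name: True/False} for each optional subdirectory."""
    root = Path(export_root)
    return {name: (root / name).is_dir() for name in OPTIONAL_DIRS}


def list_csv_ids(directory):
    """Return the sorted file stems (the ids) of the csv files in a directory."""
    return sorted(path.stem for path in Path(directory).glob("*.csv"))
```

For example, `list_csv_ids(Path(root) / "streamflow")` would return the gage ids for which a streamflow csv is present, as strings.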

Core files

The following files will be present for any ecopop export. Note that all ecopop processing is done in unprojected coordinates (EPSG:4326), but care is taken to correctly compute areas and distances when appropriate. All georeferenced outputs are therefore also in the EPSG:4326 coordinate reference system.
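
To illustrate why areas need care in EPSG:4326, the sketch below computes the area of a latitude/longitude pixel on a sphere. It is an illustration only and is not necessarily how ecopop computes areas: it assumes a spherical Earth with a mean radius of 6371.0088 km, and the function name `pixel_area_km2` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical approximation)


def pixel_area_km2(lat_center_deg, pixel_size_deg):
    """Area (km^2) of a square lat/lon pixel centered at the given latitude.

    Uses the exact area of a spherical quadrangle:
    R^2 * dlon * (sin(lat_top) - sin(lat_bottom)).
    """
    half = pixel_size_deg / 2.0
    lat_top = math.radians(lat_center_deg + half)
    lat_bottom = math.radians(lat_center_deg - half)
    dlon = math.radians(pixel_size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_top) - math.sin(lat_bottom))
```

A pixel of fixed angular size covers roughly half the area at 60 degrees latitude that it covers at the equator, which is why pixel counts alone cannot be used as areas.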

| filename | description |
| --- | --- |
| {name}_hpus.gpkg | GeoPackage containing the HPUs as polygons (or MultiPolygons) and some attributes. See the table under Individual file contents for the attribute list. |
| {name}_hpus.shp | ESRI shapefile of HPU boundaries; this is primarily used for upload to Google Earth Engine (GEE), as GEE does not support the .gpkg format. |
| {name}_hpus.tif | Geotiff of ecopop units. Pixel values identify the unit to which each pixel belongs and correspond to {name}_hpus.gpkg and {name}_hpus.shp. Provided as a convenience when raster math is faster. |
| {name}_hpu_classes.tif | Geotiff of ecopop classes. Pixel values represent the class to which the HPU belongs. An HP class is the “cluster” to which the pixel belongs. The maximum number of classes is set as an ecopop creation parameter. This file is essentially a rasterization of {name}_hpu_classes.gpkg. |
| {name}_hpu_classes.gpkg | GeoPackage of ecopop classes. Each polygon represents a connected cluster of pixels sharing the same class. The maximum number of classes is set as an ecopop creation parameter. This is a polygonization of the {name}_hpu_classes.tif file. |
| {name}_areagrid.tif | Geotiff for which pixel values represent the area of the pixel in square km. This is used for computing actual ecopop unit areas, as working in an unprojected coordinate system (EPSG:4326) requires a bit of extra work to estimate pixel areas in meaningful units (i.e. square km instead of square degrees). |
| {name}_adjacency.csv | Provides connectivity information among ecopop units. Contains two columns: hpu_id, the id of each ecopop unit in the dataset, and adjacency, which specifies each hpu_id’s neighbors as a comma-separated string. |
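
The adjacency file described above can be turned into a neighbor lookup with the standard library alone. The sketch below assumes only the two column names given in the description (hpu_id and adjacency); the sample rows are hypothetical, and ids are kept as strings because their type is not stated here.

```python
import csv
import io


def read_adjacency(csv_file):
    """Parse an adjacency csv into {hpu_id: [neighbor ids]} (ids as strings)."""
    neighbors = {}
    for row in csv.DictReader(csv_file):
        raw = (row["adjacency"] or "").strip()
        # Split the comma-separated neighbor string and drop empty entries.
        neighbors[row["hpu_id"]] = [n.strip() for n in raw.split(",") if n.strip()]
    return neighbors


# Hypothetical contents; real files may differ in id type and formatting.
example = io.StringIO('hpu_id,adjacency\n1,"2,3"\n2,"1"\n3,"1"\n4,\n')
graph = read_adjacency(example)
```

For a real export, pass an open file handle instead, e.g. `open(path, newline="")`.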

Auxiliary files

Auxiliary files are not considered part of the core ecopop functionality, and therefore these files may not be present for a general user.

The watersheds directory contains files about streamflow gages and their watersheds. These data were obtained from the Veins of the Earth (VotE) data platform, which is currently only available to LANL employees. Information about VotE can be found here, and LANL collaborators can access the private VotE repository upon request.

The streamflow directory contains one csv per streamflow gage, and each filename corresponds to the id_gage provided by VotE (and found within the watersheds files). These data were also obtained from VotE.

The forcings directory contains meteorological and other time-series data either required for or useful to running E3SM-Land models on ecopop units. The data were sampled from the ERA5-Land Hourly dataset on Google Earth Engine (GEE). Some postprocessing is performed (for example, converting units to more standard ones), so the band descriptions provided by the GEE Data Catalog might not be exactly accurate. Each csv file is named for its corresponding ecopop id and contains time series. See the Forcings csvs section for more detailed descriptions of the contents of each file.

| directory | filename | description |
| --- | --- | --- |
| watersheds | {name}_basins.gpkg | watershed geometries and many attributes for which streamflow data exist; see detailed file description |
| watersheds | {name}_gages.gpkg | streamgage locations and many gage attributes for which streamflow data exist; see detailed file description |
| streamflow | {id_gage_n}.csv | daily time series of streamflow in cubic meters per second; contact devs for information about q_quality column if desired |
| forcings | {hpu_id_n}.csv | time series of ERA5-Land-derived variables for each ecopop unit; see detailed file description |
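
Streamflow in cubic meters per second is often compared with precipitation or runoff depths in mm. The sketch below shows the standard unit conversion from daily mean discharge to a basin-averaged depth, given a drainage area in square km. It is an illustration rather than part of ecopop, and the function name is hypothetical.

```python
def discharge_to_runoff_mm_per_day(q_m3_per_s, drainage_area_km2):
    """Convert daily mean discharge (m^3/s) to basin-averaged depth (mm/day)."""
    if drainage_area_km2 <= 0:
        raise ValueError("drainage area must be positive")
    seconds_per_day = 86400.0
    volume_m3 = q_m3_per_s * seconds_per_day   # water volume passing in one day
    area_m2 = drainage_area_km2 * 1.0e6        # km^2 -> m^2
    return volume_m3 / area_m2 * 1000.0        # m -> mm
```

The conversion reduces to `q * 86.4 / area`, so 1 cubic meter per second from an 86.4 square km basin corresponds to 1 mm per day.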

Individual file contents

epus.gpkg

| attribute | units | description | source |
| --- | --- | --- | --- |
| fid | | unique feature id created upon export; essentially meaningless. | derived |
| hpu_id | | unique ecopop id | derived |
| pop_mean | n people per 0.01 km^2 | average population density across the HU | Worldpop Estimated Residential Population per 100x100m Grid Square for 2020, downloaded from GEE and generated by WorldPop. |
| area_sum | km^2 | area of the HU | derived |
| hpu_class | | class to which this HU belongs | derived |
| centroid_x | degrees | longitude of the HU centroid | derived |
| centroid_y | degrees | latitude of the HU centroid | derived |
| fmax | | fmax parameter required by E3SM-Land Model | MERIT-DEM + GEE + custom function |
| elevation_mean | m.a.s.l. | average elevation across the HU | MERIT-DEM |
| elevation_std | m | standard deviation of elevation across the HU | MERIT-DEM |
| soil_depth_mean | m | average soil depth across the HU | Pelletier, 2016 |
| topo_slope_mean | degrees | average topographic slope across the HU | Geomorpho90m slope |
| soc_U-L_mean | dg/kg | average soil organic carbon across the HU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
| clay_U-L_mean | g/kg | average soil clay across the HU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
| silt_U-L_mean | g/kg | average soil silt across the HU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
| sand_U-L_mean | g/kg | average soil sand across the HU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
| bdod_U-L_mean | cg/cm³ | average soil bulk density across the HU between upper depth (U) and lower depth (L) | SoilGrids 2.0 hosted on GEE |
| lc_XXX | | fraction of HU covered by land cover type XXX for the year 2015 | MCD12Q1.006 MODIS Land Cover Type Yearly Global 500m for 2015 hosted on GEE |
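
Several attributes above use scaled units (dg/kg, cg/cm³, people per 0.01 km^2) that usually need converting before use. The helpers below are a sketch of those conversions based only on the units listed in the table; the function names are hypothetical, and the population estimate assumes pop_mean is representative of the whole unit area.

```python
def soc_g_per_kg(soc_dg_per_kg):
    """Soil organic carbon: decigrams per kg -> grams per kg."""
    return soc_dg_per_kg / 10.0


def bulk_density_g_per_cm3(bdod_cg_per_cm3):
    """Bulk density: centigrams per cm^3 -> grams per cm^3."""
    return bdod_cg_per_cm3 / 100.0


def texture_fraction(value_g_per_kg):
    """Clay, silt or sand content: g/kg -> mass fraction between 0 and 1."""
    return value_g_per_kg / 1000.0


def population_estimate(pop_mean, area_sum_km2):
    """Rough head count for a unit.

    pop_mean is people per 0.01 km^2 (one 100 m x 100 m cell), so there are
    100 such cells per km^2.
    """
    return pop_mean * area_sum_km2 * 100.0
```

For instance, a unit with pop_mean of 2.0 and area_sum of 5.0 km^2 would hold roughly 1000 people under this assumption.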

basins.gpkg and gages.gpkg

| attribute name | description |
| --- | --- |
| id_gage | unique VotE id, negative values indicate records for which two or more gages were merged (i.e. duplicates) |
| source | usgs, hydat, bandas, rarcticnet, cdr, etc. |
| id_source | id provided by the source, stored as a string |
| station_name | provided by source dataset |
| river_name_source | directly copied from source or sometimes inferred |
| lat_source | provided latitude |
| lon_source | provided longitude |
| drainarea_km2_source | provided drainage area |
| elevation_m_source | provided altitude of gage |
| naturalish | indicator for pristine gages; methods vary among sources |
| in_camels | text; denotes which, if any, among CAMELS-like databases the gage appears in. Options are ‘usa’, ‘hysets’, ‘br’, ‘aug’, ‘cl’, ‘gb’, ‘aus’. Some USGS gages may appear in both ‘usa’ and ‘hysets’ and thus the field would read ‘usa,hysets’ |
| start_date | YYYY-MM-DD date of gage’s first streamflow record |
| end_date | YYYY-MM-DD date of gage’s last streamflow record |
| span_years | total range of streamflow records, including missing data |
| fraction_valid | fraction of time between start_date and end_date containing observations |
| mapped_id_reach | id_reach within VotE corresponding to the reach on which the gage is located |
| mapped_from_provided | boolean; if True, indicates mapping was possible using the provided gage metadata and simple mapping rules |
| mapped_dist_km | distance between the provided gage location and the mapped gage location |
| mapped_method | int; provides information about the method used to perform the gage mapping |
| manmap_da_km2 | for manually mapping gages; contains the drainage area to use when mapping |
| manmap_geom | for manually mapping gages; manually-specified location of the gage |
| manmap_method | describes which method was used in manual mapping |
| manmap_id_reach | for manually mapping gages; the manually-selected id_reach corresponding to the gage |
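
A few of the attributes above encode more than one piece of information (the sign of id_gage, the comma-separated in_camels text). The sketch below shows one way to interpret them using plain dictionaries as records. The helper names and the sample records are hypothetical, and the "years of observations" estimate simply multiplies span_years by fraction_valid.

```python
def camels_memberships(in_camels):
    """Split the in_camels text (e.g. 'usa,hysets') into a list of names."""
    if not in_camels:
        return []
    return [name.strip() for name in in_camels.split(",") if name.strip()]


def is_merged_gage(id_gage):
    """Negative ids mark records built by merging two or more gages."""
    return id_gage < 0


def observed_years(span_years, fraction_valid):
    """Approximate number of years that actually contain observations."""
    return span_years * fraction_valid


# Hypothetical records using attribute names from the table above.
records = [
    {"id_gage": 101, "in_camels": "usa,hysets", "span_years": 40.0, "fraction_valid": 0.9},
    {"id_gage": -7, "in_camels": "", "span_years": 12.0, "fraction_valid": 0.5},
]
long_records = [r["id_gage"] for r in records
                if observed_years(r["span_years"], r["fraction_valid"]) >= 30.0]
```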

Forcings csvs aka {epu_id_n}.csv

| variable (band) | units | description |
| --- | --- | --- |
| date | | UTC time in YYYY-MM-DD HH:MM:SS |
| temperature_2m | deg C | Temperature of air at 2m above the surface of land, sea or in-land waters. 2m temperature is calculated by interpolating between the lowest model level and the Earth’s surface, taking account of the atmospheric conditions. |
| dewpoint_temperature_2m | deg C | Temperature to which the air, at 2 meters above the surface of the Earth, would have to be cooled for saturation to occur. It is a measure of the humidity of the air. Combined with temperature and pressure, it can be used to calculate the relative humidity. 2m dew point temperature is calculated by interpolating between the lowest model level and the Earth’s surface, taking account of the atmospheric conditions. |
| total_precipitation | mm | This is the spatially-averaged total precipitation across the polygon within the time step indicated by the date column. Accumulated liquid and frozen water, including rain and snow, that falls to the Earth’s surface. It is the sum of large-scale precipitation (that precipitation which is generated by large-scale weather patterns, such as troughs and cold fronts) and convective precipitation (generated by convection which occurs when air at lower levels in the atmosphere is warmer and less dense than the air above, so it rises). Precipitation variables do not include fog, dew or the precipitation that evaporates in the atmosphere before it lands at the surface of the Earth. |
| u_component_of_wind_10m | m/s | Eastward component of the 10m wind. It is the horizontal speed of air moving towards the east, at a height of ten meters above the surface of the Earth, in meters per second. Care should be taken when comparing this variable with observations, because wind observations vary on small space and time scales and are affected by the local terrain, vegetation and buildings that are represented only on average in the ECMWF Integrated Forecasting System. This variable can be combined with the V component of 10m wind to give the speed and direction of the horizontal 10m wind. |
| v_component_of_wind_10m | m/s | Northward component of the 10m wind. It is the horizontal speed of air moving towards the north, at a height of ten meters above the surface of the Earth, in meters per second. Care should be taken when comparing this variable with observations, because wind observations vary on small space and time scales and are affected by the local terrain, vegetation and buildings that are represented only on average in the ECMWF Integrated Forecasting System. This variable can be combined with the U component of 10m wind to give the speed and direction of the horizontal 10m wind. |
| surface_pressure | kPa | Pressure (force per unit area) of the atmosphere on the surface of land, sea and in-land water. It is a measure of the weight of all the air in a column vertically above the area of the Earth’s surface represented at a fixed point. Surface pressure is often used in combination with temperature to calculate air density. The strong variation of pressure with altitude makes it difficult to see the low and high pressure systems over mountainous areas, so mean sea level pressure, rather than surface pressure, is normally used for this purpose. |
| surface_runoff | mm | Some water from rainfall, melting snow, or deep in the soil, stays stored in the soil. Otherwise, the water drains away, either over the surface (surface runoff), or under the ground (sub-surface runoff) and the sum of these two is simply called ‘runoff’. This variable is accumulated from the beginning of the forecast time to the end of the forecast step. The units of runoff are depth in meters. This is the depth the water would have if it were spread evenly over the grid box. Care should be taken when comparing model variables with observations, because observations are often local to a particular point rather than averaged over a grid square area. |
| sub_surface_runoff | mm | Some water from rainfall, melting snow, or deep in the soil, stays stored in the soil. Otherwise, the water drains away, either over the surface (surface runoff), or under the ground (sub-surface runoff) and the sum of these two is simply called ‘runoff’. This variable is accumulated from the beginning of the forecast time to the end of the forecast step. The units of runoff are depth in meters. This is the depth the water would have if it were spread evenly over the grid box. |
| surface_solar_radiation_downwards | W/m2 | Amount of solar radiation (also known as shortwave radiation) reaching the surface of the Earth. This variable comprises both direct and diffuse solar radiation. Radiation from the Sun (solar, or shortwave, radiation) is partly reflected back to space by clouds and particles in the atmosphere (aerosols) and some of it is absorbed. The rest is incident on the Earth’s surface (represented by this variable). To a reasonably good approximation, this variable is the model equivalent of what would be measured by a pyranometer (an instrument used for measuring solar radiation) at the surface. However, care should be taken when comparing model variables with observations, because observations are often local to a particular point in space and time, rather than representing averages over a model grid box and model time step. This variable is accumulated from the beginning of the forecast time to the end of the forecast step. |
| surface_thermal_radiation_downwards | W/m2 | Amount of thermal (also known as longwave or terrestrial) radiation emitted by the atmosphere and clouds that reaches the Earth’s surface. The surface of the Earth emits thermal radiation, some of which is absorbed by the atmosphere and clouds. The atmosphere and clouds likewise emit thermal radiation in all directions, some of which reaches the surface (represented by this variable). |
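
The descriptions above mention two derived quantities: wind speed and direction from the U and V components, and relative humidity from temperature and dew point. The sketch below shows one common way to compute both from values in the units listed (m/s and deg C). It is an illustration, not part of ecopop: the function names are hypothetical, the direction follows the meteorological convention (degrees clockwise from north, indicating where the wind blows from), and relative humidity uses the Magnus approximation with the Alduchov and Eskridge coefficients, which ignores the small pressure dependence.

```python
import math


def wind_speed_and_direction(u, v):
    """Return (speed in m/s, direction in degrees the wind blows FROM)."""
    speed = math.hypot(u, v)
    # Negating both components turns "blowing towards" into "coming from".
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapor pressure over water (hPa)."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def relative_humidity_percent(temp_c, dewpoint_c):
    """Relative humidity (%) from air temperature and dew point in deg C."""
    return 100.0 * (saturation_vapor_pressure_hpa(dewpoint_c)
                    / saturation_vapor_pressure_hpa(temp_c))
```

A purely eastward wind (u > 0, v = 0) comes from the west, so it yields a direction of 270 degrees. When the dew point equals the air temperature, the relative humidity is 100%.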