Data Lineage (if applicable). Please include versions (e.g., input and forcing data, models, and coupling modules; instrument measurements; surveys; sample collections; etc.)
1. Description of methods used for collection/generation of data:
A. Soil Landscapes of Canada (SLC) Data Version 2.2 (Centre for Land and Biological Resources Research, Agriculture and Agri-Food Canada, 1996): This dataset was published in December 1996 by Agriculture and Agri-Food Canada, based on soil survey mapping carried out over the years and updated regularly. The data have a map scale of 1:1 million and cover all of Canada. The dataset is structured as follows:
a. It divides the whole of Canada into ecodistricts, which are available as a polygon shapefile in which each polygon corresponds to one ecodistrict.
b. Each polygon is further divided into a number of soil texture components.
c. The percentage of the ecodistrict area covered by each component is given in tables, but the locations of the components are not known, i.e. there is no shapefile defining the coverage of each component inside each ecodistrict.
d. Each of these components has been allocated a soil type: CL - clay loam, KCL - clay loam with >35% coarse fragments, CY - clay, LM - loam, KLM - loam with >35% coarse fragments, SD - sand, KSD - sand with >35% coarse fragments, SL - sandy loam, KSL - sandy loam with >35% coarse fragments, KSP - cobbly sand, O - organic, # - not applicable (rock, ice, urban).
Centre for Land and Biological Resources Research. 1996. Soil Landscapes of Canada, v.2.2, Research Branch, Agriculture and Agri-Food Canada. Ottawa.
B. STATSGO2 USA data (United States Department of Agriculture, 2015): The dataset was published by the National Cooperative Soil Survey in December 2015, succeeding the STATSGO dataset. It was developed using detailed soil survey maps, geology, vegetation and climate data, and Landsat images. The dataset has a map scale of 1:250,000 in the continental U.S., Hawaii, Puerto Rico, and the Virgin Islands and 1:1,000,000 in Alaska. The structure of the dataset is very similar to that of the SLC dataset, with a few differences:
a. The dataset divides the entire area into map units, similar to the ecodistricts in the SLC dataset. Unlike in the SLC dataset, a single map unit can correspond to several polygons in the shapefile.
b. Each map unit is further divided into a number of components. As in the Canadian dataset, the locations of these components are not known (i.e. the data are not fully distributed).
c. Each component has several vertical layers (horizons), and % sand, % silt, % clay and % organic are given for each horizon.
Soil Survey Staff, Natural Resources Conservation Service, United States Department of Agriculture (2015). U.S. General Soil Map (STATSGO2). Available online at http://websoilsurvey.nrcs.usda.gov/. Accessed [06/21/2017].
2. Methods for processing the data: The data were processed separately for the Canadian part of the river basins (the entire Mackenzie River Basin and the majority of the Nelson-Churchill River Basin) and for the US part (a small part of the Nelson-Churchill River Basin covering parts of Minnesota, Montana, North Dakota, South Dakota, Washington and Wisconsin).
A. Processing the SLC 2.2 dataset:
I. Conversion of soil texture classes to percentage soil texture: The soil type of each component was converted to % sand, % clay and % organic using the Canadian System of Soil Classification (CSSC) manual, 3rd edition (Soil Classification Working Group, Agriculture and Agri-Food Canada, 1998).
a. For this, the minimum and maximum percentage values of sand and clay were extracted from the soil triangle in the CSSC manual, and the mean value for each texture class was calculated by averaging its minimum and maximum values.
b. Organic content was assigned a minimum value of 5%. For organic soil (O), the organic percentage was set to 100% and %Sand and %Clay were each set to zero.
c. It was ensured that % organic + % sand + % clay stays less than or equal to 100. To ensure this, the minimum percentage organic in sandy soils (SD and KSD) was changed from 5% to 2.5%.
d. Some classes, such as KSD (sand with more than 35% coarse fragments) and KSL (sandy loam with more than 35% coarse fragments), were not present in the soil triangle. Treating coarse fragments as sand, the maximum and minimum values of sand were each increased by 35 (capped at 100%) and the mean was then recalculated. These soil classes were, however, only negligibly present in our areas of interest.
e. Several components had the value '#', indicating rock, urban or ice cover. An ecodistrict in which these components covered more than 50% of the area was treated as '#' (NA) in its entirety, whereas in ecodistricts where they covered less than 50% of the area, these components were ignored in further computations. Note that of the 683 components in the Mackenzie River Basin, only 68 had NA or '#' as their texture class, and most of these had areas of less than 25%. Similar results were found for the Nelson-Churchill River Basin.
Soil Classification Working Group. 1998. The Canadian System of Soil Classification, 3rd ed. Agriculture and Agri-Food Canada Publication 1646, 187 pp.
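The conversion rules in steps a to d above can be sketched in Python as follows. This is a minimal illustration, not the code used to produce the dataset. The function names are hypothetical, the numeric ranges in the example are placeholders rather than the values read from the CSSC soil triangle, and using the organic minimum as the component's organic percentage is an assumption.

```python
def range_mean(lo, hi):
    """Mean of a texture class range, taken as the midpoint of min and max (step a)."""
    return (lo + hi) / 2.0


def add_coarse_fragments(lo, hi, shift=35.0):
    """Treat coarse fragments as sand: raise both sand limits by 35, capped at 100 (step d)."""
    return min(lo + shift, 100.0), min(hi + shift, 100.0)


def organic_minimum(code):
    """Minimum organic percentage: 2.5 for SD and KSD, 5 otherwise (steps b and c)."""
    return 2.5 if code in ("SD", "KSD") else 5.0


def component_texture(code, sand_range=None, clay_range=None):
    """Return % sand, % clay and % organic for one component, or None for '#'."""
    if code == "#":
        return None  # rock, ice or urban: no texture assigned
    if code == "O":
        return {"sand": 0.0, "clay": 0.0, "organic": 100.0}
    return {
        "sand": range_mean(*sand_range),
        "clay": range_mean(*clay_range),
        "organic": organic_minimum(code),  # assumption: the minimum is used as the value
    }


# Placeholder ranges for illustration only (not taken from the dataset).
sd = component_texture("SD", sand_range=(85.0, 100.0), clay_range=(0.0, 10.0))
print(sd, sum(sd.values()))
print(add_coarse_fragments(50.0, 80.0))
```

With these placeholder ranges the three percentages for SD sum to exactly 100, which illustrates why a lower organic minimum is needed for sandy classes to satisfy the constraint in step c.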
II. Aggregating the soil percentage to 0.125 degree resolution: The following steps were used to aggregate the soil texture percentages into a gridded dataset:
a. First, inland water was “differenced” from the ecodistrict polygon shapefile using QGIS, because the area percentage of each component in a polygon refers to the land area of the polygon rather than its total area.
b. Sand, clay and organic percentages were calculated for each ecodistrict polygon using the following formulae:
Min % of Polygon = Minimum of (Min % of each component in the polygon)
Max % of Polygon = Maximum of (Max % of each component in the polygon)
Mean % of Polygon = Summation of (%area X %mean) for each component/100
c. Each 0.125 degree grid cell was intersected with the ecodistrict shapefile layer to obtain the polygons, and their respective areas, inside each grid cell. The percentage values of sand, clay and organic matter for the grid cell were then calculated using formulae similar to those above.
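The formulae in step b, and their grid cell counterpart in step c, can be sketched in Python as below for a single variable (e.g. % sand). The dictionary keys and function names are hypothetical. For the grid cell, the sketch assumes that the mean is weighted by the polygon area falling inside the cell and normalized by the total intersected area, which the description above does not spell out.

```python
def polygon_stats(components):
    """Aggregate one variable over the components of an ecodistrict polygon.

    Each component is a dict with keys 'area_pct', 'min', 'max' and 'mean'.
    """
    return {
        "min": min(c["min"] for c in components),
        "max": max(c["max"] for c in components),
        # Mean % of polygon = sum(%area x %mean) / 100
        "mean": sum(c["area_pct"] * c["mean"] for c in components) / 100.0,
    }


def grid_cell_stats(polygons):
    """Aggregate polygon values over one grid cell.

    Each polygon is a dict with keys 'area' (area inside the cell), 'min',
    'max' and 'mean'. Normalizing by the total intersected area is an assumption.
    """
    total_area = sum(p["area"] for p in polygons)
    return {
        "min": min(p["min"] for p in polygons),
        "max": max(p["max"] for p in polygons),
        "mean": sum(p["area"] * p["mean"] for p in polygons) / total_area,
    }


# Illustrative numbers only.
poly = polygon_stats([
    {"area_pct": 60.0, "min": 20.0, "max": 50.0, "mean": 35.0},
    {"area_pct": 40.0, "min": 40.0, "max": 80.0, "mean": 60.0},
])
cell = grid_cell_stats([
    {"area": 30.0, **poly},
    {"area": 10.0, "min": 10.0, "max": 60.0, "mean": 25.0},
])
print(poly)
print(cell)
```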
B. Processing STATSGO2 dataset:
I. Conversion of soil class to percentage soil texture: Since the dataset already contains the percentage values of sand, silt, clay and organic, there was no need to map soil classes to soil texture percentages as for the SLC 2.2 dataset. However, a few adjustments were made to the percentage values so that the processed data from the two sources are coherent:
a. The STATSGO2 dataset gives the organic percentage for each vertical layer of soil. For all soil layers with an organic percentage greater than 30%, %sand, %clay and %silt are all zero, suggesting a fully organic soil for that component. Therefore, organic soils (those with an organic percentage greater than 30%) were set to 100% organic, consistent with the assumption made for the SLC 2.2 dataset.
b. %sand, %silt and %clay add up to 100%, and organic is reported in addition to these, so the sum of %sand, %silt, %clay and %organic exceeds 100. The sand, silt, clay and organic percentages were therefore normalized to sum to 100.
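The two adjustments above can be sketched in Python as follows. The function name and argument names are hypothetical, and the sketch assumes the normalization is a simple proportional rescaling of all four percentages.

```python
def adjust_layer(sand, silt, clay, organic, organic_threshold=30.0):
    """Apply the two STATSGO2 adjustments to one soil layer.

    a. Layers with organic > 30% are treated as 100% organic.
    b. Otherwise the four percentages are rescaled so that they sum to 100
       (proportional rescaling is an assumption of this sketch).
    """
    if organic > organic_threshold:
        return {"sand": 0.0, "silt": 0.0, "clay": 0.0, "organic": 100.0}
    values = {"sand": sand, "silt": silt, "clay": clay, "organic": organic}
    total = sum(values.values())
    if total == 0:
        return values  # nothing to normalize
    return {name: value * 100.0 / total for name, value in values.items()}


# Illustrative numbers only.
mineral = adjust_layer(sand=40.0, silt=40.0, clay=20.0, organic=25.0)
peat = adjust_layer(sand=0.0, silt=0.0, clay=0.0, organic=45.0)
print(mineral)
print(peat)
```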
II. Aggregating the soil percentage to 0.125 degree resolution: The aggregation methodology for the STATSGO2 dataset is very similar to that for the SLC 2.2 dataset. The one extra step was the calculation of soil texture per component by averaging the soil texture percentages of the vertical horizons, weighted over their depth. Once the soil texture per component was obtained, the remaining steps for calculating the soil texture percentages per grid cell were the same as for the SLC 2.2 dataset.
The two datasets were then appended to create the gridded soil texture data for the Nelson-Churchill River Basin and the Mackenzie River Basin.
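The extra per-component step can be sketched in Python as below. The sketch assumes that "weighted over their depth" means weighting each horizon by its thickness (bottom depth minus top depth). The key names 'top_cm' and 'bottom_cm' and the function name are hypothetical.

```python
def depth_weighted_mean(horizons, key):
    """Average one variable over the horizons of a component, weighted by thickness.

    Each horizon is a dict with 'top_cm', 'bottom_cm' and the variable named by `key`.
    """
    thicknesses = [h["bottom_cm"] - h["top_cm"] for h in horizons]
    total = sum(thicknesses)
    return sum(t * h[key] for t, h in zip(thicknesses, horizons)) / total


# Illustrative numbers only: a 20 cm horizon over an 80 cm horizon.
horizons = [
    {"top_cm": 0.0, "bottom_cm": 20.0, "sand": 60.0},
    {"top_cm": 20.0, "bottom_cm": 100.0, "sand": 40.0},
]
component_sand = depth_weighted_mean(horizons, "sand")
print(component_sand)
```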
3. Instrument- or software-specific information needed to interpret the data: The dataset comprises two file types: the "*.csv" data table file, containing the minimum, maximum and average percentages of sand, clay and organic for each 0.125 degree grid cell (identified by Latitude and Longitude in the table); and the shapefiles of the gridded river basins, which can be viewed in any GIS software (e.g. QGIS). The data table can be joined with the grid cell shapefile using the "ID" variable to view the spatial distribution of the soil texture variables (%sand, %clay and %organic); the average, minimum or maximum can be selected.