VectorPreProcessing package

Submodules

VectorPreProcessing.Aggregation_vector module

Basin and River Network Aggregation

The merit_basin_aggregation function aggregates basin and river network shapefiles. This function uses parameters like minimum sub-area, slope, and river length to iteratively aggregate small sub-basins.

Parameters:

input_basin: Basin GeoDataFrame with COMID identifiers.
input_river: River network GeoDataFrame with slope and length attributes.
min_subarea: Minimum area for sub-basins.
min_slope: Minimum allowable river slope.
min_length: Minimum river length.

This function iterates through sub-basins, merging those below the minimum sub-area threshold until no further aggregation is possible. It also computes and adjusts slopes, river lengths, and weighted slopes for simplified river networks.
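The slope adjustment described above can be illustrated with a minimal sketch. It assumes the merged reaches are connected in series and that the "weighted slope" is a length-weighted mean; the helper name merge_segments and its signature are hypothetical and are not part of the package.

```python
def merge_segments(segments, min_slope, min_length):
    """Combine river reaches into one (hypothetical helper, not the package API).

    segments: list of (length, slope) pairs for the reaches being merged.
    Returns (total_length, weighted_slope) after applying the minimum values.
    """
    total_length = sum(length for length, _ in segments)
    # Length-weighted mean slope: longer reaches contribute more.
    weighted_slope = sum(length * slope for length, slope in segments) / total_length
    # Apply the floors so neither value drops below its configured minimum.
    return max(total_length, min_length), max(weighted_slope, min_slope)


length, slope = merge_segments([(2.0, 0.001), (6.0, 0.003)], min_slope=1e-7, min_length=1.0)
print(length, slope)
```

The floors mirror the role of min_slope and min_length in the parameter list; how merit_basin_aggregation applies them internally may differ.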

Example usage:

>>> from VectorPreProcessing.Aggregation_vector import merit_basin_aggregation
>>> import geopandas as gpd
>>> import os

>>> # Define paths and parameters
>>> input_basin_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/SrsAboine-geofabric/sras_subbasins_MAF_noAgg.shp"
>>> input_river_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/SrsAboine-geofabric/sras_rivers_MAF_noAgg.shp"
>>> min_subarea = 50
>>> min_slope = 0.0000001
>>> min_length = 1.0
>>> output_basin_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/sras_subbasins_MAF_Agg.shp"
>>> output_river_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/sras_rivers_MAF_Agg.shp"

>>> # Load input data
>>> input_basin = gpd.read_file(input_basin_path)
>>> input_river = gpd.read_file(input_river_path)

>>> # Perform aggregation
>>> agg_basin, agg_river = merit_basin_aggregation(input_basin, input_river, min_subarea, min_slope, min_length)

>>> # Save aggregated data
>>> agg_basin.to_file(output_basin_path)
>>> agg_river.to_file(output_river_path)

VectorPreProcessing.Aggregation_vector.merit_basin_aggregation(input_basin, input_river, min_subarea, min_slope, min_length)

VectorPreProcessing.NetCDFWriter module

Overview

The NetCDFWriter class is designed to generate model-ready NetCDF files (e.g., MESH_parameters.nc) containing soil and other geophysical subbasin data integrated from a vector shapefile and a NetCDF drainage database. This class is typically used in workflows that prepare input parameters for land surface models like MESH.

It supports flexible handling of both layer-dependent (e.g., soil properties per depth layer) and layer-independent (e.g., slope, contributing area) variables. The output conforms to CF conventions and includes appropriate coordinate reference metadata for spatial consistency.
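The difference between the two kinds of variables is easiest to see in the array shapes. The sketch below uses plain nested lists; the subbasin IDs, the values, and the (layer, subbasin) dimension order are illustrative assumptions, not something the class is documented to guarantee.

```python
subbasins = [101, 102, 103]  # hypothetical subbasin IDs
num_layers = 4               # e.g., a 4-layer soil profile

# Layer-dependent variable: one value per (soil layer, subbasin) pair.
clay = [[30.0 for _ in subbasins] for _ in range(num_layers)]

# Layer-independent variable: one value per subbasin.
xslp = [0.5 for _ in subbasins]

clay_shape = (len(clay), len(clay[0]))
xslp_shape = (len(xslp),)
print(clay_shape, xslp_shape)
```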

Function Descriptions

class VectorPreProcessing.NetCDFWriter.NetCDFWriter(nc_filename, shapefile_path, input_ddb_path)

Initializes the NetCDF writer with paths to the output file, input shapefile, and NetCDF drainage database.

Parameters:

nc_filename (str) – Path to the NetCDF output file to be created.
shapefile_path (str) – Path to the input shapefile containing the attributes.
input_ddb_path (str) – Path to the NetCDF drainage database used to extract coordinates.

VectorPreProcessing.NetCDFWriter.read_shapefile(): Reads the input shapefile and converts it into a GeoDataFrame. The file is automatically reprojected to EPSG:4326 (WGS 84).

VectorPreProcessing.NetCDFWriter.set_coordinates(): Extracts lon, lat, and subbasin values from the NetCDF drainage database file. These values serve as the spatial base for NetCDF output.

VectorPreProcessing.NetCDFWriter.set_num_soil_layers(num_layers)

Sets the number of vertical soil layers that will be written into the NetCDF file.

Parameters:

num_layers (int) – The number of soil layers (e.g., 4 for a 4-layer soil profile).

VectorPreProcessing.NetCDFWriter.add_var_attrs(var, attrs)

Adds metadata attributes to a NetCDF variable, such as units, standard name, and axis designation.

Parameters:

var (netCDF4.Variable) – The NetCDF variable to modify.
attrs (dict) – Dictionary of attributes to apply.

VectorPreProcessing.NetCDFWriter.write_netcdf(properties, variable_info)

Writes the actual NetCDF file using the specified properties and metadata.

Parameters:

properties (dict) – Dictionary specifying which variables are layer-dependent vs. layer-independent.
variable_info (dict) – Dictionary mapping each variable to a tuple of (NetCDF name, data type, unit).
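Because properties and variable_info must agree, it can help to check them before writing. The following is a minimal sketch of such a check; check_variable_info is a hypothetical helper and is not provided by NetCDFWriter.

```python
def check_variable_info(properties, variable_info):
    """Return a list of problems; an empty list means the two inputs agree.

    Hypothetical pre-flight check, not part of the NetCDFWriter class.
    """
    problems = []
    names = properties.get('layer_dependent', []) + properties.get('layer_independent', [])
    for name in names:
        entry = variable_info.get(name)
        if entry is None:
            problems.append(f"{name}: missing from variable_info")
        elif len(entry) != 3:
            problems.append(f"{name}: expected (NetCDF name, data type, unit)")
    return problems


properties = {'layer_dependent': ['CLAY'], 'layer_independent': ['xslp', 'dd']}
variable_info = {'CLAY': ('CLAY', 'f4', 'Percentage'), 'xslp': ('xslp', 'f4', 'degree')}
print(check_variable_info(properties, variable_info))
```

Here the check reports that 'dd' has no entry, which would otherwise surface only when the file is written.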

Example Usage

from VectorPreProcessing.NetCDFWriter import NetCDFWriter

# Paths for NetCDFWriter
nc_filename = 'MESH_parameters3.nc'
output_shapefile = 'merged_soil_data_shapefile4.shp'
input_ddb = '/scratch/fuaday/sras-agg-model/MESH-sras-agg/MESH_drainage_database.nc'
mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]

# Initialize NetCDFWriter with the necessary paths
nc_writer = NetCDFWriter(
    nc_filename=nc_filename,
    shapefile_path=output_shapefile,
    input_ddb_path=input_ddb
)

# Step 1: Read the attribute shapefile and extract spatial coordinates from the drainage database
nc_writer.read_shapefile()
nc_writer.set_coordinates()

# Step 2: Specify the number of vertical soil layers to include in the output
nc_writer.set_num_soil_layers(num_layers=len(mesh_intervals))

# Step 3: Define which variables are layer-dependent vs. layer-independent
properties = {
    'layer_dependent': ['CLAY', 'SAND', 'OC'],  # Varies by soil layer and subbasin
    'layer_independent': ['ncontr', 'meanBDRICM', 'meanBDTICM', 'xslp', 'dd']  # Varies only by subbasin
}

# Step 4: Provide metadata for each variable to be written to NetCDF
variable_info = {
    'CLAY': ('CLAY', 'f4', 'Percentage'),
    'SAND': ('SAND', 'f4', 'Percentage'),
    'OC': ('ORGM', 'f4', 'Percentage'),
    'ncontr': ('IWF', 'i4', '1'),
    'meanBDRICM': ('BDRICM', 'f4', 'Meters'),
    'meanBDTICM': ('BDTICM', 'f4', 'Meters'),
    'xslp': ('xslp', 'f4', 'degree'),
    'dd': ('dd', 'f4', 'm_per_km2')
}

# Step 5: Write the final NetCDF file with structured metadata and spatial consistency
nc_writer.write_netcdf(properties=properties, variable_info=variable_info)

class VectorPreProcessing.NetCDFWriter.NetCDFWriter(nc_filename, shapefile_path, input_ddb_path)

Bases: object

A class to generate NetCDF files with soil data merged from shapefiles and NetCDF drainage databases.

Attributes:

nc_filename (str): Path to the output NetCDF file.
shapefile_path (str): Path to the input shapefile.
input_ddb_path (str): Path to the NetCDF drainage database.
merged_gdf (geopandas.GeoDataFrame): GeoDataFrame containing merged shapefile data.
lon (list): List of longitude values from the NetCDF drainage database.
lat (list): List of latitude values from the NetCDF drainage database.
segid (list): List of subbasin identifiers.
num_soil_lyrs (int): Number of soil layers in the dataset.

add_var_attrs(var, attrs)

Adds attributes to a NetCDF variable.

Parameters:

var (netCDF4.Variable): The NetCDF variable to which attributes will be added.
attrs (dict): A dictionary of attribute names and values.

read_shapefile()

Reads the shapefile and converts it into a GeoDataFrame.

This function reads the shapefile, reprojects it to EPSG:4326 (WGS 84), and stores the result in the merged GeoDataFrame.

set_coordinates(): Extracts longitude, latitude, and subbasin IDs from the NetCDF drainage database.

set_num_soil_layers(num_layers)

Sets the number of soil layers for the NetCDF file.

Parameters:

num_layers (int): Number of soil layers to be included in the NetCDF file.

write_netcdf(properties, variable_info)

Creates a NetCDF file with processed soil data.

Parameters:

properties (dict): A dictionary with two keys:
- ‘layer_dependent’: List of property names tied to the number of soil layers.
- ‘layer_independent’: List of property names dependent only on the subbasin.
variable_info (dict): A dictionary mapping property names to tuples containing (new variable name in NetCDF, data type code, unit).

VectorPreProcessing.convert_ddbnetcdf module

NetCDF to CSV/Shapefile Converter

This script provides a function, convert_netcdf, that converts a NetCDF file containing hydrological data into either a CSV file or a Shapefile.

Example Usage:

>>> from VectorPreProcessing.convert_ddbnetcdf import convert_netcdf
>>> convert_netcdf(netcdf_file='input.nc', output_file='output.csv', conversion_type='csv')
>>> convert_netcdf(netcdf_file='input.nc', output_file='output.shp', conversion_type='shapefile')

Functions:

convert_netcdf: Converts a NetCDF file into either a CSV or a Shapefile.

Parameters:

netcdf_file (str): Path to the input NetCDF file.
output_file (str): Path to the output file (CSV or Shapefile).
conversion_type (str): Conversion type, either “csv” or “shapefile”.

VectorPreProcessing.convert_ddbnetcdf.convert_netcdf(netcdf_file, output_file, conversion_type='csv')

Converts a NetCDF file to either a CSV or a Shapefile.

Parameters:

netcdf_file (str): Path to the input NetCDF file.
output_file (str): Path to the output file (CSV or Shapefile).
conversion_type (str, optional): Type of conversion (“csv” or “shapefile”), default is “csv”.

Returns:

None

VectorPreProcessing.gdf_edit module

gdf_edit.py

This module provides functions to flag non-contributing areas (NCAs) or lakes and reservoirs in GeoDataFrames based on intersection thresholds, with customizable options for column names, default values, and initialization values.

Example Usage

1. Using Shapefiles:

>>> from VectorPreProcessing.gdf_edit import flag_ncaalg_from_files
>>> flagged_gdf = flag_ncaalg_from_files(
...     'path/to/shapefile1.shp',
...     'path/to/shapefile2.shp',
...     threshold=0.1,
...     output_path='output.shp'
... )

>>> flagged_gdf = flag_ncaalg_from_files(
...     'path/to/shapefile1.shp',
...     'path/to/shapefile2.shp',
...     threshold=0.1,
...     output_path='output.shp',
...     ncontr_col="custom_flag_column",   # Custom column in gdf1 to store flags
...     value_column="NON_ID",             # Column in gdf2 with values to assign
...     initial_value=0,                   # Initial value for gdf1's flag column
...     default_value=5                    # Default value if no value_column specified
... )

2. Using GeoDataFrames Directly:

>>> from VectorPreProcessing.gdf_edit import flag_ncaalg
>>> import geopandas as gpd
>>> gdf1 = gpd.read_file('path/to/shapefile1.shp')
>>> gdf2 = gpd.read_file('path/to/shapefile2.shp')
>>> flagged_gdf = flag_ncaalg(gdf1, gdf2, threshold=0.1)

>>> flagged_gdf = flag_ncaalg(
...     gdf1,
...     gdf2,
...     threshold=0.1,
...     ncontr_col="custom_flag_column",   # Custom column in gdf1 to store flags
...     value_column="NON_ID",             # Column in gdf2 with values to assign
...     initial_value=0,                   # Initial value for gdf1's flag column
...     default_value=5                    # Default value if no value_column specified
... )

VectorPreProcessing.gdf_edit.flag_ncaalg(gdf1: GeoDataFrame, gdf2: GeoDataFrame, threshold: float = 0.1, output_path: str = None, ncontr_col: str = 'ncontr', value_column: str = None, initial_value=None, default_value=2) → GeoDataFrame

Flag intersections and optionally assign values from gdf2.

This function identifies intersections between polygons in gdf1 and gdf2 that meet a specified threshold. If an intersection is found, a constant value (default is 2) or a value from a specified column in gdf2 (if provided) is assigned to the corresponding row in gdf1. If multiple intersections exist, the first match is used.

Parameters:

gdf1 (gpd.GeoDataFrame) – The primary GeoDataFrame.
gdf2 (gpd.GeoDataFrame) – The secondary GeoDataFrame with values to assign.
threshold (float, optional) – The threshold for considering an intersection significant (default is 0.1, i.e., 10%).
output_path (str, optional) – Path where the modified gdf1 should be saved. If None, the file is not saved.
ncontr_col (str, optional) – The name of the column to store assigned values in gdf1.
value_column (str, optional) – The name of the column in gdf2 with values to assign to gdf1. If None, a constant value (default_value) is used.
initial_value (optional) – The initial value to assign to the ncontr_col column in gdf1 before processing intersections.
default_value (optional) – The default value to assign to the ncontr_col column if value_column is None (default is 2).

Returns:

The modified gdf1 with assigned values based on intersections.

Return type:

gpd.GeoDataFrame
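The thresholding rule can be sketched without any GIS dependencies by standing in axis-aligned rectangles for polygons. This is a simplified illustration only: the helper names are hypothetical, the overlap is measured as a fraction of the gdf1 polygon's area, and whether the real function compares with >= or > is an assumption.

```python
def overlap_fraction(a, b):
    """Fraction of rectangle a's area covered by rectangle b.

    Rectangles are (xmin, ymin, xmax, ymax) and stand in for polygons.
    """
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return (width * height) / area_a


def flag(basins, features, threshold=0.1, initial_value=0, default_value=2):
    """Assign default_value to each basin that a feature overlaps sufficiently."""
    flags = []
    for basin in basins:
        value = initial_value
        for feature in features:
            if overlap_fraction(basin, feature) >= threshold:
                value = default_value
                break  # the first qualifying match is used
        flags.append(value)
    return flags


basins = [(0, 0, 10, 10), (10, 0, 20, 10)]
lakes = [(8, 0, 10.5, 10)]
flags = flag(basins, lakes)
print(flags)
```

The first basin is 20% covered and is flagged; the second is only 5% covered and keeps its initial value.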

VectorPreProcessing.gdf_edit.flag_ncaalg_from_files(shapefile1: str, shapefile2: str, threshold: float = 0.1, output_path: str = None, ncontr_col: str = 'ncontr', value_column: str = None, initial_value=None, default_value=2) → GeoDataFrame

Read two shapefiles, set their CRS to EPSG:4326, and apply the flag_ncaalg function.

Parameters:

shapefile1 (str) – Path to the first shapefile.
shapefile2 (str) – Path to the second shapefile.
threshold (float, optional) – The threshold for considering an intersection significant, as a fraction of the first GeoDataFrame’s polygon area (default is 0.1 for 10%).
output_path (str, optional) – Path where the modified first GeoDataFrame should be saved. If None, the file is not saved.
ncontr_col (str, optional) – The name of the column to flag intersections in gdf1.
value_column (str, optional) – The name of the column in gdf2 with values to assign to gdf1.
initial_value (optional) – The initial value to assign to the ncontr_col column in gdf1 before processing intersections.
default_value (optional) – The default value to assign to the ncontr_col column if value_column is None (default is 2).

Returns:

The GeoDataFrame read from the first shapefile, modified to include the specified column.

Return type:

gpd.GeoDataFrame

VectorPreProcessing.gsde_soil module

Overview

The GSDESoil class provides a pipeline to process, clean, interpolate, and integrate soil property data into hydrological model inputs, such as those required by MESH. It is designed to handle GSDE-derived statistics stored in CSV files, convert them to model-ready format using weighted depth-averaging, and merge them into a basin shapefile based on unique identifiers (e.g., COMID).

Function Descriptions

class VectorPreProcessing.gsde_soil.GSDESoil(directory, input_basin, output_shapefile)

Initializes the processor with input/output paths.

Parameters:

directory (str) – Directory containing input CSV files.
input_basin (str) – Path to the input shapefile with a COMID field.
output_shapefile (str) – Path where the merged shapefile will be saved.

VectorPreProcessing.gsde_soil.load_data(file_names, search_replace_dict=None, suffix_dict=None)

Loads and merges soil data from multiple CSV files. Columns can be renamed using search/replace rules and optionally suffixed to avoid name collisions.

Parameters:

file_names (list) – List of CSV file names.
search_replace_dict (dict, optional) – Dictionary with filename as key and (search_list, replace_list) as value.
suffix_dict (dict, optional) – Dictionary with filename as key and string suffix as value.
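The search/replace renaming can be pictured as a substring substitution on each column name. The sketch below is an assumption about how the (search_list, replace_list) pairs are applied; rename_columns is a hypothetical helper, and the "mean" prefix on the sample column names is invented for illustration.

```python
def rename_columns(columns, search_list, replace_list):
    """Rename columns by replacing the first matching search string in each name."""
    renamed = []
    for col in columns:
        for search, replace in zip(search_list, replace_list):
            if search in col:
                col = col.replace(search, replace)
                break  # apply at most one rule per column
        renamed.append(col)
    return renamed


columns = ['COMID', 'mean.CLAY_depth=4.5', 'mean.CLAY_depth=9.1000004']
renamed = rename_columns(
    columns,
    ['.CLAY_depth=4.5', '.CLAY_depth=9.1000004'],
    ['CLAY1', 'CLAY2'],
)
print(renamed)
```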

VectorPreProcessing.gsde_soil.fill_and_clean_data(exclude_cols=['COMID'], exclude_patterns=['OC', 'BD', 'BDRICM', 'BDTICM'], max_val=100)

Cleans soil data by removing outliers, rescaling BDRICM/BDTICM, and filling missing values via forward/backward fill.

Parameters:

exclude_cols (list) – Columns to ignore during cleaning.
exclude_patterns (list) – Substrings used to skip certain columns during range checks.
max_val (float) – Maximum threshold for valid data (values above this become NaN).
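The outlier and gap-filling steps can be sketched on a single list of values. This is a simplified stand-in that uses None for NaN; clean_and_fill is a hypothetical helper, and the direction along which the real method fills (across rows or columns) is not specified here.

```python
def clean_and_fill(values, max_val=100):
    """Mark values above max_val as missing, then forward- and backward-fill."""
    cleaned = [v if v is not None and v <= max_val else None for v in values]

    # Forward fill: carry the last valid value into later gaps.
    last = None
    for i, v in enumerate(cleaned):
        if v is None:
            cleaned[i] = last
        else:
            last = v

    # Backward fill: fill any leading gaps from the next valid value.
    nxt = None
    for i in range(len(cleaned) - 1, -1, -1):
        if cleaned[i] is None:
            cleaned[i] = nxt
        else:
            nxt = cleaned[i]
    return cleaned


filled = clean_and_fill([None, 35.0, 255.0, 40.0])
print(filled)
```

The value 255.0 exceeds max_val and is replaced by the preceding valid value, while the leading gap is filled from the value that follows it.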

VectorPreProcessing.gsde_soil.calculate_weights(gsde_intervals, mesh_intervals)

Computes weights to map GSDE soil depth intervals to model mesh layers.

Parameters:

gsde_intervals (list of tuple) – List of tuples representing GSDE depth layers (e.g., [(0, 0.045), …]).
mesh_intervals (list of tuple) – List of tuples representing target model layer depths.
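One way to derive such weights is from the depth overlap between each GSDE interval and each mesh layer. The sketch below normalizes by the total overlap within a mesh layer, which is an assumption (it matters for the deepest mesh layer, which extends below the GSDE profile); interval_weights is a hypothetical helper, not the method itself.

```python
def interval_weights(source_intervals, target_intervals):
    """Return one row of weights per target layer, one column per source layer."""
    weights = []
    for t_top, t_bottom in target_intervals:
        # Thickness of each source layer that falls inside this target layer.
        overlaps = [
            max(0.0, min(t_bottom, s_bottom) - max(t_top, s_top))
            for s_top, s_bottom in source_intervals
        ]
        total = sum(overlaps)
        weights.append([o / total if total > 0 else 0.0 for o in overlaps])
    return weights


gsde_intervals = [(0, 0.045), (0.045, 0.091), (0.091, 0.166), (0.166, 0.289),
                  (0.289, 0.493), (0.493, 0.829), (0.829, 1.383), (1.383, 2.296)]
mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]

weights = interval_weights(gsde_intervals, mesh_intervals)
for row in weights:
    print([round(w, 3) for w in row])
```

With these intervals, the top mesh layer (0 to 0.1 m) draws on the first three GSDE layers only, in proportion to how much of each lies above 0.1 m.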

VectorPreProcessing.gsde_soil.calculate_mesh_values(column_names)

Applies weights to calculate layer-averaged MESH-compatible soil properties.

Parameters:

column_names (dict) – Dictionary mapping each property (e.g., “CLAY”, “OC”) to its source columns.
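Applying the weights amounts to a weighted sum over the source columns for each mesh layer. The sketch below shows this for one subbasin and one mesh layer; the weights and clay percentages are illustrative values, and layer_value is a hypothetical helper.

```python
def layer_value(weights_row, source_values):
    """Weighted sum of source-layer values for one mesh layer."""
    return sum(w * v for w, v in zip(weights_row, source_values))


# Illustrative weights for the top mesh layer over eight source layers.
weights_row = [0.45, 0.46, 0.09, 0.0, 0.0, 0.0, 0.0, 0.0]
# Hypothetical CLAY1..CLAY8 values (percent) for a single subbasin.
clay = [20.0, 22.0, 25.0, 27.0, 30.0, 31.0, 33.0, 35.0]

clay_layer1 = layer_value(weights_row, clay)
print(round(clay_layer1, 2))
```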

VectorPreProcessing.gsde_soil.merge_and_save_shapefile(): Merges the processed soil data with the input basin shapefile using COMID and saves the final output.

VectorPreProcessing.gsde_soil.set_coordinates(input_ddb)

Optionally reads spatial reference (lon, lat, subbasin) from a NetCDF drainage database.

Parameters:

input_ddb (str) – Path to the NetCDF drainage database file.

Example Usage

from VectorPreProcessing.gsde_soil import GSDESoil

# Step 1: Initialize the soil processor with paths to your directories and files
gsde = GSDESoil(
    directory='/home/fuaday/scratch/sras-agg-model/gistool-outputs',
    input_basin='/home/fuaday/scratch/sras-agg-model/geofabric-outputs/sras_subbasins_MAF_Agg2.shp',
    output_shapefile='merged_soil_data_shapefile4.shp'
)

# Step 2: Define the list of input CSV files
file_names = [
    'sras_model_stats_CLAY1.csv', 'sras_model_stats_CLAY2.csv',
    'sras_model_stats_SAND1.csv', 'sras_model_stats_SAND2.csv',
    'sras_model_stats_OC1.csv',   'sras_model_stats_OC2.csv',
    'sras_model_stats_BDRICM_M_250m_ll.csv',
    'sras_model_stats_BDTICM_M_250m_ll.csv',
    'sras_model_slope_degree.csv', 'sras_model_riv_0p1_2.csv'
]

# Step 3: Prepare renaming instructions for each file (search/replace patterns)
search_replace_dict = {
    'sras_model_stats_CLAY1.csv': (['.CLAY_depth=4.5', '.CLAY_depth=9.1000004', '.CLAY_depth=16.6', '.CLAY_depth=28.9'], ['CLAY1', 'CLAY2', 'CLAY3', 'CLAY4']),
    'sras_model_stats_CLAY2.csv': (['.CLAY_depth=49.299999', '.CLAY_depth=82.900002', '.CLAY_depth=138.3', '.CLAY_depth=229.60001'], ['CLAY5', 'CLAY6', 'CLAY7', 'CLAY8']),
    'sras_model_stats_SAND1.csv': (['.SAND_depth=4.5', '.SAND_depth=9.1000004', '.SAND_depth=16.6', '.SAND_depth=28.9'], ['SAND1', 'SAND2', 'SAND3', 'SAND4']),
    'sras_model_stats_SAND2.csv': (['.SAND_depth=49.299999', '.SAND_depth=82.900002', '.SAND_depth=138.3', '.SAND_depth=229.60001'], ['SAND5', 'SAND6', 'SAND7', 'SAND8']),
    'sras_model_stats_OC1.csv': (['.OC_depth=4.5', '.OC_depth=9.1000004', '.OC_depth=16.6', '.OC_depth=28.9'], ['OC1', 'OC2', 'OC3', 'OC4']),
    'sras_model_stats_OC2.csv': (['.OC_depth=49.299999', '.OC_depth=82.900002', '.OC_depth=138.3', '.OC_depth=229.60001'], ['OC5', 'OC6', 'OC7', 'OC8'])
}

# Step 4: Optionally specify suffixes to distinguish overlapping columns
suffix_dict = {
    'sras_model_stats_BDRICM_M_250m_ll.csv': 'BDRICM',
    'sras_model_stats_BDTICM_M_250m_ll.csv': 'BDTICM'
}

# Step 5: Load the data, applying renaming and suffixes
gsde.load_data(
    file_names=file_names,
    search_replace_dict=search_replace_dict,
    suffix_dict=suffix_dict
)

# Step 6: Clean and prepare the soil data (e.g., remove outliers, fill NaNs)
gsde.fill_and_clean_data()

# Step 7: Define soil profile intervals for GSDE and MESH (depths in meters)
gsde_intervals = [(0, 0.045), (0.045, 0.091), (0.091, 0.166), (0.166, 0.289),
                  (0.289, 0.493), (0.493, 0.829), (0.829, 1.383), (1.383, 2.296)]

mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]

gsde.calculate_weights(gsde_intervals, mesh_intervals)

# Step 8: Compute mesh-compatible weighted averages of soil properties
column_names = {
    'CLAY': ['CLAY1', 'CLAY2', 'CLAY3', 'CLAY4', 'CLAY5', 'CLAY6', 'CLAY7', 'CLAY8'],
    'SAND': ['SAND1', 'SAND2', 'SAND3', 'SAND4', 'SAND5', 'SAND6', 'SAND7', 'SAND8'],
    'OC':   ['OC1', 'OC2', 'OC3', 'OC4', 'OC5', 'OC6', 'OC7', 'OC8']
}
gsde.calculate_mesh_values(column_names)

# Step 9: Merge processed soil data into the basin shapefile and save output
gsde.merge_and_save_shapefile()

class VectorPreProcessing.gsde_soil.GSDESoil(directory, input_basin, output_shapefile)

Bases: object

A class to process, clean, interpolate, and merge soil property data from CSV files with a given basin shapefile, producing model-ready soil inputs.

Attributes:

directory (str): Directory containing input CSV files with soil properties.
input_basin (str): Path to the input basin shapefile with a ‘COMID’ identifier.
output_shapefile (str): Path to the output shapefile with processed soil attributes.
file_paths (list): List of full file paths for input CSVs.
gsde_df (pandas.DataFrame): Combined soil property table after processing.
merged_gdf (geopandas.GeoDataFrame): Final spatial dataset with soil properties merged to polygons.
weights_used (list of list): Weights used to interpolate soil layers into mesh layers.
mesh_intervals (list of tuple): Target depth intervals used for model input (e.g., MESH layers).
lon (ndarray): Longitude values loaded from a NetCDF drainage database.
lat (ndarray): Latitude values loaded from a NetCDF drainage database.
segid (ndarray): Segment IDs (e.g., subbasin or COMID) from a drainage database.
num_soil_lyrs (int): Number of output mesh layers.

calculate_mesh_values(column_names)

Apply the calculated weights to soil property columns and generate depth-integrated values for each mesh layer.

Parameters:

column_names (dict): Dictionary mapping each property (e.g., “CLAY”, “OC”) to its source columns. Example: {‘CLAY’: [‘CLAY1’, ‘CLAY2’, …], ‘OC’: [‘OC1’, ‘OC2’, …]}

calculate_weights(gsde_intervals, mesh_intervals)

Calculate the contribution weights from each GSDE layer to each model-defined mesh layer based on depth intervals.

Parameters:

gsde_intervals (list of tuple): List of tuples representing GSDE depth layers (e.g., [(0, 0.045), …]).
mesh_intervals (list of tuple): Target model layer depths (e.g., [(0, 0.1), (0.1, 0.35), …]).

fill_and_clean_data(exclude_cols=['COMID'], exclude_patterns=['OC', 'BD', 'BDRICM', 'BDTICM'], max_val=100)

Clean the soil data by:

- Replacing extreme values with NaN (based on max_val).
- Normalizing and capping specific fields (e.g., BDRICM/BDTICM).
- Filling missing values using forward and backward fill.

Parameters:

exclude_cols (list of str) – Columns to exclude from NaN replacement.
exclude_patterns (list of str) – Column name substrings to skip when applying value caps.
max_val (float) – Maximum valid threshold for general soil values.

static load_and_merge_files(file_list, search_replace_dict=None, suffix_dict=None, key='COMID')

Load multiple CSV files and merge them on a common key. Renames and suffixes column names as needed during the loading process.

Parameters:

file_list (list of str) – List of full CSV file paths.
search_replace_dict (dict, optional) – Column renaming instructions for each file.
suffix_dict (dict, optional) – Suffix strings to append to column names by file.
key (str) – Primary key used to merge all data files (default is ‘COMID’).

Returns:

Merged DataFrame containing columns from all input files.

Return type:

pandas.DataFrame

load_data(file_names, search_replace_dict=None, suffix_dict=None)

Load and merge multiple CSV files into a single DataFrame. Optionally apply search-and-replace logic and suffixes to column names to ensure compatibility.

Parameters:

file_names (list of str) – List of filenames to load from the given directory.
search_replace_dict (dict, optional) – Dictionary where keys are filenames and values are (search_list, replace_list) tuples used to rename columns (e.g., depth labels to CLAY1, CLAY2, etc.).
suffix_dict (dict, optional) – Dictionary where keys are filenames and values are suffix strings to append to column names (useful for distinguishing overlapping variables).

merge_and_save_shapefile(): Merge the processed soil data (via COMID) into the input shapefile and save the result. Output is a GeoDataFrame with mesh values appended as new attributes.

set_coordinates(input_ddb)

Set longitude and latitude values from a NetCDF drainage database.

Parameters:

input_ddb (str): Path to the NetCDF drainage database file.

VectorPreProcessing.remap_climate_to_ddb module

VectorPreProcessing.remap_climate_to_ddb.process_file(file_path, segid, lon, lat, output_directory)

Process a single NetCDF file and remap its data to the drainage database (DDB) format.

Parameters:

file_path (str) – Path to the input NetCDF file.
segid (numpy.ndarray) – Array of subbasin IDs from the drainage database.
lon (numpy.ndarray) – Array of longitude values from the drainage database.
lat (numpy.ndarray) – Array of latitude values from the drainage database.
output_directory (str) – Path to the directory where the processed file will be saved.
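One plausible reading of "remap to the DDB format" is that values indexed by the forcing file's own subbasin order are reordered to follow the segid order of the drainage database. The sketch below illustrates only that reordering step under this assumption; reorder_to_ddb and the sample IDs are hypothetical and do not reflect the actual implementation of process_file.

```python
def reorder_to_ddb(source_ids, source_values, segid):
    """Return source_values rearranged to follow the order of segid.

    Raises KeyError if the drainage database lists an ID the source lacks.
    """
    lookup = dict(zip(source_ids, source_values))
    return [lookup[s] for s in segid]


source_ids = [71001, 71002, 71003]   # order in the forcing file (hypothetical)
temperature = [271.2, 268.9, 270.4]  # one value per source subbasin
segid = [71003, 71001, 71002]        # order in the drainage database

remapped = reorder_to_ddb(source_ids, temperature, segid)
print(remapped)
```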

Example

>>> from VectorPreProcessing.remap_climate_to_ddb import process_file
>>> process_file(
...     file_path="path/to/input.nc",
...     segid=subbasin_ids,
...     lon=longitudes,
...     lat=latitudes,
...     output_directory="path/to/output"
... )

VectorPreProcessing.remap_climate_to_ddb.remap_rdrs_climate_data(input_directory, output_directory, input_basin, input_ddb, start_year, end_year)

Remap RDRS climate data to a drainage database (DDB) format for a range of years.

Parameters:

input_directory (str) – Path to the directory containing input NetCDF files.
output_directory (str) – Path to the directory where processed files will be saved.
input_basin (str) – Path to the basin shapefile.
input_ddb (str) – Path to the drainage database NetCDF file.
start_year (int) – Start year of the data to process.
end_year (int) – End year of the data to process.

Example

>>> from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data
>>> remap_rdrs_climate_data(
...     input_directory="path/to/input",
...     output_directory="path/to/output",
...     input_basin="path/to/basin.shp",
...     input_ddb="path/to/ddb.nc",
...     start_year=2000,
...     end_year=2020
... )

VectorPreProcessing.remap_climate_to_ddb.remap_rdrs_climate_data_single_year(input_directory, output_directory, input_basin, input_ddb, year)

Remap RDRS climate data to a drainage database (DDB) format for a single year.

Parameters:

input_directory (str) – Path to the directory containing input NetCDF files.
output_directory (str) – Path to the directory where processed files will be saved.
input_basin (str) – Path to the basin shapefile.
input_ddb (str) – Path to the drainage database NetCDF file.
year (int) – Year of the data to process.

Example

>>> from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data_single_year
>>> remap_rdrs_climate_data_single_year(
...     input_directory="path/to/input",
...     output_directory="path/to/output",
...     input_basin="path/to/basin.shp",
...     input_ddb="path/to/ddb.nc",
...     year=2020
... )

SLURM Script Usage

This SLURM script demonstrates how to use the functions remap_rdrs_climate_data and remap_rdrs_climate_data_single_year in an HPC environment.

Typical Usage

Run all sections in a single job:

sbatch Forcing_RDRS_processingMet3.sh --section1 --section2 --section3

Run each year in parallel using SLURM array jobs:

sbatch --array=0-38 Forcing_RDRS_processingMet3.sh --section1
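The array range follows from the year span: the script computes each task's year as start_year plus the array task ID, so IDs must run from 0 to end_year minus start_year. A small sketch of that arithmetic, using hypothetical helper names:

```python
def array_spec(start_year, end_year):
    """Return the value for sbatch --array that covers every year once."""
    return f"0-{end_year - start_year}"


def year_for_task(start_year, task_id):
    """Mirror the script's year=$((start_year + SLURM_ARRAY_TASK_ID))."""
    return start_year + task_id


print(array_spec(1980, 2018))
print(year_for_task(1980, 0), year_for_task(1980, 38))
```

If start_year or end_year is changed in the script, the --array range on the command line needs to change with it.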

SLURM Shell Script

#!/bin/bash
#SBATCH --account=rpp-kshook
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem-per-cpu=30G
#SBATCH --time=24:00:00
#SBATCH --job-name=vectForcRDRS
#SBATCH --mail-user=fuad.yassin@usask.ca
#SBATCH --mail-type=BEGIN,END,FAIL

: '
This script processes climate forcing data for the vector-based MESH RDRS dataset.
Supports array jobs and all-years processing.
'

module load cdo
module load nco

basin="sras"
start_year=1980
end_year=2018
input_forcing_easymore='/scratch/fuaday/sras-agg-model/easymore-outputs'
ddb_remapped_output_forcing='/scratch/fuaday/sras-agg-model/easymore-outputs2'
input_basin='/scratch/fuaday/sras-agg-model/geofabric-outputs/sras_subbasins_MAF_Agg.shp'
input_ddb='/scratch/fuaday/sras-agg-model/MESH-sras-agg/MESH_drainage_database.nc'
dir_merged_file="/scratch/fuaday/sras-agg-model/easymore-outputs-merged"
merged_file="${dir_merged_file}/${basin}_rdrs_${start_year}_${end_year}_v21_allVar.nc"

source $HOME/virtual-envs/scienv/bin/activate
module load StdEnv/2020
module load gcc/9.3.0
module restore scimods
module load cdo
module load nco

function run_section1_single_year {
    local year=$1
    python -c "
import sys
sys.path.append('$HOME/virtual-envs/scienv/lib/python3.8/site-packages')
from MESHpyPreProcessing.remap_rdrs_climate_data import remap_rdrs_climate_data_single_year
remap_rdrs_climate_data_single_year(
    input_directory='$input_forcing_easymore',
    output_directory='$ddb_remapped_output_forcing',
    input_basin='$input_basin',
    input_ddb='$input_ddb',
    year=$year
)
"
}

function run_section1_all_years {
    python -c "
import sys
sys.path.append('$HOME/virtual-envs/scienv/lib/python3.8/site-packages')
from MESHpyPreProcessing.remap_rdrs_climate_data import remap_rdrs_climate_data
remap_rdrs_climate_data(
    input_directory='$input_forcing_easymore',
    output_directory='$ddb_remapped_output_forcing',
    input_basin='$input_basin',
    input_ddb='$input_ddb',
    start_year=$start_year,
    end_year=$end_year
)
"
}

function run_section2 {
    mkdir -p "$dir_merged_file"
    merge_cmd="cdo mergetime"
    for (( year=$start_year; year<=$end_year; year++ )); do
        merge_cmd+=" ${ddb_remapped_output_forcing}/remapped_remapped_ncrb_model_${year}*.nc"
    done
    $merge_cmd "$merged_file"
}

function run_section3 {
    ncatted -O -a units,RDRS_v2.1_P_TT_09944,o,c,"K" "$merged_file"
    ncatted -O -a units,RDRS_v2.1_P_P0_SFC,o,c,"Pa" "$merged_file"
    ncatted -O -a units,RDRS_v2.1_P_UVC_09944,o,c,"m s-1" "$merged_file"
    ncatted -O -a units,RDRS_v2.1_A_PR0_SFC,o,c,"mm s-1" "$merged_file"

    temp_file="${dir_merged_file}/${basin}_temp.nc"
    cdo -z zip -b F32 -aexpr,'RDRS_v2.1_P_TT_09944=RDRS_v2.1_P_TT_09944 + 273.15; RDRS_v2.1_P_P0_SFC=RDRS_v2.1_P_P0_SFC * 100.0; RDRS_v2.1_P_UVC_09944=RDRS_v2.1_P_UVC_09944 * 0.514444; RDRS_v2.1_A_PR0_SFC=RDRS_v2.1_A_PR0_SFC / 3.6' "$merged_file" "$temp_file"
    mv "$temp_file" "$merged_file"
}

for arg in "$@"; do
    case $arg in
        --section1)
            if [ -z "$SLURM_ARRAY_TASK_ID" ]; then
                run_section1_all_years
            else
                year=$((start_year + SLURM_ARRAY_TASK_ID))
                run_section1_single_year $year
            fi
            ;;
        --section2)
            run_section2
            ;;
        --section3)
            run_section3
            ;;
    esac
done
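The arithmetic in run_section3 can be restated in Python to make the unit conversions explicit. The factors are consistent with converting degrees Celsius to kelvin, hPa to Pa, knots to m s-1, and metres per hour to mm s-1; these source units are inferred from the factors rather than stated in the script, and convert_units is a hypothetical helper.

```python
def convert_units(tt, p0, uvc, pr0):
    """Apply the same factors as the cdo expression in run_section3."""
    return {
        "RDRS_v2.1_P_TT_09944": tt + 273.15,     # temperature -> K
        "RDRS_v2.1_P_P0_SFC": p0 * 100.0,        # surface pressure -> Pa
        "RDRS_v2.1_P_UVC_09944": uvc * 0.514444, # wind speed -> m s-1
        "RDRS_v2.1_A_PR0_SFC": pr0 / 3.6,        # precipitation -> mm s-1
    }


converted = convert_units(tt=0.0, p0=1000.0, uvc=10.0, pr0=0.0036)
for name, value in converted.items():
    print(name, value)
```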

Module contents