VectorPreProcessing package

Submodules

VectorPreProcessing.Aggregation_vector module

Basin and River Network Aggregation

The merit_basin_aggregation function aggregates basin and river network shapefiles. This function uses parameters like minimum sub-area, slope, and river length to iteratively aggregate small sub-basins.

Parameters:

  • input_basin: Basin GeoDataFrame with COMID identifiers.

  • input_river: River network GeoDataFrame with slope and length attributes.

  • min_subarea: Minimum area for sub-basins.

  • min_slope: Minimum allowable river slope.

  • min_length: Minimum river length.

This function iterates through sub-basins, merging those below the minimum sub-area threshold until no further aggregation is possible. It also computes and adjusts slopes, river lengths, and weighted slopes for simplified river networks.
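The iteration described above can be illustrated with a small, dependency-free sketch. It is not the implementation of merit_basin_aggregation: the helper names, the choice to merge a small sub-basin into its downstream neighbour, and the length-weighted slope formula are all assumptions made for illustration.

```python
def aggregate_small_subbasins(areas, downstream, min_subarea):
    """Merge sub-basins smaller than min_subarea into their downstream neighbour.

    areas: dict mapping sub-basin ID -> area
    downstream: dict mapping sub-basin ID -> downstream ID (None at an outlet)
    Returns new (areas, downstream) dicts; the inputs are not modified.
    """
    areas = dict(areas)
    downstream = dict(downstream)
    merged = True
    while merged:  # repeat until a full pass makes no change
        merged = False
        for sid in sorted(areas):
            target = downstream.get(sid)
            if areas[sid] >= min_subarea or target is None:
                continue
            # Fold the small sub-basin into its downstream neighbour.
            areas[target] += areas.pop(sid)
            del downstream[sid]
            # Re-route anything that drained into the removed sub-basin.
            for other, nxt in downstream.items():
                if nxt == sid:
                    downstream[other] = target
            merged = True
            break  # restart the scan because the dicts changed
    return areas, downstream


def length_weighted_slope(segments, min_slope):
    """Length-weighted mean slope of (length, slope) pairs, floored at min_slope."""
    total_length = sum(length for length, _ in segments)
    slope = sum(length * s for length, s in segments) / total_length
    return max(slope, min_slope)
```

Small outlets (no downstream neighbour) are left untouched in this sketch, which is one way the loop can end with sub-basins still below the threshold.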

Example usage:

>>> from VectorPreProcessing.Aggregation_vector import merit_basin_aggregation
>>> import geopandas as gpd
>>> import os
>>> # Define paths and parameters
>>> input_basin_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/SrsAboine-geofabric/sras_subbasins_MAF_noAgg.shp"
>>> input_river_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/SrsAboine-geofabric/sras_rivers_MAF_noAgg.shp"
>>> min_subarea = 50
>>> min_slope = 0.0000001
>>> min_length = 1.0
>>> output_basin_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/sras_subbasins_MAF_Agg.shp"
>>> output_river_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/sras_rivers_MAF_Agg.shp"
>>> # Load input data
>>> input_basin = gpd.read_file(input_basin_path)
>>> input_river = gpd.read_file(input_river_path)
>>> # Perform aggregation
>>> agg_basin, agg_river = merit_basin_aggregation(input_basin, input_river, min_subarea, min_slope, min_length)
>>> # Save aggregated data
>>> agg_basin.to_file(output_basin_path)
>>> agg_river.to_file(output_river_path)
VectorPreProcessing.Aggregation_vector.merit_basin_aggregation(input_basin, input_river, min_subarea, min_slope, min_length)[source]

VectorPreProcessing.NetCDFWriter module

Overview

The NetCDFWriter class is designed to generate model-ready NetCDF files (e.g., MESH_parameters.nc) containing soil and other geophysical subbasin data integrated from a vector shapefile and a NetCDF drainage database. This class is typically used in workflows that prepare input parameters for land surface models like MESH.

It supports flexible handling of both layer-dependent (e.g., soil properties per depth layer) and layer-independent (e.g., slope, contributing area) variables. The output conforms to CF conventions and includes appropriate coordinate reference metadata for spatial consistency.

Function Descriptions

class VectorPreProcessing.NetCDFWriter.NetCDFWriter(nc_filename, shapefile_path, input_ddb_path)[source]

Initializes the NetCDF writer with paths to the output file, input shapefile, and NetCDF drainage database.

Parameters:
  • nc_filename (str) – Path to the NetCDF output file to be created.

  • shapefile_path (str) – Path to the input shapefile containing the attributes.

  • input_ddb_path (str) – Path to the NetCDF drainage database used to extract coordinates.

VectorPreProcessing.NetCDFWriter.read_shapefile()

Reads the input shapefile and converts it into a GeoDataFrame. The file is automatically reprojected to EPSG:4326 (WGS 84).

VectorPreProcessing.NetCDFWriter.set_coordinates()

Extracts lon, lat, and subbasin values from the NetCDF drainage database file. These values serve as the spatial base for NetCDF output.

VectorPreProcessing.NetCDFWriter.set_num_soil_layers(num_layers)

Sets the number of vertical soil layers that will be written into the NetCDF file.

Parameters:

num_layers (int) – The number of soil layers (e.g., 4 for a 4-layer soil profile).

VectorPreProcessing.NetCDFWriter.add_var_attrs(var, attrs)

Adds metadata attributes to a NetCDF variable, such as units, standard name, and axis designation.

Parameters:
  • var (netCDF4.Variable) – The NetCDF variable to modify.

  • attrs (dict) – Dictionary of attributes to apply.

VectorPreProcessing.NetCDFWriter.write_netcdf(properties, variable_info)

Writes the actual NetCDF file using the specified properties and metadata.

Parameters:
  • properties (dict) – Dictionary specifying which variables are layer-dependent vs. layer-independent.

  • variable_info (dict) – Dictionary mapping each variable to a tuple of (NetCDF name, data type, unit).

Example Usage

from VectorPreProcessing.NetCDFWriter import NetCDFWriter

# Paths for NetCDFWriter
nc_filename = 'MESH_parameters3.nc'
output_shapefile = 'merged_soil_data_shapefile4.shp'
input_ddb = '/scratch/fuaday/sras-agg-model/MESH-sras-agg/MESH_drainage_database.nc'
mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]

# Initialize NetCDFWriter with the necessary paths
nc_writer = NetCDFWriter(
    nc_filename=nc_filename,
    shapefile_path=output_shapefile,
    input_ddb_path=input_ddb
)

# Step 1: Read the attribute shapefile and extract spatial coordinates from the drainage database
nc_writer.read_shapefile()
nc_writer.set_coordinates()

# Step 2: Specify the number of vertical soil layers to include in the output
nc_writer.set_num_soil_layers(num_layers=len(mesh_intervals))

# Step 3: Define which variables are layer-dependent vs. layer-independent
properties = {
    'layer_dependent': ['CLAY', 'SAND', 'OC'],  # Varies by soil layer and subbasin
    'layer_independent': ['ncontr', 'meanBDRICM', 'meanBDTICM', 'xslp', 'dd']  # Varies only by subbasin
}

# Step 4: Provide metadata for each variable to be written to NetCDF
variable_info = {
    'CLAY': ('CLAY', 'f4', 'Percentage'),
    'SAND': ('SAND', 'f4', 'Percentage'),
    'OC': ('ORGM', 'f4', 'Percentage'),
    'ncontr': ('IWF', 'i4', '1'),
    'meanBDRICM': ('BDRICM', 'f4', 'Meters'),
    'meanBDTICM': ('BDTICM', 'f4', 'Meters'),
    'xslp': ('xslp', 'f4', 'degree'),
    'dd': ('dd', 'f4', 'm_per_km2')
}

# Step 5: Write the final NetCDF file with structured metadata and spatial consistency
nc_writer.write_netcdf(properties=properties, variable_info=variable_info)
class VectorPreProcessing.NetCDFWriter.NetCDFWriter(nc_filename, shapefile_path, input_ddb_path)[source]

Bases: object

A class to generate NetCDF files with soil data merged from shapefiles and NetCDF drainage databases.

Attributes:

nc_filename : str

Path to the output NetCDF file.

shapefile_path : str

Path to the input shapefile.

input_ddb_path : str

Path to the NetCDF drainage database.

merged_gdf : geopandas.GeoDataFrame

GeoDataFrame containing merged shapefile data.

lon : list

List of longitude values from the NetCDF drainage database.

lat : list

List of latitude values from the NetCDF drainage database.

segid : list

List of subbasin identifiers.

num_soil_lyrs : int

Number of soil layers in the dataset.

add_var_attrs(var, attrs)[source]

Adds attributes to a NetCDF variable.

Parameters:

var : netCDF4.Variable

The NetCDF variable to which attributes will be added.

attrs : dict

A dictionary of attribute names and values.

read_shapefile()[source]

Reads the shapefile and converts it into a GeoDataFrame.

This function reads the shapefile, reprojects it to EPSG:4326 (WGS 84), and stores the result in the merged GeoDataFrame.

set_coordinates()[source]

Extracts longitude, latitude, and subbasin IDs from the NetCDF drainage database.

set_num_soil_layers(num_layers)[source]

Sets the number of soil layers for the NetCDF file.

Parameters:

num_layers : int

Number of soil layers to be included in the NetCDF file.

write_netcdf(properties, variable_info)[source]

Creates a NetCDF file with processed soil data.

Parameters:

properties : dict

A dictionary with two keys:

  • 'layer_dependent': List of property names tied to the number of soil layers.

  • 'layer_independent': List of property names dependent only on the subbasin.

variable_info : dict

A dictionary mapping property names to tuples containing: (new variable name in NetCDF, data type code, unit).
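Because write_netcdf relies on every name in properties having a matching entry in variable_info, a quick consistency check before writing can catch typos early. The helper below is a hypothetical sketch and is not part of the NetCDFWriter class; it only assumes the dictionary shapes described above.

```python
def check_variable_info(properties, variable_info):
    """Return property names that lack a (name, dtype, unit) entry in variable_info."""
    problems = []
    for group in ("layer_dependent", "layer_independent"):
        for name in properties.get(group, []):
            entry = variable_info.get(name)
            # Each entry is expected to be a 3-tuple: (NetCDF name, dtype code, unit).
            if not (isinstance(entry, tuple) and len(entry) == 3):
                problems.append(name)
    return problems
```

An empty list means every listed property has a well-formed entry.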

VectorPreProcessing.convert_ddbnetcdf module

NetCDF to CSV/Shapefile Converter

This script contains a function, convert_netcdf, that converts a NetCDF file containing hydrological data into either a CSV file or a Shapefile.

Example Usage:

>>> from VectorPreProcessing.convert_ddbnetcdf import convert_netcdf
>>> convert_netcdf(netcdf_file='input.nc', output_file='output.csv', conversion_type='csv')
>>> convert_netcdf(netcdf_file='input.nc', output_file='output.shp', conversion_type='shapefile')

Functions:

  • convert_netcdf: Converts a NetCDF file into either a CSV or a Shapefile.

Parameters:

  • netcdf_file (str): Path to the input NetCDF file.

  • output_file (str): Path to the output file (CSV or Shapefile).

  • conversion_type (str): Conversion type, either “csv” or “shapefile”.

VectorPreProcessing.convert_ddbnetcdf.convert_netcdf(netcdf_file, output_file, conversion_type='csv')[source]

Converts a NetCDF file to either a CSV or a Shapefile.

Parameters:

netcdf_file : str

Path to the input NetCDF file.

output_file : str

Path to the output file (CSV or Shapefile).

conversion_type : str, optional

Type of conversion (“csv” or “shapefile”), default is “csv”.

Returns:

None
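Since conversion_type defaults to "csv", passing an output path ending in .shp without also setting conversion_type may not give the intended result. One way to keep the two arguments in step is to derive the type from the file extension. The helper below is hypothetical and is not provided by convert_ddbnetcdf.

```python
import os


def infer_conversion_type(output_file):
    """Map an output file extension to a conversion_type string."""
    ext = os.path.splitext(output_file)[1].lower()
    mapping = {".csv": "csv", ".shp": "shapefile"}
    if ext not in mapping:
        raise ValueError(f"Unsupported output extension: {ext!r}")
    return mapping[ext]
```

The result can then be passed as conversion_type=infer_conversion_type(output_file) when calling convert_netcdf.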

VectorPreProcessing.gdf_edit module

gdf_edit.py

This module provides functions to flag non-contributing areas (NCAs) or lakes and reservoirs in GeoDataFrames based on intersection thresholds, with customizable options for column names, default values, and initialization values.

Example Usage

1. Using Shapefiles:

>>> from VectorPreProcessing.gdf_edit import flag_ncaalg_from_files
>>> flagged_gdf = flag_ncaalg_from_files(
...     'path/to/shapefile1.shp',
...     'path/to/shapefile2.shp',
...     threshold=0.1,
...     output_path='output.shp'
... )

>>> flagged_gdf = flag_ncaalg_from_files(
...     'path/to/shapefile1.shp',
...     'path/to/shapefile2.shp',
...     threshold=0.1,
...     output_path='output.shp',
...     ncontr_col="custom_flag_column",   # Custom column in gdf1 to store flags
...     value_column="NON_ID",             # Column in gdf2 with values to assign
...     initial_value=0,                   # Initial value for gdf1's flag column
...     default_value=5                    # Default value if no value_column specified
... )

2. Using GeoDataFrames Directly:

>>> from VectorPreProcessing.gdf_edit import flag_ncaalg
>>> import geopandas as gpd
>>> gdf1 = gpd.read_file('path/to/shapefile1.shp')
>>> gdf2 = gpd.read_file('path/to/shapefile2.shp')
>>> flagged_gdf = flag_ncaalg(gdf1, gdf2, threshold=0.1)

>>> flagged_gdf = flag_ncaalg(
...     gdf1,
...     gdf2,
...     threshold=0.1,
...     ncontr_col="custom_flag_column",   # Custom column in gdf1 to store flags
...     value_column="NON_ID",             # Column in gdf2 with values to assign
...     initial_value=0,                   # Initial value for gdf1's flag column
...     default_value=5                    # Default value if no value_column specified
... )
VectorPreProcessing.gdf_edit.flag_ncaalg(gdf1: GeoDataFrame, gdf2: GeoDataFrame, threshold: float = 0.1, output_path: str = None, ncontr_col: str = 'ncontr', value_column: str = None, initial_value=None, default_value=2) GeoDataFrame[source]

Flag intersections and optionally assign values from gdf2.

This function identifies intersections between polygons in gdf1 and gdf2 that meet a specified threshold. If an intersection is found, a constant value (default is 2) or a value from a specified column in gdf2 (if provided) is assigned to the corresponding row in gdf1. If multiple intersections exist, the first match is used.

Parameters:
  • gdf1 (gpd.GeoDataFrame) – The primary GeoDataFrame.

  • gdf2 (gpd.GeoDataFrame) – The secondary GeoDataFrame with values to assign.

  • threshold (float, optional) – The minimum intersection area, as a fraction of the gdf1 polygon's area, for an intersection to be considered significant (default is 0.1, i.e. 10%).

  • output_path (str, optional) – Path where the modified gdf1 should be saved. If None, the file is not saved.

  • ncontr_col (str, optional) – The name of the column to store assigned values in gdf1.

  • value_column (str, optional) – The name of the column in gdf2 with values to assign to gdf1. If None, a constant value (default_value) is used.

  • initial_value (optional) – The initial value to assign to the ncontr_col column in gdf1 before processing intersections.

  • default_value (optional) – The default value to assign to the ncontr_col column if value_column is None (default is 2).

Returns:

The modified gdf1 with assigned values based on intersections.

Return type:

gpd.GeoDataFrame
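The thresholding logic of flag_ncaalg can be sketched without geopandas by using axis-aligned rectangles in place of polygons. This is an illustration only: the helper names are hypothetical, rectangles are (xmin, ymin, xmax, ymax) tuples, and the use of ">=" for the threshold comparison is an assumption about the real function.

```python
def rect_area(r):
    """Area of an (xmin, ymin, xmax, ymax) rectangle; zero if it is degenerate."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_fraction(a, b):
    """Fraction of rectangle a's area that is covered by rectangle b."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    area_a = rect_area(a)
    return rect_area(inter) / area_a if area_a > 0 else 0.0


def flag_rectangles(primary, secondary, threshold=0.1,
                    initial_value=None, default_value=2):
    """Flag each primary rectangle with the value of its first significant overlap.

    secondary is a list of (rectangle, value) pairs; a value of None stands in
    for "no value_column given", in which case default_value is assigned.
    """
    flags = []
    for rect in primary:
        flag = initial_value
        for other, value in secondary:
            if overlap_fraction(rect, other) >= threshold:
                flag = default_value if value is None else value
                break  # the first match wins
        flags.append(flag)
    return flags
```

Raising the threshold changes which secondary shape counts as the first match, which is why the order of rows in gdf2 can matter when several shapes overlap the same polygon.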

VectorPreProcessing.gdf_edit.flag_ncaalg_from_files(shapefile1: str, shapefile2: str, threshold: float = 0.1, output_path: str = None, ncontr_col: str = 'ncontr', value_column: str = None, initial_value=None, default_value=2) GeoDataFrame[source]

Read two shapefiles, set their CRS to EPSG:4326, and apply the flag_ncaalg function.

Parameters:
  • shapefile1 (str) – Path to the first shapefile.

  • shapefile2 (str) – Path to the second shapefile.

  • threshold (float, optional) – The threshold for considering an intersection significant, as a fraction of the first GeoDataFrame’s polygon area (default is 0.1 for 10%).

  • output_path (str, optional) – Path where the modified first GeoDataFrame should be saved. If None, the file is not saved.

  • ncontr_col (str, optional) – The name of the column to flag intersections in gdf1.

  • value_column (str, optional) – The name of the column in gdf2 with values to assign to gdf1.

  • initial_value (optional) – The initial value to assign to the ncontr_col column in gdf1 before processing intersections.

  • default_value (optional) – The default value to assign to the ncontr_col column if value_column is None (default is 2).

Returns:

The modified first GeoDataFrame (read from shapefile1), with the ncontr_col column added.

Return type:

gpd.GeoDataFrame

VectorPreProcessing.gsde_soil module

Overview

The GSDESoil class provides a pipeline to process, clean, interpolate, and integrate soil property data into hydrological model inputs, such as those required by MESH. It is designed to handle GSDE-derived statistics stored in CSV files, convert them to model-ready format using weighted depth-averaging, and merge them into a basin shapefile based on unique identifiers (e.g., COMID).

Function Descriptions

class VectorPreProcessing.gsde_soil.GSDESoil(directory, input_basin, output_shapefile)[source]

Initializes the processor with input/output paths.

Parameters:
  • directory (str) – Directory containing input CSV files.

  • input_basin (str) – Path to the input shapefile with a COMID field.

  • output_shapefile (str) – Path where the merged shapefile will be saved.

VectorPreProcessing.gsde_soil.load_data(file_names, search_replace_dict=None, suffix_dict=None)

Loads and merges soil data from multiple CSV files. Columns can be renamed using search/replace rules and optionally suffixed to avoid name collisions.

Parameters:
  • file_names (list) – List of CSV file names.

  • search_replace_dict (dict, optional) – Dictionary with filename as key and (search_list, replace_list) as value.

  • suffix_dict (dict, optional) – Dictionary with filename as key and string suffix as value.

VectorPreProcessing.gsde_soil.fill_and_clean_data(exclude_cols=['COMID'], exclude_patterns=['OC', 'BD', 'BDRICM', 'BDTICM'], max_val=100)

Cleans soil data by removing outliers, rescaling BDRICM/BDTICM, and filling missing values via forward/backward fill.

Parameters:
  • exclude_cols (list) – Columns to ignore during cleaning.

  • exclude_patterns (list) – Substrings used to skip certain columns during range checks.

  • max_val (float) – Maximum threshold for valid data (values above this become NaN).

VectorPreProcessing.gsde_soil.calculate_weights(gsde_intervals, mesh_intervals)

Computes weights to map GSDE soil depth intervals to model mesh layers.

Parameters:
  • gsde_intervals (list of tuple) – List of tuples representing GSDE depth layers (e.g., [(0, 0.045), …]).

  • mesh_intervals (list of tuple) – List of tuples representing target model layer depths.

VectorPreProcessing.gsde_soil.calculate_mesh_values(column_names)

Applies weights to calculate layer-averaged MESH-compatible soil properties.

Parameters:

column_names (dict) – Dictionary mapping each property (e.g., “CLAY”, “OC”) to its source columns.

VectorPreProcessing.gsde_soil.merge_and_save_shapefile()

Merges the processed soil data with the input basin shapefile using COMID and saves the final output.

VectorPreProcessing.gsde_soil.set_coordinates(input_ddb)

Optionally reads spatial reference (lon, lat, subbasin) from a NetCDF drainage database.

Parameters:

input_ddb (str) – Path to the NetCDF drainage database file.

Example Usage

from VectorPreProcessing.gsde_soil import GSDESoil

# Step 1: Initialize the soil processor with paths to your directories and files
gsde = GSDESoil(
    directory='/home/fuaday/scratch/sras-agg-model/gistool-outputs',
    input_basin='/home/fuaday/scratch/sras-agg-model/geofabric-outputs/sras_subbasins_MAF_Agg2.shp',
    output_shapefile='merged_soil_data_shapefile4.shp'
)

# Step 2: Define the list of input CSV files
file_names = [
    'sras_model_stats_CLAY1.csv', 'sras_model_stats_CLAY2.csv',
    'sras_model_stats_SAND1.csv', 'sras_model_stats_SAND2.csv',
    'sras_model_stats_OC1.csv',   'sras_model_stats_OC2.csv',
    'sras_model_stats_BDRICM_M_250m_ll.csv',
    'sras_model_stats_BDTICM_M_250m_ll.csv',
    'sras_model_slope_degree.csv', 'sras_model_riv_0p1_2.csv'
]

# Step 3: Prepare renaming instructions for each file (search/replace patterns)
search_replace_dict = {
    'sras_model_stats_CLAY1.csv': (['.CLAY_depth=4.5', '.CLAY_depth=9.1000004', '.CLAY_depth=16.6', '.CLAY_depth=28.9'], ['CLAY1', 'CLAY2', 'CLAY3', 'CLAY4']),
    'sras_model_stats_CLAY2.csv': (['.CLAY_depth=49.299999', '.CLAY_depth=82.900002', '.CLAY_depth=138.3', '.CLAY_depth=229.60001'], ['CLAY5', 'CLAY6', 'CLAY7', 'CLAY8']),
    'sras_model_stats_SAND1.csv': (['.SAND_depth=4.5', '.SAND_depth=9.1000004', '.SAND_depth=16.6', '.SAND_depth=28.9'], ['SAND1', 'SAND2', 'SAND3', 'SAND4']),
    'sras_model_stats_SAND2.csv': (['.SAND_depth=49.299999', '.SAND_depth=82.900002', '.SAND_depth=138.3', '.SAND_depth=229.60001'], ['SAND5', 'SAND6', 'SAND7', 'SAND8']),
    'sras_model_stats_OC1.csv': (['.OC_depth=4.5', '.OC_depth=9.1000004', '.OC_depth=16.6', '.OC_depth=28.9'], ['OC1', 'OC2', 'OC3', 'OC4']),
    'sras_model_stats_OC2.csv': (['.OC_depth=49.299999', '.OC_depth=82.900002', '.OC_depth=138.3', '.OC_depth=229.60001'], ['OC5', 'OC6', 'OC7', 'OC8'])
}

# Step 4: Optionally specify suffixes to distinguish overlapping columns
suffix_dict = {
    'sras_model_stats_BDRICM_M_250m_ll.csv': 'BDRICM',
    'sras_model_stats_BDTICM_M_250m_ll.csv': 'BDTICM'
}

# Step 5: Load the data, applying renaming and suffixes
gsde.load_data(
    file_names=file_names,
    search_replace_dict=search_replace_dict,
    suffix_dict=suffix_dict
)

# Step 6: Clean and prepare the soil data (e.g., remove outliers, fill NaNs)
gsde.fill_and_clean_data()

# Step 7: Define soil profile intervals for GSDE and MESH (depths in meters)
gsde_intervals = [(0, 0.045), (0.045, 0.091), (0.091, 0.166), (0.166, 0.289),
                  (0.289, 0.493), (0.493, 0.829), (0.829, 1.383), (1.383, 2.296)]

mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]

gsde.calculate_weights(gsde_intervals, mesh_intervals)

# Step 8: Compute mesh-compatible weighted averages of soil properties
column_names = {
    'CLAY': ['CLAY1', 'CLAY2', 'CLAY3', 'CLAY4', 'CLAY5', 'CLAY6', 'CLAY7', 'CLAY8'],
    'SAND': ['SAND1', 'SAND2', 'SAND3', 'SAND4', 'SAND5', 'SAND6', 'SAND7', 'SAND8'],
    'OC':   ['OC1', 'OC2', 'OC3', 'OC4', 'OC5', 'OC6', 'OC7', 'OC8']
}
gsde.calculate_mesh_values(column_names)

# Step 9: Merge processed soil data into the basin shapefile and save output
gsde.merge_and_save_shapefile()
class VectorPreProcessing.gsde_soil.GSDESoil(directory, input_basin, output_shapefile)[source]

Bases: object

A class to process, clean, interpolate, and merge soil property data from CSV files with a given basin shapefile, producing model-ready soil inputs.

directory

Directory containing input CSV files with soil properties.

Type:

str

input_basin

Path to the input basin shapefile with a ‘COMID’ identifier.

Type:

str

output_shapefile

Path to the output shapefile with processed soil attributes.

Type:

str

file_paths

List of full file paths for input CSVs.

Type:

list

gsde_df

Combined soil property table after processing.

Type:

pandas.DataFrame

merged_gdf

Final spatial dataset with soil properties merged to polygons.

Type:

geopandas.GeoDataFrame

weights_used

Weights used to interpolate soil layers into mesh layers.

Type:

list of list

mesh_intervals

Target depth intervals used for model input (e.g., MESH layers).

Type:

list of tuple

lon

Longitude values loaded from a NetCDF drainage database.

Type:

ndarray

lat

Latitude values loaded from a NetCDF drainage database.

Type:

ndarray

segid

Segment IDs (e.g., subbasin or COMID) from a drainage database.

Type:

ndarray

num_soil_lyrs

Number of output mesh layers.

Type:

int

calculate_mesh_values(column_names)[source]

Apply the calculated weights to soil property columns and generate depth-integrated values for each mesh layer.

Parameters:

column_names : dict

Dictionary mapping each property (e.g., "CLAY", "OC") to its source columns. Example: {'CLAY': ['CLAY1', 'CLAY2', ...], 'OC': ['OC1', 'OC2', ...]}
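Applying the weights amounts to a weighted sum over the source layers for each target layer. The sketch below shows that arithmetic for a single sub-basin and a single property; the function name and the hand-written weight matrix are hypothetical, and the real method operates on DataFrame columns.

```python
def weighted_layer_values(source_values, weights):
    """Combine per-source-layer values into one value per target layer.

    source_values: one value per source (GSDE) layer
    weights: one row per target (mesh) layer, one weight per source layer
    """
    return [
        sum(w * v for w, v in zip(row, source_values))
        for row in weights
    ]
```

For example, with two source layers holding 10.0 and 20.0, a weight row of [0.5, 0.5] yields 15.0 for that target layer.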

calculate_weights(gsde_intervals, mesh_intervals)[source]

Calculate the contribution weights from each GSDE layer to each model-defined mesh layer based on depth intervals.

Parameters:

gsde_intervals : list of tuple

List of tuples representing GSDE depth layers (e.g., [(0, 0.045), …]).

mesh_intervals : list of tuple

Target model layer depths (e.g., [(0, 0.1), (0.1, 0.35), …]).
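One plausible way to derive such weights is from the vertical overlap between each source interval and each target interval. The sketch below normalizes each row by the total overlap so that the weights for a target layer sum to 1. That normalization is an assumption; the actual calculate_weights method may, for instance, divide by the target layer thickness instead.

```python
def interval_overlap(a, b):
    """Length of the overlap between two (top, bottom) depth intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_weights(source_intervals, target_intervals):
    """Return one row of weights per target interval, normalized to sum to 1."""
    weights = []
    for target in target_intervals:
        overlaps = [interval_overlap(src, target) for src in source_intervals]
        total = sum(overlaps)
        # A target layer with no overlapping source layer gets all-zero weights.
        weights.append([o / total if total > 0 else 0.0 for o in overlaps])
    return weights
```

Called with the eight GSDE intervals and four MESH intervals from the example usage, this returns a 4 x 8 matrix. Under this normalization the deepest MESH layer (1.2 to 4.1 m) draws only on the two deepest GSDE layers, because the GSDE profile ends at 2.296 m.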

fill_and_clean_data(exclude_cols=['COMID'], exclude_patterns=['OC', 'BD', 'BDRICM', 'BDTICM'], max_val=100)[source]

Clean the soil data by:

  • Replacing extreme values with NaN (based on max_val).

  • Normalizing and capping specific fields (e.g., BDRICM/BDTICM).

  • Filling missing values using forward and backward fill.

Parameters:
  • exclude_cols (list of str) – Columns to exclude from NaN replacement.

  • exclude_patterns (list of str) – Column name substrings to skip when applying value caps.

  • max_val (float) – Maximum valid threshold for general soil values.
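The outlier and gap-filling steps can be sketched on a plain list of values. This is a simplified illustration with a hypothetical function name; it omits the BDRICM/BDTICM rescaling and the column exclusions, and the direction in which the real method fills (along rows or columns of the table) is not specified here.

```python
import math


def clean_series(values, max_val=100):
    """Replace missing or too-large entries with NaN, then forward- and backward-fill."""
    cleaned = [
        v if v is not None and not math.isnan(v) and v <= max_val else math.nan
        for v in values
    ]
    # Forward fill: carry the last valid value into following gaps.
    last = math.nan
    for i, v in enumerate(cleaned):
        if math.isnan(v):
            cleaned[i] = last
        else:
            last = v
    # Backward fill: close any gaps left at the start of the series.
    nxt = math.nan
    for i in range(len(cleaned) - 1, -1, -1):
        if math.isnan(cleaned[i]):
            cleaned[i] = nxt
        else:
            nxt = cleaned[i]
    return cleaned
```

A series with no valid values at all stays entirely NaN, since there is nothing to fill from.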

static load_and_merge_files(file_list, search_replace_dict=None, suffix_dict=None, key='COMID')[source]

Load multiple CSV files and merge them on a common key. Renames and suffixes column names as needed during the loading process.

Parameters:
  • file_list (list of str) – List of full CSV file paths.

  • search_replace_dict (dict, optional) – Column renaming instructions for each file.

  • suffix_dict (dict, optional) – Suffix strings to append to column names by file.

  • key (str) – Primary key used to merge all data files (default is ‘COMID’).

Returns:

Merged DataFrame containing columns from all input files.

Return type:

pandas.DataFrame

load_data(file_names, search_replace_dict=None, suffix_dict=None)[source]

Load and merge multiple CSV files into a single DataFrame. Optionally apply search-and-replace logic and suffixes to column names to ensure compatibility.

Parameters:
  • file_names (list of str) – List of filenames to load from the given directory.

  • search_replace_dict (dict, optional) – Dictionary where keys are filenames and values are (search_list, replace_list) tuples used to rename columns (e.g., depth labels to CLAY1, CLAY2, etc.).

  • suffix_dict (dict, optional) – Dictionary where keys are filenames and values are suffix strings to append to column names (useful for distinguishing overlapping variables).
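The (search_list, replace_list) pairs can be read as parallel lists: the first search string is replaced by the first replacement, and so on. The sketch below applies them as substring replacements on column names. Whether load_data matches substrings or whole names is not stated, so treat the matching rule and the function name as assumptions.

```python
def rename_columns(columns, search_list, replace_list):
    """Apply parallel search/replace strings to a list of column names."""
    renamed = []
    for col in columns:
        new_name = col
        for search, replace in zip(search_list, replace_list):
            if search in new_name:
                new_name = new_name.replace(search, replace)
        renamed.append(new_name)
    return renamed
```

Columns that match none of the search strings, such as the COMID key, pass through unchanged.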

merge_and_save_shapefile()[source]

Merge the processed soil data (via COMID) into the input shapefile and save the result. Output is a GeoDataFrame with mesh values appended as new attributes.

set_coordinates(input_ddb)[source]

Set longitude and latitude values from a NetCDF drainage database.

Parameters:

input_ddb : str

Path to the NetCDF drainage database file.

VectorPreProcessing.remap_climate_to_ddb module

VectorPreProcessing.remap_climate_to_ddb.process_file(file_path, segid, lon, lat, output_directory)[source]

Process a single NetCDF file and remap its data to the drainage database (DDB) format.

Parameters:
  • file_path (str) – Path to the input NetCDF file.

  • segid (numpy.ndarray) – Array of subbasin IDs from the drainage database.

  • lon (numpy.ndarray) – Array of longitude values from the drainage database.

  • lat (numpy.ndarray) – Array of latitude values from the drainage database.

  • output_directory (str) – Path to the directory where the processed file will be saved.
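A central part of remapping to the drainage database layout is putting per-subbasin values into the same order as the segid array. The sketch below shows that reordering on plain lists. It is an assumption about what process_file does internally, and the function name and fill behaviour are hypothetical.

```python
def reorder_to_ddb(source_ids, source_values, ddb_ids, fill=None):
    """Reorder values keyed by source_ids so they follow the order of ddb_ids.

    IDs present in ddb_ids but absent from source_ids receive the fill value.
    """
    index = {sid: i for i, sid in enumerate(source_ids)}
    return [
        source_values[index[sid]] if sid in index else fill
        for sid in ddb_ids
    ]
```

Building the ID-to-position dictionary once keeps the lookup fast even for large numbers of subbasins.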

Example

>>> from VectorPreProcessing.remap_climate_to_ddb import process_file
>>> process_file(
...     file_path="path/to/input.nc",
...     segid=subbasin_ids,
...     lon=longitudes,
...     lat=latitudes,
...     output_directory="path/to/output"
... )
VectorPreProcessing.remap_climate_to_ddb.remap_rdrs_climate_data(input_directory, output_directory, input_basin, input_ddb, start_year, end_year)[source]

Remap RDRS climate data to a drainage database (DDB) format for a range of years.

Parameters:
  • input_directory (str) – Path to the directory containing input NetCDF files.

  • output_directory (str) – Path to the directory where processed files will be saved.

  • input_basin (str) – Path to the basin shapefile.

  • input_ddb (str) – Path to the drainage database NetCDF file.

  • start_year (int) – Start year of the data to process.

  • end_year (int) – End year of the data to process.

Example

>>> from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data
>>> remap_rdrs_climate_data(
...     input_directory="path/to/input",
...     output_directory="path/to/output",
...     input_basin="path/to/basin.shp",
...     input_ddb="path/to/ddb.nc",
...     start_year=2000,
...     end_year=2020
... )
VectorPreProcessing.remap_climate_to_ddb.remap_rdrs_climate_data_single_year(input_directory, output_directory, input_basin, input_ddb, year)[source]

Remap RDRS climate data to a drainage database (DDB) format for a single year.

Parameters:
  • input_directory (str) – Path to the directory containing input NetCDF files.

  • output_directory (str) – Path to the directory where processed files will be saved.

  • input_basin (str) – Path to the basin shapefile.

  • input_ddb (str) – Path to the drainage database NetCDF file.

  • year (int) – Year of the data to process.

Example

>>> from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data_single_year
>>> remap_rdrs_climate_data_single_year(
...     input_directory="path/to/input",
...     output_directory="path/to/output",
...     input_basin="path/to/basin.shp",
...     input_ddb="path/to/ddb.nc",
...     year=2020
... )

SLURM Script Usage

This SLURM script demonstrates how to use the functions remap_rdrs_climate_data and remap_rdrs_climate_data_single_year in an HPC environment.

Typical Usage

Run all sections in a single job:

sbatch Forcing_RDRS_processingMet3.sh --section1 --section2 --section3

Run each year in parallel using SLURM array jobs:

sbatch --array=0-38 Forcing_RDRS_processingMet3.sh --section1
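The array range follows from the years configured in the script, which computes each task's year as start_year + SLURM_ARRAY_TASK_ID. The helpers below are hypothetical and only reproduce that arithmetic so the range can be recomputed when the years change.

```python
def array_spec(start_year, end_year):
    """Return the --array range that covers start_year..end_year inclusive."""
    return f"0-{end_year - start_year}"


def year_for_task(start_year, task_id):
    """Return the year handled by a given SLURM array task ID."""
    return start_year + task_id
```

With start_year=1980 and end_year=2018 this gives 0-38, i.e. 39 tasks, one per year.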

SLURM Shell Script

#!/bin/bash
#SBATCH --account=rpp-kshook
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem-per-cpu=30G
#SBATCH --time=24:00:00
#SBATCH --job-name=vectForcRDRS
#SBATCH --mail-user=fuad.yassin@usask.ca
#SBATCH --mail-type=BEGIN,END,FAIL

: '
This script processes climate forcing data for the vector-based MESH RDRS dataset.
Supports array jobs and all-years processing.
'

module load cdo
module load nco

basin="sras"
start_year=1980
end_year=2018
input_forcing_easymore='/scratch/fuaday/sras-agg-model/easymore-outputs'
ddb_remapped_output_forcing='/scratch/fuaday/sras-agg-model/easymore-outputs2'
input_basin='/scratch/fuaday/sras-agg-model/geofabric-outputs/sras_subbasins_MAF_Agg.shp'
input_ddb='/scratch/fuaday/sras-agg-model/MESH-sras-agg/MESH_drainage_database.nc'
dir_merged_file="/scratch/fuaday/sras-agg-model/easymore-outputs-merged"
merged_file="${dir_merged_file}/${basin}_rdrs_${start_year}_${end_year}_v21_allVar.nc"

source $HOME/virtual-envs/scienv/bin/activate
module load StdEnv/2020
module load gcc/9.3.0
module restore scimods
module load cdo
module load nco

function run_section1_single_year {
    local year=$1
    python -c "
import sys
sys.path.append('$HOME/virtual-envs/scienv/lib/python3.8/site-packages')
from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data_single_year
remap_rdrs_climate_data_single_year(
    input_directory='$input_forcing_easymore',
    output_directory='$ddb_remapped_output_forcing',
    input_basin='$input_basin',
    input_ddb='$input_ddb',
    year=$year
)
"
}

function run_section1_all_years {
    python -c "
import sys
sys.path.append('$HOME/virtual-envs/scienv/lib/python3.8/site-packages')
from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data
remap_rdrs_climate_data(
    input_directory='$input_forcing_easymore',
    output_directory='$ddb_remapped_output_forcing',
    input_basin='$input_basin',
    input_ddb='$input_ddb',
    start_year=$start_year,
    end_year=$end_year
)
"
}

function run_section2 {
    mkdir -p "$dir_merged_file"
    merge_cmd="cdo mergetime"
    for (( year=$start_year; year<=$end_year; year++ )); do
        merge_cmd+=" ${ddb_remapped_output_forcing}/remapped_remapped_ncrb_model_${year}*.nc"
    done
    $merge_cmd "$merged_file"
}

function run_section3 {
    ncatted -O -a units,RDRS_v2.1_P_TT_09944,o,c,"K" "$merged_file"
    ncatted -O -a units,RDRS_v2.1_P_P0_SFC,o,c,"Pa" "$merged_file"
    ncatted -O -a units,RDRS_v2.1_P_UVC_09944,o,c,"m s-1" "$merged_file"
    ncatted -O -a units,RDRS_v2.1_A_PR0_SFC,o,c,"mm s-1" "$merged_file"

    temp_file="${dir_merged_file}/${basin}_temp.nc"
    cdo -z zip -b F32 -aexpr,'RDRS_v2.1_P_TT_09944=RDRS_v2.1_P_TT_09944 + 273.15; RDRS_v2.1_P_P0_SFC=RDRS_v2.1_P_P0_SFC * 100.0; RDRS_v2.1_P_UVC_09944=RDRS_v2.1_P_UVC_09944 * 0.514444; RDRS_v2.1_A_PR0_SFC=RDRS_v2.1_A_PR0_SFC / 3.6' "$merged_file" "$temp_file"
    mv "$temp_file" "$merged_file"
}

for arg in "$@"; do
    case $arg in
        --section1)
            if [ -z "$SLURM_ARRAY_TASK_ID" ]; then
                run_section1_all_years
            else
                year=$((start_year + SLURM_ARRAY_TASK_ID))
                run_section1_single_year $year
            fi
            ;;
        --section2)
            run_section2
            ;;
        --section3)
            run_section3
            ;;
    esac
done
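The cdo expression in run_section3 applies four unit conversions. The Python sketch below reproduces the same arithmetic for single values so the factors are easier to check. The source units named in the comments (degrees Celsius, hPa, knots, metres per hour) are inferred from the factors and are not stated in the script; the function name is hypothetical.

```python
def convert_rdrs_units(temp, pressure, wind, precip):
    """Apply the same conversions as the cdo expression in run_section3."""
    return {
        "temperature_K": temp + 273.15,   # assumed degC -> K
        "pressure_Pa": pressure * 100.0,  # assumed hPa (mb) -> Pa
        "wind_m_per_s": wind * 0.514444,  # assumed knots -> m s-1
        "precip_mm_per_s": precip / 3.6,  # assumed m per hour -> mm s-1
    }
```

The precipitation factor follows from 1 m per hour = 1000 mm / 3600 s, which is 1 / 3.6 mm per second.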

Module contents