VectorPreProcessing package
Submodules
VectorPreProcessing.Aggregation_vector module
Basin and River Network Aggregation
The merit_basin_aggregation function aggregates basin and river network shapefiles. This function uses parameters like minimum sub-area, slope, and river length to iteratively aggregate small sub-basins.
Parameters:
input_basin: Basin GeoDataFrame with COMID identifiers.
input_river: River network GeoDataFrame with slope and length attributes.
min_subarea: Minimum area for sub-basins.
min_slope: Minimum allowable river slope.
min_length: Minimum river length.
This function iterates through sub-basins, merging those below the minimum sub-area threshold until no further aggregation is possible. It also computes and adjusts slopes, river lengths, and weighted slopes for simplified river networks.
Example usage:
>>> from VectorPreProcessing.Aggregation_vector import merit_basin_aggregation
>>> import geopandas as gpd
>>> import os
>>> # Define paths and parameters
>>> input_basin_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/SrsAboine-geofabric/sras_subbasins_MAF_noAgg.shp"
>>> input_river_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/SrsAboine-geofabric/sras_rivers_MAF_noAgg.shp"
>>> min_subarea = 50
>>> min_slope = 0.0000001
>>> min_length = 1.0
>>> output_basin_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/sras_subbasins_MAF_Agg.shp"
>>> output_river_path = "/home/fuaday/github-repos/Souris_Assiniboine_MAF/1-geofabric/sras_rivers_MAF_Agg.shp"
>>> # Load input data
>>> input_basin = gpd.read_file(input_basin_path)
>>> input_river = gpd.read_file(input_river_path)
>>> # Perform aggregation
>>> agg_basin, agg_river = merit_basin_aggregation(input_basin, input_river, min_subarea, min_slope, min_length)
>>> # Save aggregated data
>>> agg_basin.to_file(output_basin_path)
>>> agg_river.to_file(output_river_path)
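The iterative merging described above can be illustrated with a small pure-Python sketch. It is not the implementation of merit_basin_aggregation: the dictionary layout, the helper names aggregate_small_basins and apply_floors, the choice to merge a small sub-basin into its downstream neighbour, the length-weighted slope, and the treatment of min_slope and min_length as lower bounds are all assumptions made for illustration.

```python
def aggregate_small_basins(basins, min_subarea):
    """Merge each sub-basin smaller than min_subarea into its downstream neighbour.

    basins maps a hypothetical id to a dict with the keys
    'area', 'length', 'slope' and 'down' (downstream id, or None at the outlet).
    """
    basins = {k: dict(v) for k, v in basins.items()}  # work on a copy
    merged = True
    while merged:  # repeat until a full pass merges nothing
        merged = False
        for bid in sorted(basins):
            b = basins[bid]
            down = b["down"]
            if b["area"] >= min_subarea or down is None:
                continue
            d = basins[down]
            total_length = b["length"] + d["length"]
            # Length-weighted slope of the combined reach (assumed weighting).
            d["slope"] = (b["slope"] * b["length"] + d["slope"] * d["length"]) / total_length
            d["length"] = total_length
            d["area"] += b["area"]
            # Re-route anything that drained into the removed sub-basin.
            for other in basins.values():
                if other["down"] == bid:
                    other["down"] = down
            del basins[bid]
            merged = True
            break  # restart the scan because the set of ids changed
    return basins


def apply_floors(basins, min_slope, min_length):
    """Raise slopes and lengths to the given minimums (assumed behaviour)."""
    for b in basins.values():
        b["slope"] = max(b["slope"], min_slope)
        b["length"] = max(b["length"], min_length)
    return basins


network = {
    1: {"area": 20.0, "length": 2.0, "slope": 0.004, "down": 3},
    2: {"area": 80.0, "length": 5.0, "slope": 0.002, "down": 3},
    3: {"area": 60.0, "length": 4.0, "slope": 0.0, "down": None},
}
result = apply_floors(aggregate_small_basins(network, min_subarea=50), 1e-7, 1.0)
print(sorted(result))  # ids that remain after aggregation
```

Sub-basin 1 is below the threshold, so it is absorbed by sub-basin 3, whose area, length, and slope are updated; the loop then finds nothing else to merge and stops.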
VectorPreProcessing.NetCDFWriter module
Overview
The NetCDFWriter class is designed to generate model-ready NetCDF files (e.g., MESH_parameters.nc) containing
soil and other geophysical subbasin data integrated from a vector shapefile and a NetCDF drainage database.
This class is typically used in workflows that prepare input parameters for land surface models like MESH.
It supports flexible handling of both layer-dependent (e.g., soil properties per depth layer) and layer-independent (e.g., slope, contributing area) variables. The output conforms to CF conventions and includes appropriate coordinate reference metadata for spatial consistency.
Function Descriptions
- class VectorPreProcessing.NetCDFWriter.NetCDFWriter(nc_filename, shapefile_path, input_ddb_path)[source]
Initializes the NetCDF writer with paths to the output file, input shapefile, and NetCDF drainage database.
- Parameters:
nc_filename (str) – Path to the NetCDF output file to be created.
shapefile_path (str) – Path to the input shapefile containing the attributes.
input_ddb_path (str) – Path to the NetCDF drainage database used to extract coordinates.
- VectorPreProcessing.NetCDFWriter.read_shapefile()
Reads the input shapefile and converts it into a GeoDataFrame. The file is automatically reprojected to EPSG:4326 (WGS 84).
- VectorPreProcessing.NetCDFWriter.set_coordinates()
Extracts lon, lat, and subbasin values from the NetCDF drainage database file. These values serve as the spatial base for NetCDF output.
- VectorPreProcessing.NetCDFWriter.set_num_soil_layers(num_layers)
Sets the number of vertical soil layers that will be written into the NetCDF file.
- Parameters:
num_layers (int) – The number of soil layers (e.g., 4 for a 4-layer soil profile).
- VectorPreProcessing.NetCDFWriter.add_var_attrs(var, attrs)
Adds metadata attributes to a NetCDF variable, such as units, standard name, and axis designation.
- Parameters:
var (netCDF4.Variable) – The NetCDF variable to modify.
attrs (dict) – Dictionary of attributes to apply.
- VectorPreProcessing.NetCDFWriter.write_netcdf(properties, variable_info)
Writes the actual NetCDF file using the specified properties and metadata.
- Parameters:
properties (dict) – Dictionary specifying which variables are layer-dependent vs. layer-independent.
variable_info (dict) – Dictionary mapping each variable to a tuple of (NetCDF name, data type, unit).
Example Usage
from VectorPreProcessing.NetCDFWriter import NetCDFWriter
# Paths for NetCDFWriter
nc_filename = 'MESH_parameters3.nc'
output_shapefile = 'merged_soil_data_shapefile4.shp'
input_ddb = '/scratch/fuaday/sras-agg-model/MESH-sras-agg/MESH_drainage_database.nc'
mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]
# Initialize NetCDFWriter with the necessary paths
nc_writer = NetCDFWriter(
nc_filename=nc_filename,
shapefile_path=output_shapefile,
input_ddb_path=input_ddb
)
# Step 1: Read the attribute shapefile and extract spatial coordinates from the drainage database
nc_writer.read_shapefile()
nc_writer.set_coordinates()
# Step 2: Specify the number of vertical soil layers to include in the output
nc_writer.set_num_soil_layers(num_layers=len(mesh_intervals))
# Step 3: Define which variables are layer-dependent vs. layer-independent
properties = {
'layer_dependent': ['CLAY', 'SAND', 'OC'], # Varies by soil layer and subbasin
'layer_independent': ['ncontr', 'meanBDRICM', 'meanBDTICM', 'xslp', 'dd'] # Varies only by subbasin
}
# Step 4: Provide metadata for each variable to be written to NetCDF
variable_info = {
'CLAY': ('CLAY', 'f4', 'Percentage'),
'SAND': ('SAND', 'f4', 'Percentage'),
'OC': ('ORGM', 'f4', 'Percentage'),
'ncontr': ('IWF', 'i4', '1'),
'meanBDRICM': ('BDRICM', 'f4', 'Meters'),
'meanBDTICM': ('BDTICM', 'f4', 'Meters'),
'xslp': ('xslp', 'f4', 'degree'),
'dd': ('dd', 'f4', 'm_per_km2')
}
# Step 5: Write the final NetCDF file with structured metadata and spatial consistency
nc_writer.write_netcdf(properties=properties, variable_info=variable_info)
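Before calling write_netcdf, it can help to check that every name listed in properties has a matching entry in variable_info. The sketch below is a standalone helper, not part of NetCDFWriter; the function name plan_variables and the dimension names 'subbasin' and 'nsol' are assumptions chosen for illustration.

```python
def plan_variables(properties, variable_info):
    """Return one (source, nc_name, dtype, dims, unit) record per variable."""
    plan = []
    groups = (
        ("layer_dependent", ("subbasin", "nsol")),  # assumed dimension names
        ("layer_independent", ("subbasin",)),
    )
    for group, dims in groups:
        for source in properties.get(group, []):
            if source not in variable_info:
                raise KeyError(f"no variable_info entry for {source!r}")
            nc_name, dtype, unit = variable_info[source]
            plan.append((source, nc_name, dtype, dims, unit))
    return plan


example_properties = {
    "layer_dependent": ["OC"],
    "layer_independent": ["ncontr"],
}
example_info = {
    "OC": ("ORGM", "f4", "Percentage"),
    "ncontr": ("IWF", "i4", "1"),
}
plan = plan_variables(example_properties, example_info)
for record in plan:
    print(record)
```

A missing entry raises KeyError early, which is usually easier to diagnose than a failure halfway through writing the file.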
- class VectorPreProcessing.NetCDFWriter.NetCDFWriter(nc_filename, shapefile_path, input_ddb_path)[source]
Bases: object
A class to generate NetCDF files with soil data merged from shapefiles and NetCDF drainage databases.
Attributes:
- nc_filename (str)
Path to the output NetCDF file.
- shapefile_path (str)
Path to the input shapefile.
- input_ddb_path (str)
Path to the NetCDF drainage database.
- merged_gdf (geopandas.GeoDataFrame)
GeoDataFrame containing merged shapefile data.
- lon (list)
List of longitude values from the NetCDF drainage database.
- lat (list)
List of latitude values from the NetCDF drainage database.
- segid (list)
List of subbasin identifiers.
- num_soil_lyrs (int)
Number of soil layers in the dataset.
- add_var_attrs(var, attrs)[source]
Adds attributes to a NetCDF variable.
Parameters:
- var (netCDF4.Variable)
The NetCDF variable to which attributes will be added.
- attrs (dict)
A dictionary of attribute names and values.
- read_shapefile()[source]
Reads the shapefile and converts it into a GeoDataFrame.
This function reads the shapefile, reprojects it to EPSG:4326 (WGS 84), and stores the result in the merged GeoDataFrame.
- set_coordinates()[source]
Extracts longitude, latitude, and subbasin IDs from the NetCDF drainage database.
- set_num_soil_layers(num_layers)[source]
Sets the number of soil layers for the NetCDF file.
Parameters:
- num_layers (int)
Number of soil layers to be included in the NetCDF file.
- write_netcdf(properties, variable_info)[source]
Creates a NetCDF file with processed soil data.
Parameters:
- properties (dict)
A dictionary with two keys:
'layer_dependent': List of property names tied to the number of soil layers.
'layer_independent': List of property names dependent only on the subbasin.
- variable_info (dict)
A dictionary mapping property names to tuples containing: (new variable name in NetCDF, data type code, unit).
VectorPreProcessing.convert_ddbnetcdf module
NetCDF to CSV/Shapefile Converter
This module provides the convert_netcdf function, which converts a NetCDF file containing hydrological data into either a CSV file or a Shapefile.
Example Usage:
>>> from VectorPreProcessing.convert_ddbnetcdf import convert_netcdf
>>> convert_netcdf(netcdf_file='input.nc', output_file='output.csv', conversion_type='csv')
>>> convert_netcdf(netcdf_file='input.nc', output_file='output.shp', conversion_type='shapefile')
Functions:
convert_netcdf: Converts a NetCDF file into either a CSV or a Shapefile.
Parameters:
netcdf_file (str): Path to the input NetCDF file.
output_file (str): Path to the output file (CSV or Shapefile).
conversion_type (str): Conversion type, either “csv” or “shapefile”.
- VectorPreProcessing.convert_ddbnetcdf.convert_netcdf(netcdf_file, output_file, conversion_type='csv')[source]
Converts a NetCDF file to either a CSV or a Shapefile.
Parameters:
- netcdf_file (str)
Path to the input NetCDF file.
- output_file (str)
Path to the output file (CSV or Shapefile).
- conversion_type (str, optional)
Type of conversion ("csv" or "shapefile"), default is "csv".
Returns:
None
VectorPreProcessing.gdf_edit module
gdf_edit.py
This module provides functions to flag non-contributing areas (NCAs) or lakes and reservoirs in GeoDataFrames based on intersection thresholds, with customizable options for column names, default values, and initialization values.
Example Usage
1. Using Shapefiles:
>>> from VectorPreProcessing.gdf_edit import flag_ncaalg_from_files
>>> flagged_gdf = flag_ncaalg_from_files(
... 'path/to/shapefile1.shp',
... 'path/to/shapefile2.shp',
... threshold=0.1,
... output_path='output.shp'
... )
>>> flagged_gdf = flag_ncaalg_from_files(
... 'path/to/shapefile1.shp',
... 'path/to/shapefile2.shp',
... threshold=0.1,
... output_path='output.shp',
... ncontr_col="custom_flag_column", # Custom column in gdf1 to store flags
... value_column="NON_ID", # Column in gdf2 with values to assign
... initial_value=0, # Initial value for gdf1's flag column
... default_value=5 # Default value if no value_column specified
... )
2. Using GeoDataFrames Directly:
>>> from VectorPreProcessing.gdf_edit import flag_ncaalg
>>> import geopandas as gpd
>>> gdf1 = gpd.read_file('path/to/shapefile1.shp')
>>> gdf2 = gpd.read_file('path/to/shapefile2.shp')
>>> flagged_gdf = flag_ncaalg(gdf1, gdf2, threshold=0.1)
>>> flagged_gdf = flag_ncaalg(
... gdf1,
... gdf2,
... threshold=0.1,
... ncontr_col="custom_flag_column", # Custom column in gdf1 to store flags
... value_column="NON_ID", # Column in gdf2 with values to assign
... initial_value=0, # Initial value for gdf1's flag column
... default_value=5 # Default value if no value_column specified
... )
- VectorPreProcessing.gdf_edit.flag_ncaalg(gdf1: GeoDataFrame, gdf2: GeoDataFrame, threshold: float = 0.1, output_path: str = None, ncontr_col: str = 'ncontr', value_column: str = None, initial_value=None, default_value=2) -> GeoDataFrame[source]
Flag intersections and optionally assign values from gdf2.
This function identifies intersections between polygons in gdf1 and gdf2 that meet a specified threshold. If an intersection is found, a constant value (default is 2) or a value from a specified column in gdf2 (if provided) is assigned to the corresponding row in gdf1. If multiple intersections exist, the first match is used.
- Parameters:
gdf1 (gpd.GeoDataFrame) – The primary GeoDataFrame.
gdf2 (gpd.GeoDataFrame) – The secondary GeoDataFrame with values to assign.
threshold (float, optional) – The threshold for considering an intersection significant, as a fraction of the gdf1 polygon's area (default is 0.1 for 10%).
output_path (str, optional) – Path where the modified gdf1 should be saved. If None, the file is not saved.
ncontr_col (str, optional) – The name of the column to store assigned values in gdf1.
value_column (str, optional) – The name of the column in gdf2 with values to assign to gdf1. If None, a constant value (default_value) is used.
initial_value (optional) – The initial value to assign to the ncontr_col column in gdf1 before processing intersections.
default_value (optional) – The default value to assign to the ncontr_col column if value_column is None (default is 2).
- Returns:
The modified gdf1 with assigned values based on intersections.
- Return type:
gpd.GeoDataFrame
- VectorPreProcessing.gdf_edit.flag_ncaalg_from_files(shapefile1: str, shapefile2: str, threshold: float = 0.1, output_path: str = None, ncontr_col: str = 'ncontr', value_column: str = None, initial_value=None, default_value=2) -> GeoDataFrame[source]
Read two shapefiles, set their CRS to EPSG:4326, and apply the flag_ncaalg function.
- Parameters:
shapefile1 (str) – Path to the first shapefile.
shapefile2 (str) – Path to the second shapefile.
threshold (float, optional) – The threshold for considering an intersection significant, as a fraction of the first GeoDataFrame’s polygon area (default is 0.1 for 10%).
output_path (str, optional) – Path where the modified first GeoDataFrame should be saved. If None, the file is not saved.
ncontr_col (str, optional) – The name of the column to flag intersections in gdf1.
value_column (str, optional) – The name of the column in gdf2 with values to assign to gdf1.
initial_value (optional) – The initial value to assign to the ncontr_col column in gdf1 before processing intersections.
default_value (optional) – The default value to assign to the ncontr_col column if value_column is None (default is 2).
- Returns:
The modified first GeoDataFrame with the specified column added.
- Return type:
gpd.GeoDataFrame
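The threshold logic described for flag_ncaalg can be sketched without any GIS dependency by using axis-aligned boxes in place of polygons. This is an illustration only: the helper names, the (xmin, ymin, xmax, ymax) box layout, and the use of ">=" for the comparison are assumptions, and the real function operates on GeoDataFrame geometries.

```python
def box_area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; empty boxes have area 0."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_fraction(box1, box2):
    """Share of box1's area that is covered by box2."""
    area1 = box_area(box1)
    if area1 == 0.0:
        return 0.0
    intersection = (
        max(box1[0], box2[0]),
        max(box1[1], box2[1]),
        min(box1[2], box2[2]),
        min(box1[3], box2[3]),
    )
    return box_area(intersection) / area1


def flag_boxes(boxes1, boxes2, threshold=0.1, initial_value=None, default_value=2):
    """Return one flag per box in boxes1, using the first qualifying match."""
    flags = []
    for b1 in boxes1:
        flag = initial_value
        for b2 in boxes2:
            if overlap_fraction(b1, b2) >= threshold:
                flag = default_value
                break  # first match wins
        flags.append(flag)
    return flags


subbasins = [(0, 0, 10, 10), (20, 0, 30, 10)]
nca_areas = [(8, 0, 12, 10)]
flags = flag_boxes(subbasins, nca_areas, threshold=0.1, initial_value=0)
print(flags)
```

The first box shares 20% of its area with the second layer and is flagged; the second box does not overlap at all and keeps its initial value.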
VectorPreProcessing.gsde_soil module
Overview
The GSDESoil class provides a pipeline to process, clean, interpolate, and integrate soil property data
into hydrological model inputs, such as those required by MESH. It is designed to handle GSDE-derived statistics
stored in CSV files, convert them to model-ready format using weighted depth-averaging, and merge them into a
basin shapefile based on unique identifiers (e.g., COMID).
Function Descriptions
- class VectorPreProcessing.gsde_soil.GSDESoil(directory, input_basin, output_shapefile)[source]
Initializes the processor with input/output paths.
- Parameters:
directory (str) – Directory containing input CSV files.
input_basin (str) – Path to the input shapefile with a COMID field.
output_shapefile (str) – Path where the merged shapefile will be saved.
- VectorPreProcessing.gsde_soil.load_data(file_names, search_replace_dict=None, suffix_dict=None)
Loads and merges soil data from multiple CSV files. Columns can be renamed using search/replace rules and optionally suffixed to avoid name collisions.
- Parameters:
file_names (list) – List of CSV file names.
search_replace_dict (dict, optional) – Dictionary with filename as key and (search_list, replace_list) as value.
suffix_dict (dict, optional) – Dictionary with filename as key and string suffix as value.
- VectorPreProcessing.gsde_soil.fill_and_clean_data(exclude_cols=['COMID'], exclude_patterns=['OC', 'BD', 'BDRICM', 'BDTICM'], max_val=100)
Cleans soil data by removing outliers, rescaling BDRICM/BDTICM, and filling missing values via forward/backward fill.
- Parameters:
exclude_cols (list) – Columns to ignore during cleaning.
exclude_patterns (list) – Substrings used to skip certain columns during range checks.
max_val (float) – Maximum threshold for valid data (values above this become NaN).
- VectorPreProcessing.gsde_soil.calculate_weights(gsde_intervals, mesh_intervals)
Computes weights to map GSDE soil depth intervals to model mesh layers.
- Parameters:
gsde_intervals (list of tuple) – List of tuples representing GSDE depth layers (e.g., [(0, 0.045), …]).
mesh_intervals (list of tuple) – List of tuples representing target model layer depths.
- VectorPreProcessing.gsde_soil.calculate_mesh_values(column_names)
Applies weights to calculate layer-averaged MESH-compatible soil properties.
- Parameters:
column_names (dict) – Dictionary mapping each property (e.g., “CLAY”, “OC”) to its source columns.
- VectorPreProcessing.gsde_soil.merge_and_save_shapefile()
Merges the processed soil data with the input basin shapefile using COMID and saves the final output.
- VectorPreProcessing.gsde_soil.set_coordinates(input_ddb)
Optionally reads spatial reference (lon, lat, subbasin) from a NetCDF drainage database.
- Parameters:
input_ddb (str) – Path to the NetCDF drainage database file.
Example Usage
from VectorPreProcessing.gsde_soil import GSDESoil
# Step 1: Initialize the soil processor with paths to your directories and files
gsde = GSDESoil(
directory='/home/fuaday/scratch/sras-agg-model/gistool-outputs',
input_basin='/home/fuaday/scratch/sras-agg-model/geofabric-outputs/sras_subbasins_MAF_Agg2.shp',
output_shapefile='merged_soil_data_shapefile4.shp'
)
# Step 2: Define the list of input CSV files
file_names = [
'sras_model_stats_CLAY1.csv', 'sras_model_stats_CLAY2.csv',
'sras_model_stats_SAND1.csv', 'sras_model_stats_SAND2.csv',
'sras_model_stats_OC1.csv', 'sras_model_stats_OC2.csv',
'sras_model_stats_BDRICM_M_250m_ll.csv',
'sras_model_stats_BDTICM_M_250m_ll.csv',
'sras_model_slope_degree.csv', 'sras_model_riv_0p1_2.csv'
]
# Step 3: Prepare renaming instructions for each file (search/replace patterns)
search_replace_dict = {
'sras_model_stats_CLAY1.csv': (['.CLAY_depth=4.5', '.CLAY_depth=9.1000004', '.CLAY_depth=16.6', '.CLAY_depth=28.9'], ['CLAY1', 'CLAY2', 'CLAY3', 'CLAY4']),
'sras_model_stats_CLAY2.csv': (['.CLAY_depth=49.299999', '.CLAY_depth=82.900002', '.CLAY_depth=138.3', '.CLAY_depth=229.60001'], ['CLAY5', 'CLAY6', 'CLAY7', 'CLAY8']),
'sras_model_stats_SAND1.csv': (['.SAND_depth=4.5', '.SAND_depth=9.1000004', '.SAND_depth=16.6', '.SAND_depth=28.9'], ['SAND1', 'SAND2', 'SAND3', 'SAND4']),
'sras_model_stats_SAND2.csv': (['.SAND_depth=49.299999', '.SAND_depth=82.900002', '.SAND_depth=138.3', '.SAND_depth=229.60001'], ['SAND5', 'SAND6', 'SAND7', 'SAND8']),
'sras_model_stats_OC1.csv': (['.OC_depth=4.5', '.OC_depth=9.1000004', '.OC_depth=16.6', '.OC_depth=28.9'], ['OC1', 'OC2', 'OC3', 'OC4']),
'sras_model_stats_OC2.csv': (['.OC_depth=49.299999', '.OC_depth=82.900002', '.OC_depth=138.3', '.OC_depth=229.60001'], ['OC5', 'OC6', 'OC7', 'OC8'])
}
# Step 4: Optionally specify suffixes to distinguish overlapping columns
suffix_dict = {
'sras_model_stats_BDRICM_M_250m_ll.csv': 'BDRICM',
'sras_model_stats_BDTICM_M_250m_ll.csv': 'BDTICM'
}
# Step 5: Load the data, applying renaming and suffixes
gsde.load_data(
file_names=file_names,
search_replace_dict=search_replace_dict,
suffix_dict=suffix_dict
)
# Step 6: Clean and prepare the soil data (e.g., remove outliers, fill NaNs)
gsde.fill_and_clean_data()
# Step 7: Define soil profile intervals for GSDE and MESH (depths in meters)
gsde_intervals = [(0, 0.045), (0.045, 0.091), (0.091, 0.166), (0.166, 0.289),
(0.289, 0.493), (0.493, 0.829), (0.829, 1.383), (1.383, 2.296)]
mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]
gsde.calculate_weights(gsde_intervals, mesh_intervals)
# Step 8: Compute mesh-compatible weighted averages of soil properties
column_names = {
'CLAY': ['CLAY1', 'CLAY2', 'CLAY3', 'CLAY4', 'CLAY5', 'CLAY6', 'CLAY7', 'CLAY8'],
'SAND': ['SAND1', 'SAND2', 'SAND3', 'SAND4', 'SAND5', 'SAND6', 'SAND7', 'SAND8'],
'OC': ['OC1', 'OC2', 'OC3', 'OC4', 'OC5', 'OC6', 'OC7', 'OC8']
}
gsde.calculate_mesh_values(column_names)
# Step 9: Merge processed soil data into the basin shapefile and save output
gsde.merge_and_save_shapefile()
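The depth weighting in Steps 7 and 8 can be pictured as an interval-overlap calculation. The sketch below is not the code behind calculate_weights or calculate_mesh_values: the helper names are hypothetical, and normalizing each row by the thickness actually covered by GSDE data (rather than by the full mesh layer thickness) is an assumption.

```python
def overlap(a, b):
    """Length of the overlap between two (top, bottom) depth intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def layer_weights(source_intervals, target_intervals):
    """One row of weights per target layer; each non-empty row sums to 1."""
    weights = []
    for target in target_intervals:
        raw = [overlap(src, target) for src in source_intervals]
        total = sum(raw)
        weights.append([w / total if total > 0 else 0.0 for w in raw])
    return weights


def weighted_values(values, weights):
    """Apply the weights to one value per source layer."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


gsde_intervals = [(0, 0.045), (0.045, 0.091), (0.091, 0.166), (0.166, 0.289),
                  (0.289, 0.493), (0.493, 0.829), (0.829, 1.383), (1.383, 2.296)]
mesh_intervals = [(0, 0.1), (0.1, 0.35), (0.35, 1.2), (1.2, 4.1)]

weights = layer_weights(gsde_intervals, mesh_intervals)
mesh_clay = weighted_values([30.0] * 8, weights)  # uniform profile stays uniform
for row in weights:
    print([round(w, 3) for w in row])
```

Note that the deepest mesh layer (1.2 to 4.1 m) extends below the deepest GSDE layer (2.296 m), so under this normalization its value depends only on the two GSDE layers that overlap it.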
- class VectorPreProcessing.gsde_soil.GSDESoil(directory, input_basin, output_shapefile)[source]
Bases: object
A class to process, clean, interpolate, and merge soil property data from CSV files with a given basin shapefile, producing model-ready soil inputs.
- directory
Directory containing input CSV files with soil properties.
- Type:
str
- input_basin
Path to the input basin shapefile with a ‘COMID’ identifier.
- Type:
str
- output_shapefile
Path to the output shapefile with processed soil attributes.
- Type:
str
- file_paths
List of full file paths for input CSVs.
- Type:
list
- gsde_df
Combined soil property table after processing.
- Type:
pandas.DataFrame
- merged_gdf
Final spatial dataset with soil properties merged to polygons.
- Type:
geopandas.GeoDataFrame
- weights_used
Weights used to interpolate soil layers into mesh layers.
- Type:
list of list
- mesh_intervals
Target depth intervals used for model input (e.g., MESH layers).
- Type:
list of tuple
- lon
Longitude values loaded from a NetCDF drainage database.
- Type:
ndarray
- lat
Latitude values loaded from a NetCDF drainage database.
- Type:
ndarray
- segid
Segment IDs (e.g., subbasin or COMID) from a drainage database.
- Type:
ndarray
- num_soil_lyrs
Number of output mesh layers.
- Type:
int
- calculate_mesh_values(column_names)[source]
Apply the calculated weights to soil property columns and generate depth-integrated values for each mesh layer.
Parameters:
- column_names (dict)
Dictionary mapping each property (e.g., "CLAY", "OC") to its source columns. Example: {'CLAY': ['CLAY1', 'CLAY2', …], 'OC': ['OC1', 'OC2', …]}
- calculate_weights(gsde_intervals, mesh_intervals)[source]
Calculate the contribution weights from each GSDE layer to each model-defined mesh layer based on depth intervals.
Parameters:
- gsde_intervals (list of tuple)
List of tuples representing GSDE depth layers (e.g., [(0, 0.045), …]).
- mesh_intervals (list of tuple)
Target model layer depths (e.g., [(0, 0.1), (0.1, 0.35), …]).
- fill_and_clean_data(exclude_cols=['COMID'], exclude_patterns=['OC', 'BD', 'BDRICM', 'BDTICM'], max_val=100)[source]
Clean the soil data by:
- Replacing extreme values with NaN (based on max_val).
- Normalizing and capping specific fields (e.g., BDRICM/BDTICM).
- Filling missing values using forward and backward fill.
- Parameters:
exclude_cols (list of str) – Columns to exclude from NaN replacement.
exclude_patterns (list of str) – Column name substrings to skip when applying value caps.
max_val (float) – Maximum valid threshold for general soil values.
- static load_and_merge_files(file_list, search_replace_dict=None, suffix_dict=None, key='COMID')[source]
Load multiple CSV files and merge them on a common key. Renames and suffixes column names as needed during the loading process.
- Parameters:
file_list (list of str) – List of full CSV file paths.
search_replace_dict (dict, optional) – Column renaming instructions for each file.
suffix_dict (dict, optional) – Suffix strings to append to column names by file.
key (str) – Primary key used to merge all data files (default is ‘COMID’).
- Returns:
Merged DataFrame containing columns from all input files.
- Return type:
pandas.DataFrame
- load_data(file_names, search_replace_dict=None, suffix_dict=None)[source]
Load and merge multiple CSV files into a single DataFrame. Optionally apply search-and-replace logic and suffixes to column names to ensure compatibility.
- Parameters:
file_names (list of str) – List of filenames to load from the given directory.
search_replace_dict (dict, optional) – Dictionary where keys are filenames and values are (search_list, replace_list) tuples used to rename columns (e.g., depth labels to CLAY1, CLAY2, etc.).
suffix_dict (dict, optional) – Dictionary where keys are filenames and values are suffix strings to append to column names (useful for distinguishing overlapping variables).
- merge_and_save_shapefile()[source]
Merge the processed soil data (via COMID) into the input shapefile and save the result. Output is a GeoDataFrame with mesh values appended as new attributes.
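The masking and gap-filling steps listed for fill_and_clean_data can be sketched on a single column of values. This is a simplified illustration with hypothetical helper names; it uses None for missing values, ignores the exclude_cols and exclude_patterns handling, and does not reproduce the BDRICM/BDTICM rescaling.

```python
def mask_above(values, max_val):
    """Replace values greater than max_val with None (missing)."""
    return [None if v is not None and v > max_val else v for v in values]


def forward_fill(values):
    """Carry the last seen value forward over missing entries."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_gaps(values):
    """Forward fill, then backward fill whatever is still missing at the start."""
    forward = forward_fill(values)
    return forward_fill(forward[::-1])[::-1]


clay_column = [None, 35.0, 250.0, 40.0, None]
cleaned = fill_gaps(mask_above(clay_column, max_val=100))
print(cleaned)
```

The out-of-range 250.0 is treated as missing and takes the preceding value, while the leading gap can only be filled by the backward pass.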
VectorPreProcessing.remap_climate_to_ddb module
- VectorPreProcessing.remap_climate_to_ddb.process_file(file_path, segid, lon, lat, output_directory)[source]
Process a single NetCDF file and remap its data to the drainage database (DDB) format.
- Parameters:
file_path (str) – Path to the input NetCDF file.
segid (numpy.ndarray) – Array of subbasin IDs from the drainage database.
lon (numpy.ndarray) – Array of longitude values from the drainage database.
lat (numpy.ndarray) – Array of latitude values from the drainage database.
output_directory (str) – Path to the directory where the processed file will be saved.
Example
>>> from VectorPreProcessing.remap_climate_to_ddb import process_file
>>> process_file(
... file_path="path/to/input.nc",
... segid=subbasin_ids,
... lon=longitudes,
... lat=latitudes,
... output_directory="path/to/output"
... )
- VectorPreProcessing.remap_climate_to_ddb.remap_rdrs_climate_data(input_directory, output_directory, input_basin, input_ddb, start_year, end_year)[source]
Remap RDRS climate data to a drainage database (DDB) format for a range of years.
- Parameters:
input_directory (str) – Path to the directory containing input NetCDF files.
output_directory (str) – Path to the directory where processed files will be saved.
input_basin (str) – Path to the basin shapefile.
input_ddb (str) – Path to the drainage database NetCDF file.
start_year (int) – Start year of the data to process.
end_year (int) – End year of the data to process.
Example
>>> from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data
>>> remap_rdrs_climate_data(
... input_directory="path/to/input",
... output_directory="path/to/output",
... input_basin="path/to/basin.shp",
... input_ddb="path/to/ddb.nc",
... start_year=2000,
... end_year=2020
... )
- VectorPreProcessing.remap_climate_to_ddb.remap_rdrs_climate_data_single_year(input_directory, output_directory, input_basin, input_ddb, year)[source]
Remap RDRS climate data to a drainage database (DDB) format for a single year.
- Parameters:
input_directory (str) – Path to the directory containing input NetCDF files.
output_directory (str) – Path to the directory where processed files will be saved.
input_basin (str) – Path to the basin shapefile.
input_ddb (str) – Path to the drainage database NetCDF file.
year (int) – Year of the data to process.
Example
>>> from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data_single_year
>>> remap_rdrs_climate_data_single_year(
... input_directory="path/to/input",
... output_directory="path/to/output",
... input_basin="path/to/basin.shp",
... input_ddb="path/to/ddb.nc",
... year=2020
... )
SLURM Script Usage
This SLURM script demonstrates how to use the functions
remap_rdrs_climate_data and remap_rdrs_climate_data_single_year in an HPC environment.
Typical Usage
Run all sections in a single job:
sbatch Forcing_RDRS_processingMet3.sh --section1 --section2 --section3
Run each year in parallel using SLURM array jobs:
sbatch --array=0-38 Forcing_RDRS_processingMet3.sh --section1
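The array range 0-38 follows from the script's start_year=1980 and end_year=2018, because each task computes its year as start_year plus SLURM_ARRAY_TASK_ID. The short sketch below reproduces that arithmetic so the range can be recomputed for a different period; the function names are hypothetical and not part of the package.

```python
def array_range(start_year, end_year):
    """Value to pass to sbatch --array so that each year gets one task."""
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    return f"0-{end_year - start_year}"


def year_for_task(start_year, end_year, task_id):
    """Year handled by a given array task, mirroring start_year + task id."""
    year = start_year + task_id
    if not start_year <= year <= end_year:
        raise ValueError(f"task {task_id} maps to {year}, outside the period")
    return year


print(array_range(1980, 2018))
print(year_for_task(1980, 2018, 0), year_for_task(1980, 2018, 38))
```

An array range that is too wide would map some tasks to years past end_year, which this check makes explicit.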
SLURM Shell Script
#!/bin/bash
#SBATCH --account=rpp-kshook
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem-per-cpu=30G
#SBATCH --time=24:00:00
#SBATCH --job-name=vectForcRDRS
#SBATCH --mail-user=fuad.yassin@usask.ca
#SBATCH --mail-type=BEGIN,END,FAIL
: '
This script processes climate forcing data for the vector-based MESH RDRS dataset.
Supports array jobs and all-years processing.
'
module load cdo
module load nco
basin="sras"
start_year=1980
end_year=2018
input_forcing_easymore='/scratch/fuaday/sras-agg-model/easymore-outputs'
ddb_remapped_output_forcing='/scratch/fuaday/sras-agg-model/easymore-outputs2'
input_basin='/scratch/fuaday/sras-agg-model/geofabric-outputs/sras_subbasins_MAF_Agg.shp'
input_ddb='/scratch/fuaday/sras-agg-model/MESH-sras-agg/MESH_drainage_database.nc'
dir_merged_file="/scratch/fuaday/sras-agg-model/easymore-outputs-merged"
merged_file="${dir_merged_file}/${basin}_rdrs_${start_year}_${end_year}_v21_allVar.nc"
source $HOME/virtual-envs/scienv/bin/activate
module load StdEnv/2020
module load gcc/9.3.0
module restore scimods
module load cdo
module load nco
function run_section1_single_year {
local year=$1
python -c "
import sys
sys.path.append('$HOME/virtual-envs/scienv/lib/python3.8/site-packages')
from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data_single_year
remap_rdrs_climate_data_single_year(
input_directory='$input_forcing_easymore',
output_directory='$ddb_remapped_output_forcing',
input_basin='$input_basin',
input_ddb='$input_ddb',
year=$year
)
"
}
function run_section1_all_years {
python -c "
import sys
sys.path.append('$HOME/virtual-envs/scienv/lib/python3.8/site-packages')
from VectorPreProcessing.remap_climate_to_ddb import remap_rdrs_climate_data
remap_rdrs_climate_data(
input_directory='$input_forcing_easymore',
output_directory='$ddb_remapped_output_forcing',
input_basin='$input_basin',
input_ddb='$input_ddb',
start_year=$start_year,
end_year=$end_year
)
"
}
function run_section2 {
mkdir -p "$dir_merged_file"
merge_cmd="cdo mergetime"
for (( year=$start_year; year<=$end_year; year++ )); do
merge_cmd+=" ${ddb_remapped_output_forcing}/remapped_remapped_ncrb_model_${year}*.nc"
done
$merge_cmd "$merged_file"
}
function run_section3 {
ncatted -O -a units,RDRS_v2.1_P_TT_09944,o,c,"K" "$merged_file"
ncatted -O -a units,RDRS_v2.1_P_P0_SFC,o,c,"Pa" "$merged_file"
ncatted -O -a units,RDRS_v2.1_P_UVC_09944,o,c,"m s-1" "$merged_file"
ncatted -O -a units,RDRS_v2.1_A_PR0_SFC,o,c,"mm s-1" "$merged_file"
temp_file="${dir_merged_file}/${basin}_temp.nc"
cdo -z zip -b F32 -aexpr,'RDRS_v2.1_P_TT_09944=RDRS_v2.1_P_TT_09944 + 273.15; RDRS_v2.1_P_P0_SFC=RDRS_v2.1_P_P0_SFC * 100.0; RDRS_v2.1_P_UVC_09944=RDRS_v2.1_P_UVC_09944 * 0.514444; RDRS_v2.1_A_PR0_SFC=RDRS_v2.1_A_PR0_SFC / 3.6' "$merged_file" "$temp_file"
mv "$temp_file" "$merged_file"
}
for arg in "$@"; do
case $arg in
--section1)
if [ -z "$SLURM_ARRAY_TASK_ID" ]; then
run_section1_all_years
else
year=$((start_year + SLURM_ARRAY_TASK_ID))
run_section1_single_year $year
fi
;;
--section2)
run_section2
;;
--section3)
run_section3
;;
esac
done
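The unit changes applied in run_section3 can be checked by hand with the same factors. The sketch below copies the variable names, factors, and target units from the cdo and ncatted commands in the script; the source units are not stated there, so reading the factors as degrees Celsius to kelvin, hPa to Pa, knots to m s-1, and metres per hour to mm s-1 is an inference. The CONVERSIONS table and the convert helper are hypothetical names.

```python
# Variable name -> (conversion function, target units), copied from run_section3.
CONVERSIONS = {
    "RDRS_v2.1_P_TT_09944": (lambda v: v + 273.15, "K"),
    "RDRS_v2.1_P_P0_SFC": (lambda v: v * 100.0, "Pa"),
    "RDRS_v2.1_P_UVC_09944": (lambda v: v * 0.514444, "m s-1"),
    "RDRS_v2.1_A_PR0_SFC": (lambda v: v / 3.6, "mm s-1"),
}


def convert(name, value):
    """Return (converted value, target units) for one forcing variable."""
    func, units = CONVERSIONS[name]
    return func(value), units


samples = {
    "RDRS_v2.1_P_TT_09944": 0.0,
    "RDRS_v2.1_P_P0_SFC": 1000.0,
    "RDRS_v2.1_P_UVC_09944": 10.0,
    "RDRS_v2.1_A_PR0_SFC": 0.0036,
}
for name, value in samples.items():
    print(name, convert(name, value))
```

Because the script overwrites the merged file in place, running section 3 twice would apply the factors twice; a quick check like this on a few values helps confirm which state a file is in.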