Technical Workflow Documentation
SIPNET Uncertainty Quantification Pipeline
1 Pipeline Architecture
This workflow implements variance-based sensitivity analysis for SIPNET following the Dietze (2017) uncertainty partitioning framework. The pipeline decomposes forecast variance into parameter, initial condition, driver, and interaction components using complementary local (OAT) and global (Sobol) methods.
1.1 Script Execution Order
| Phase | Scripts | Purpose |
|---|---|---|
| Setup | 001–002 | Design point selection, PEcAn XML configuration |
| Local SA | 011–012 | OAT elasticities, partial variances |
| Global SA | 021–025 | Sobol design, events, model runs, \(S_i\)/\(T_i\) |
| Integration | 031 | Variance partitioning across uncertainty sources |
2 Configuration
Central configuration in 000-config.yml:
default:
paths:
ccmmf_dir: "/projectnb2/dietzelab/ccmmf"
data_dir: "data"
raw_data_dir: "data_raw"
sites:
n_sample: 10
sensitivity:
sigma_levels: [-2, -1, 1, 2]3 Phase 1: Setup
3.1 001_setup_design_points.R
Selects SA sites from the 198-site master design and maps LandIQ PFT names to PEcAn conventions.
PFT Mapping: - annual crop → grass - woody perennial crop → temperate.deciduous
Output: data_raw/sa_design_points.csv
3.2 002_build_xml.R
Generates multisite PEcAn settings XML with ensemble paths and sensitivity configuration.
Key settings: - Ensemble size: 20 (parameters) × 10 (met) - Simulation period: 2016–2023 - Sigma levels: ±1σ, ±2σ perturbations
Output: data_raw/settings_sa.xml
4 Phase 2: Local Sensitivity Analysis
4.1 011_run_local_sensitivity.R
Executes PEcAn’s OAT sensitivity workflow:
runModule.run.write.configs()— Generate model configurationsrunModule_start_model_runs()— Execute SIPNET via SGErunModule.get.results()— Extract outputsrunModule.run.sensitivity.analysis()— Compute elasticities
Rscript scripts/011_run_local_sensitivity.R --settings data_raw/settings_sa.xml4.2 012_aggregate_sensitivity.R
Aggregates site-level OAT results and analyzes environmental gradients.
Computed metrics:
| Metric | Definition |
|---|---|
| Elasticity | \(\varepsilon_i = \frac{\partial Y}{\partial \theta_i} \cdot \frac{\theta_i}{Y}\) |
| Partial variance | \(V_i = \varepsilon_i^2 \cdot \text{Var}(\theta_i)\) |
| Variance explained | \(V_i / \sum_j V_j \times 100\) |
Environmental gradient analysis: Regresses elasticity against MAT, MAP, clay, organic carbon, and TWI to identify context-dependent sensitivities.
Outputs: - data/aggregated_sensitivity.csv - data/parameter_rankings.csv - data/significant_gradients.csv
5 Phase 3: Global Sensitivity Analysis
5.1 021_generate_sobol_design.R
Generates Saltelli quasi-random design matrix for variance-based SA.
Design structure: - \(N \times (k + 2)\) model evaluations (Saltelli scheme) - Continuous parameters: quantile-transformed from posterior - Discrete inputs (IC, met): inverse transform method
# Discrete variable handling
design$ic_ensemble <- floor(qunif(design$ic_ensemble, min = 1, max = n_ic + 1))Dummy parameter: Uniform(0,1) baseline for numerical noise estimation.
Outputs: - data/sobol_design_matrix.csv - data/sobol_design_metadata.rds
5.2 022_prepare_pecan_inputs.R
Transforms Sobol design into PEcAn-compatible structures: - Parses pft.parameter namespacing → separate PFT lists - Creates ensemble.samples (list of data frames per PFT) - Creates input_design mapping sample_id → IC/met indices
Outputs: - output/samples.Rdata - cache/input_design.rds
5.3 024_run_global_sensitivity.R
Executes all \(N(k+2)\) model runs specified by the Sobol design.
settings$ensemble$size <- 1 # Each row = one run
settings <- runModule.run.write.configs(settings, input_design = input_design)5.4 025_compute_sobol_indices.R
Computes Sobol indices using sensobol:
idx <- sensobol::sobol_indices(Y = Y, N = N, params = params, boot = TRUE, R = 500)Output columns: - Si_original: First-order index (main effect) - Ti_original: Total-order index (main + interactions) - Bootstrap 95% CIs
Output: data/sobol_indices.csv
6 Phase 4: Variance Decomposition
6.1 031_partition_variance.R
Integrates local and global SA to partition total ensemble variance.
Category-level partition (from Sobol):
\[V_c = \sum_{i \in c} S_i \times \text{Var}(Y_{\text{ens}})\]
| Category | Source |
|---|---|
parameter |
All PFT parameters |
IC |
ic_ensemble index |
driver |
met_ensemble index |
dummy |
Numerical baseline |
interaction |
Residual: \(V_{total} - \sum V_i\) |
Parameter-level breakdown (from local SA):
Uses OAT partial variances to allocate \(V_{\text{parameter}}\) to individual parameters:
\[f_p = \frac{V_p^{OAT}}{\sum_j V_j^{OAT}}, \quad V_{\text{param},p} = f_p \times V_{\text{parameter}}\]
Outputs: - data/variance_partition_site_level.csv - data/variance_partition_parameters.csv - data/variance_partition_summary.csv
7 Function Library
7.1 R/local_sensitivity.R
| Function | Purpose |
|---|---|
aggregate_local_sa() |
Collect site-level OAT results, filter by site-PFT pairs |
summarize_local_sa() |
Cross-site rankings, PFT differences, constraint priority |
analyze_environmental_gradients() |
Regression of sensitivity vs. site covariates |
7.2 R/global_sensitivity.R
| Function | Purpose |
|---|---|
generate_sobol_design() |
Saltelli sampling with discrete input handling |
7.3 R/variance_decomposition.R
| Function | Purpose |
|---|---|
calculate_ensemble_variance() |
Extract Var(Y) from ensemble output files |
build_param_category_lookup() |
Map parameters → categories from metadata |
partition_variance_sources() |
Category-level variance attribution |
partition_parameter_variance_local() |
Individual parameter breakdown |
8 SGE Cluster Execution
Model runs are parallelized via SGE array jobs using sge_array_launcher.sh:
#$ -t 1-@NJOBS@
#$ -pe omp 1
#$ -l h_rt=2:00:00
#$ -l buyinThe launcher reads joblist.txt (generated by PEcAn) and executes the appropriate run directory for each task ID.
9 Execution
# Full pipeline (shell scripts)
bash scripts/run_pipeline.sh
# Or step-by-step:
bash scripts/run_local_sa.sh # 001 → 002 → 011 → 012
bash scripts/run_global_sa.sh # 021 → 022 → 023 → 024 → 025
Rscript scripts/031_partition_variance.R10 Output Variables
| Variable | Description | Units |
|---|---|---|
| NPP | Net Primary Productivity | kg C ha⁻¹ yr⁻¹ |
| AbvGrndWood | Aboveground Woody Biomass | kg C ha⁻¹ |
| Qle | Latent Heat Flux | W m⁻² |