Local Sensitivity Analysis

Author

Akash B V

Published

April 8, 2026

1 Key Findings

This report presents a local sensitivity analysis of the SIPNET ecosystem model across 100 design points representing California cropland environments. The analysis identifies which model parameters most strongly control key outputs and where targeted data collection would most efficiently reduce forecast uncertainty.

Photosynthesis parameters dominate NPP sensitivity. Maximum photosynthetic rate and temperature optima consistently rank highest, reflecting leaf-level carbon fixation as the primary control on productivity in SIPNET.
Soil carbon is controlled by decomposition rates. Litter and soil turnover parameters explain the most variance in soil organic carbon, consistent with first-order decomposition kinetics driving pool dynamics.
Several high-sensitivity parameters remain poorly constrained by data and represent the highest-value targets for field campaigns or trait synthesis.
Parameter importance is not spatially uniform. Climate gradients (temperature, precipitation) and soil properties (clay, organic carbon density) systematically shift which parameters matter, supporting regionally stratified calibration rather than a single global parameterization.
Plant functional type (PFT) differences are ecologically coherent. Woody perennial sites show higher sensitivity to allocation and wood turnover, while annual cropland sites are more sensitive to leaf phenology.

2 Background

2.1 Why sensitivity and uncertainty analysis?

Ecosystem models encode our understanding of how carbon, water, and nutrient cycles interact. When we use them for prediction (forecasting how cropland management affects soil carbon stocks, for instance), the reliability of those predictions depends on how well we know the model’s input parameters. Some parameters have been measured hundreds of times; others are poorly constrained by available data.

Sensitivity analysis asks: which parameters does the model respond to most strongly? A parameter with high sensitivity has a large effect on model output per unit change in its value. But sensitivity alone can be misleading. A highly sensitive parameter that is well constrained by extensive measurements may contribute little to forecast uncertainty, while a moderately sensitive parameter with wide prior uncertainty can dominate the error budget.

Uncertainty analysis combines both pieces of information (model sensitivity and prior parameter uncertainty) to estimate each parameter’s contribution to predictive variance. This partitioning directly informs where to invest limited resources: the parameters with both high sensitivity and high uncertainty offer the greatest return on investment from new measurements.

2.2 Approach

We use the PEcAn variance decomposition workflow (LeBauer et al., 2013), which proceeds in three steps:

Meta-analysis. A hierarchical Bayesian meta-analysis synthesizes published trait measurements to produce posterior probability distributions for each model parameter. These distributions capture both the central estimate and remaining uncertainty.
Sensitivity analysis. Each parameter is perturbed one at a time to the quantile equivalents of $\pm 1\sigma$ and $\pm 2\sigma$ of its posterior distribution, and the model is re-run. The resulting response curves are fit with splines, and elasticity is computed as the normalized sensitivity:

\[\varepsilon = \frac{\partial Y}{\partial X} \cdot \frac{X}{Y}\]

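An elasticity of 2 means that a 1% increase in the parameter produces approximately a 2% increase in the output. Elasticity is a local derivative, so the relationship holds only for small perturbations around the evaluation point. The sign indicates direction.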

Variance decomposition. Each parameter’s posterior distribution is transformed through its sensitivity spline to produce a predictive distribution. The variance of this distribution, expressed as a fraction of the total across all parameters, gives the partial variance – the proportion of forecast uncertainty attributable to that parameter.
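
Steps 2 and 3 can be illustrated with a toy example. The sketch below is not the PEcAn implementation: `toy_model`, the normal posteriors, and all numbers are hypothetical, and base R's `splinefun()` stands in for whatever spline the workflow actually fits.

```r
# Toy illustration of one-at-a-time sensitivity and variance decomposition.
# All names and numbers are hypothetical, not SIPNET or PEcAn values.
set.seed(42)

# Hypothetical posteriors: mean and sd per parameter (assumed normal)
posteriors <- list(
  a = c(mean = 10, sd = 1),
  b = c(mean = 2, sd = 0.6)
)

# Hypothetical stand-in for a model run with one parameter perturbed
toy_model <- function(a = 10, b = 2) 3 * a^2 + 5 * b

# Probabilities equivalent to -2, -1, 0, +1, +2 sigma
sigma_probs <- pnorm(c(-2, -1, 0, 1, 2))

analyze_param <- function(name) {
  p <- posteriors[[name]]
  x <- qnorm(sigma_probs, p[["mean"]], p[["sd"]])
  y <- vapply(
    x,
    function(v) do.call(toy_model, setNames(list(v), name)),
    numeric(1)
  )
  response <- splinefun(x, y)  # spline through the five model runs
  x_med <- x[3]
  elasticity <- response(x_med, deriv = 1) * x_med / response(x_med)
  # Push posterior draws through the spline to get a predictive distribution
  draws <- rnorm(5000, p[["mean"]], p[["sd"]])
  c(elasticity = elasticity, variance = var(response(draws)))
}

result <- t(vapply(
  names(posteriors), analyze_param,
  c(elasticity = 0, variance = 0)
))
partial_variance <- result[, "variance"] / sum(result[, "variance"])
print(round(cbind(result, partial_variance), 3))
```

In this toy case parameter `a` has an elasticity near 2 and accounts for almost all of the predictive variance, because the output depends on it quadratically and its posterior is wide relative to its effect.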

The calibration priority score combines these: $|\bar{\varepsilon}| \times \text{CV}$, where CV is the coefficient of variation of the parameter's meta-analysis posterior. Parameters with both high sensitivity and high remaining uncertainty score highest.

2.3 Study design

Figure 1: Design points across California croplands, colored by plant functional type. Grey boundary shows the state outline.

We selected 100 design points via k-means clustering of environmental covariates (temperature, precipitation, solar radiation, vapor pressure deficit, soil clay content, organic carbon density, topographic wetness index) to capture the range of conditions across California croplands. We then analyzed 28 SIPNET parameters spanning photosynthesis, allocation, turnover, and phenology for 3 response variables (Aboveground Biomass, Net Primary Productivity, Soil Carbon).
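
The selection step can be sketched as follows. This is a minimal illustration on synthetic data, not the code that produced the design points. The column names, the candidate pool size, and the rule of keeping the candidate nearest each centroid are all assumptions.

```r
# Sketch: choose design points by k-means on standardized covariates.
# The covariate table is synthetic; column names are illustrative only.
set.seed(1)
n_candidates <- 2000
covariates <- data.frame(
  site_id = seq_len(n_candidates),
  temp    = rnorm(n_candidates, 16, 3),
  precip  = rgamma(n_candidates, shape = 4, scale = 100),
  clay    = runif(n_candidates, 5, 50)
)

k <- 100
# Standardize so each covariate carries equal weight in the distance
x <- scale(covariates[, c("temp", "precip", "clay")])
km <- kmeans(x, centers = k, nstart = 5, iter.max = 50)

# For each cluster, keep the candidate site closest to the centroid
# (an assumed rule; the report does not state how sites were picked)
dist_to_centroid <- sqrt(rowSums((x - km$centers[km$cluster, ])^2))
design_idx <- vapply(
  seq_len(k),
  function(cl) {
    members <- which(km$cluster == cl)
    members[which.min(dist_to_centroid[members])]
  },
  integer(1)
)
design_points <- covariates[design_idx, ]
nrow(design_points)
```

Standardizing first matters because precipitation in millimetres would otherwise dominate the Euclidean distance over temperature in degrees.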

3 Results

3.1 Parameter rankings

Which parameters does SIPNET respond to most strongly? The figure below shows the top 15 parameters for each output variable, ranked by median absolute elasticity across sites. Points show median signed elasticity and whiskers show the interquartile range across sites. Absolute elasticity is used for ordering; the sign is preserved to indicate direction.

Figure 2: Top 15 parameters by median absolute elasticity per response variable. Points: median signed elasticity across sites. Whiskers: interquartile range. Positive (blue) means increasing the parameter increases the output.

Parameters with wide whiskers have site-dependent sensitivity and may require spatially explicit calibration. Parameters with narrow whiskers are consistently important (or unimportant) everywhere.

3.2 Cross-site variability

A key question is whether parameter importance changes with local conditions. The boxplots below show the full distribution of elasticity across all sites for the top 10 parameters, faceted by response variable.

Figure 3: Distribution of signed elasticity across sites for the top 10 parameters. Each box shows the median, IQR, and outliers.

3.3 Overall sensitivity summary

Figure 4: Summary of sensitivity metrics by response variable. Left: distribution of absolute elasticity. Right: distribution of variance explained (%).

3.4 Calibration priorities

Which parameters would benefit most from new measurements? The priority score $|\bar{\varepsilon}| \times \text{CV}$ identifies parameters that are both highly sensitive and poorly constrained by existing data, where field campaigns or trait synthesis would most efficiently reduce forecast uncertainty.
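
A small worked example shows how the score reorders parameters. The values are made up for illustration, the parameter names are placeholders, and reading $\bar{\varepsilon}$ as the across-site median is an assumption.

```r
# Sketch: calibration priority = |median elasticity| x CV.
# Values below are made up for illustration; they are not SIPNET results.
site_elasticity <- data.frame(
  parameter  = rep(c("param_A", "param_B", "param_C"), each = 4),
  elasticity = c(
    1.8, 2.1, 1.9, 2.2,      # param_A: highly sensitive
    -0.6, -0.4, -0.5, -0.7,  # param_B: moderately sensitive, negative
    0.9, 1.1, 1.0, 0.8       # param_C: moderately sensitive
  )
)
posterior <- data.frame(
  parameter = c("param_A", "param_B", "param_C"),
  mean      = c(50, 0.02, 3),
  sd        = c(2.5, 0.012, 0.9)
)

# Median elasticity across sites, then |median| x CV
priority <- aggregate(elasticity ~ parameter, data = site_elasticity, FUN = median)
priority <- merge(priority, posterior, by = "parameter")
priority$cv <- priority$sd / abs(priority$mean)
priority$score <- abs(priority$elasticity) * priority$cv
priority <- priority[order(-priority$score), ]
print(priority[, c("parameter", "elasticity", "cv", "score")])
```

Here `param_A` has the largest elasticity but the lowest score, because its posterior is tight (CV of 0.05), while the loosely constrained `param_B` ranks first.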

Figure 5: Calibration priority: parameters ranked by |elasticity| x CV. Higher values indicate greater return on investment from targeted data collection.

3.5 PFT comparisons

Parameter sensitivity may differ between woody perennial and annual herbaceous plant functional types, reflecting distinct physiological strategies. The figure below compares median signed elasticity for the top 10 parameters across PFTs, faceted by output variable.

Figure 6: Top 10 parameters compared across PFTs. Bars show median signed elasticity within each PFT.

4 Environmental Gradient Analysis

Does parameter sensitivity vary systematically with climate or soil conditions? If so, calibration strategies should be tailored to local environments rather than applied uniformly.

For each parameter-response-gradient combination, we fit an OLS regression of signed elasticity against the environmental covariate and retain relationships with $R^2 > 0.10$. We filter on $R^2$ rather than on p-values: with this many regressions, p-values would require multiple-testing correction, and the threshold is meant to screen for ecologically meaningful signals rather than to support formal inference.
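
The screening step can be sketched as below for a single hypothetical parameter-response pair. The data are simulated and the column names are illustrative; this is not the code that generated the regression results.

```r
# Sketch: regress signed elasticity on each covariate and keep R^2 > 0.10.
# Data are simulated; column names are illustrative, not the report's schema.
set.seed(7)
n_sites <- 100
sites <- data.frame(
  site_id = seq_len(n_sites),
  MAT     = rnorm(n_sites, 16, 3),
  clay    = runif(n_sites, 5, 50)
)
# One hypothetical parameter-response pair whose elasticity rises with MAT
sites$elasticity <- 0.5 + 0.08 * sites$MAT + rnorm(n_sites, sd = 0.2)

gradient_vars <- c("MAT", "clay")
fits <- lapply(gradient_vars, function(g) {
  fit <- lm(reformulate(g, response = "elasticity"), data = sites)
  data.frame(
    gradient_var = g,
    slope        = unname(coef(fit)[2]),
    r_squared    = summary(fit)$r.squared
  )
})
regression_table <- do.call(rbind, fits)

# Screening threshold from the text
retained <- subset(regression_table, r_squared > 0.10)
print(retained)
```

In the full analysis the same loop would run over every parameter, response variable, and gradient, yielding one row per combination.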

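The scatter plots of the strongest gradient relationships belong here, with each panel showing a single parameter-response-gradient combination and a linear fit. None were drawn in this render, which reported:

No gradient relationships had matching site covariate data.

When the plots are available, they can be read as follows: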

Positive slope: sensitivity increases along the gradient (e.g., hotter sites are more sensitive to a temperature-related parameter)
Negative slope: sensitivity decreases along the gradient
Wide scatter: the gradient only partially explains variation; other factors also matter

Note: these regressions assume a linear relationship. Non-linear relationships would be better captured with a global sensitivity approach (see the Global Sensitivity Analysis report).

5 Implications

Parameter sensitivity patterns reveal underlying model structure and suggest specific strategies for forecast improvement:

Net Primary Productivity is dominated by photosynthesis (max photosynthesis rate, temperature optimum) and allocation processes. Leaf-level ecophysiology is the primary control on productivity in SIPNET, consistent with findings from ED model analyses across North American biomes (Dietze et al., 2014).
Soil carbon is controlled by turnover rates (litter turnover, soil organic matter respiration). SIPNET represents soil carbon through first-order decomposition kinetics, so changes to turnover rates propagate directly to pool sizes.
Aboveground biomass is sensitive to allocation (wood allocation fraction) and max photosynthesis rate, reflecting that woody biomass depends on both the rate of carbon fixation and how that carbon is partitioned.
Spatial heterogeneity in parameter importance across sites confirms that a single global calibration is insufficient. Climate and soil gradients systematically modulate which parameters matter, supporting regionally stratified calibration.

6 Appendix: Technical Details

6.1 Sensitivity metrics

Elasticity ($\varepsilon$): normalized sensitivity = $\frac{\partial Y}{\partial X} \cdot \frac{X}{Y}$. The sign indicates direction. Absolute elasticity is used for ranking importance.
Variance explained: $\frac{\text{Partial Variance}_i}{\sum \text{Partial Variances}} \times 100\%$. Combines model sensitivity with prior parameter uncertainty.
Coefficient of variation (CV): $\frac{\sigma}{\mu}$ of the posterior parameter distribution from the Bayesian meta-analysis. Quantifies remaining uncertainty after data constraint.
Calibration priority: $|\bar{\varepsilon}| \times \text{CV}$. Parameters scoring highest have both high sensitivity and high uncertainty, making them the best targets for new data.
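
One way to read the priority score, offered as a first-order (delta-method) approximation rather than as the derivation used by PEcAn: if the output responds approximately linearly to parameter $X_i$ over the range of its posterior (mean $\mu_i$, standard deviation $\sigma_i$), the output standard deviation due to $X_i$ alone is about $|\partial Y / \partial X_i| \, \sigma_i$. Dividing by $|Y|$ gives

\[\frac{\sigma_{Y,i}}{|Y|} \approx \left|\frac{\partial Y}{\partial X_i}\right| \frac{\sigma_i}{|Y|} = \left|\frac{\partial Y}{\partial X_i} \cdot \frac{\mu_i}{Y}\right| \cdot \frac{\sigma_i}{|\mu_i|} = |\varepsilon_i| \times \text{CV}_i\]

so the score approximates the fractional spread in the output contributed by parameter $i$. For example, $|\varepsilon| = 2$ with $\text{CV} = 0.05$ gives a score of $0.10$, or roughly a 10% standard deviation in the output.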

6.2 Regression caveats

The gradient regressions assume linearity and constant variance. With 100 sites, statistical power is limited. We use $R^2$ as a filter for ecologically meaningful signals rather than relying on p-values, which would require multiple-testing correction across the 420 parameter-response-gradient combinations tested (28 parameters × 3 response variables × 5 gradients).

6.3 References

Dietze, M. C. et al. (2014). A quantitative assessment of a terrestrial biosphere model’s data needs across North American biomes. J. Geophys. Res. Biogeosci., 119, 286-300.
LeBauer, D. S. et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs, 83(2), 133-154.
Gelman, A., Pasarica, C. & Dodhia, R. (2002). Let’s practice what we preach: Turning tables into graphs. The American Statistician, 56(2), 121-130.

6.4 Software environment

R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: AlmaLinux 8.10 (Cerulean Leopard)

Matrix products: default
BLAS/LAPACK: FlexiBLAS NETLIB;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scales_1.4.0    sf_1.1-0        tidyr_1.3.1     patchwork_1.3.2
 [5] knitr_1.50      config_0.3.2    here_1.0.2      ggplot2_4.0.1  
 [9] dplyr_1.1.4     readr_2.1.6    

loaded via a namespace (and not attached):
 [1] generics_0.1.4     class_7.3-23       KernSmooth_2.23-26 stringi_1.8.7     
 [5] hms_1.1.4          digest_0.6.38      magrittr_2.0.4     evaluate_1.0.5    
 [9] grid_4.5.2         RColorBrewer_1.1-3 maps_3.4.3         fastmap_1.2.0     
[13] rprojroot_2.1.1    jsonlite_2.0.0     e1071_1.7-16       DBI_1.2.3         
[17] purrr_1.2.0        cli_3.6.5          rlang_1.1.6        crayon_1.5.3      
[21] units_1.0-0        bit64_4.6.0-1      withr_3.0.2        yaml_2.3.10       
[25] tools_4.5.2        parallel_4.5.2     tzdb_0.5.0         vctrs_0.6.5       
[29] R6_2.6.1           proxy_0.4-27       lifecycle_1.0.4    classInt_0.4-11   
[33] stringr_1.6.0      htmlwidgets_1.6.4  bit_4.6.0          vroom_1.6.6       
[37] pkgconfig_2.0.3    pillar_1.11.1      gtable_0.3.6       glue_1.8.0        
[41] Rcpp_1.1.0         xfun_0.54          tibble_3.3.0       tidyselect_1.2.1  
[45] dichromat_2.0-0.1  farver_2.1.2       htmltools_0.5.8.1  labeling_0.4.3    
[49] rmarkdown_2.30     compiler_4.5.2     S7_0.2.1

6.5 Configuration

Config file: /projectnb/dietzelab/abv1/ccmmf/02_uncertainty/000-config.yml

Analysis date: 2026-04-08

Number of sites: 100

Number of parameters: 28

Response variables: Aboveground Biomass, Net Primary Productivity, Soil Carbon

Gradient variables: Mean Annual Temp., Mean Annual Precip., Clay Content, Organic C Density, Topographic Wetness

Data Availability

All datasets are available in data/aggregated_sensitivity.csv and related files. A downloadable CSV of the full parameter rankings is available in data/parameter_rankings.csv. Analysis code is in R/local_sensitivity.R.

---
title: "Local Sensitivity Analysis"
author: "Akash B V"
date: today
format:
  html:
    self-contained: true
    df-print: paged
    toc: true
    toc-depth: 3
    code-fold: true
    code-tools: true
    theme: cosmo
    number-sections: true
execute:
  echo: false
  warning: false
  message: false
  cache: false
---

```{r}
#| label: setup
#| include: false

library(readr)
library(dplyr)
library(ggplot2)
library(here)
library(config)
library(knitr)
library(patchwork)
library(tidyr)
library(sf)
library(scales)

here::i_am("analysis/local_sensitivity.qmd")
source(here::here("R", "local_sensitivity.R"))
source(here::here("R", "labels.R"))

cfg <- config::get(file = here::here("000-config.yml"))

# load datasets
aggregated_results <- readr::read_csv(
  here::here(cfg$paths$data_dir, "aggregated_sensitivity.csv"),
  show_col_types = FALSE
)
param_rankings <- readr::read_csv(
  here::here(cfg$paths$data_dir, "parameter_rankings.csv"),
  show_col_types = FALSE
)
pft_differences <- readr::read_csv(
  here::here(cfg$paths$data_dir, "pft_differences.csv"),
  show_col_types = FALSE
)
regression_results <- readr::read_csv(
  here::here(cfg$paths$data_dir, "regression_results.csv"),
  show_col_types = FALSE
)

# only load columns needed for gradient analysis; filter to OAT site IDs
# full file is 377k rows / 39MB -- loading it all is the render bottleneck
oa_site_ids <- unique(aggregated_results$site_id)
site_covariates <- readr::read_csv(
  here::here(cfg$paths$ccmmf_dir, "data/site_covariates.csv"),
  col_select = c("site_id", "clay", "ocd", "twi", "temp", "precip"),
  show_col_types = FALSE
) |>
  dplyr::filter(site_id %in% oa_site_ids)

# add human-readable labels
aggregated_results <- aggregated_results |>
  mutate(
    var_label = label_variable(response_var),
    param_label = label_param(parameter)
  )
param_rankings <- param_rankings |>
  mutate(
    var_label = label_variable(response_var),
    param_label = label_param(parameter)
  )

sa_variables <- unique(aggregated_results$response_var)
gradient_vars <- cfg$sensitivity$gradient_vars %||% c("MAT", "MAP", "clay", "ocd", "twi")
n_sites <- dplyr::n_distinct(aggregated_results$site_id)
n_parameters <- dplyr::n_distinct(aggregated_results$parameter)

# consistent theme
theme_set(
  theme_minimal(base_size = 12) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(size = 10, face = "bold"),
      legend.position = "bottom"
    )
)
```

```{r}
#| label: compute-key-findings
#| include: false

# top 3 params per variable (by median |elasticity|)
top3_by_var <- aggregated_results |>
  summarize(
    median_abs = median(abs(elasticity), na.rm = TRUE),
    .by = c("response_var", "parameter")
  ) |>
  slice_max(order_by = median_abs, n = 3, by = "response_var")

# calibration priorities
top_priority <- param_rankings |>
  filter(!is.na(constraint_priority)) |>
  slice_max(order_by = constraint_priority, n = 3, by = "response_var")

# gradient signal
n_gradient_tests <- nrow(regression_results |> filter(target == "elasticity"))
n_sig <- nrow(regression_results |> filter(target == "elasticity", r_squared > 0.10))
```

# Key Findings

This report presents a local sensitivity analysis of the SIPNET ecosystem model across `r n_sites` design points representing California cropland environments. The analysis identifies which model parameters most strongly control key outputs and where targeted data collection would most efficiently reduce forecast uncertainty.

- **Photosynthesis parameters dominate NPP sensitivity.** Maximum photosynthetic rate and temperature optima consistently rank highest, reflecting leaf-level carbon fixation as the primary control on productivity in SIPNET.
- **Soil carbon is controlled by decomposition rates.** Litter and soil turnover parameters explain the most variance in soil organic carbon, consistent with first-order decomposition kinetics driving pool dynamics.
- **Several high-sensitivity parameters remain poorly constrained by data** and represent the highest-value targets for field campaigns or trait synthesis.
- **Parameter importance is not spatially uniform.** Climate gradients (temperature, precipitation) and soil properties (clay, organic carbon density) systematically shift which parameters matter, supporting regionally stratified calibration rather than a single global parameterization.
- **Plant functional type (PFT) differences are ecologically coherent.** Woody perennial sites show higher sensitivity to allocation and wood turnover, while annual cropland sites are more sensitive to leaf phenology.

------------------------------------------------------------------------

# Background {#sec-background}

## Why sensitivity and uncertainty analysis?

Ecosystem models encode our understanding of how carbon, water, and nutrient cycles interact. When we use them for prediction (forecasting how cropland management affects soil carbon stocks, for instance), the reliability of those predictions depends on how well we know the model's input parameters. Some parameters have been measured hundreds of times; others are poorly constrained by available data.

Sensitivity analysis asks: *which parameters does the model respond to most strongly?* A parameter with high sensitivity has a large effect on model output per unit change in its value. But sensitivity alone can be misleading. A highly sensitive parameter that is well constrained by extensive measurements may contribute little to forecast uncertainty, while a moderately sensitive parameter with wide prior uncertainty can dominate the error budget.

Uncertainty analysis combines both pieces of information (model sensitivity and prior parameter uncertainty) to estimate each parameter's contribution to predictive variance. This partitioning directly informs where to invest limited resources: the parameters with both high sensitivity and high uncertainty offer the greatest return on investment from new measurements.

## Approach

We use the PEcAn variance decomposition workflow, which proceeds in three steps:

1.  **Meta-analysis.** A hierarchical Bayesian meta-analysis synthesizes published trait measurements to produce posterior probability distributions for each model parameter. These distributions capture both the central estimate and remaining uncertainty.

2.  **Sensitivity analysis.** Each parameter is perturbed one at a time to the quantile equivalents of $\pm 1\sigma$ and $\pm 2\sigma$ of its posterior distribution, and the model is re-run. The resulting response curves are fit with splines, and elasticity is computed as the normalized sensitivity:

    $$\varepsilon = \frac{\partial Y}{\partial X} \cdot \frac{X}{Y}$$

    An elasticity of 2 means a 1% increase in the parameter produces a 2% increase in the output. The sign indicates direction.

3.  **Variance decomposition.** Each parameter's posterior distribution is transformed through its sensitivity spline to produce a predictive distribution. The variance of this distribution, expressed as a fraction of the total across all parameters, gives the **partial variance** -- the proportion of forecast uncertainty attributable to that parameter.

The **calibration priority** score combines these: $|\bar{\varepsilon}| \times \text{CV}$. Parameters with both high sensitivity and high prior uncertainty (coefficient of variation) score highest.

## Study design

```{r}
#| label: fig-site-map
#| fig-cap: "Design points across California croplands, colored by plant functional type. Grey boundary shows the state outline."
#| fig-height: 5
#| fig-width: 6

site_locs <- aggregated_results |>
  distinct(site_id, lat, lon, pft)

ca <- tryCatch(
  sf::st_as_sf(maps::map("state", "california", plot = FALSE, fill = TRUE)),
  error = function(e) NULL
)

if (!is.null(ca)) {
  ggplot() +
    geom_sf(data = ca, fill = "grey95", color = "grey40") +
    geom_point(
      data = site_locs,
      aes(x = lon, y = lat, color = pft),
      size = 2.5, alpha = 0.8
    ) +
    scale_color_brewer(palette = "Set1", name = "PFT") +
    labs(x = "Longitude", y = "Latitude") +
    coord_sf(xlim = c(-125, -114), ylim = c(32.5, 42))
} else {
  ggplot(site_locs, aes(x = lon, y = lat, color = pft)) +
    geom_point(size = 2.5, alpha = 0.8) +
    scale_color_brewer(palette = "Set1", name = "PFT") +
    labs(x = "Longitude", y = "Latitude")
}
```

`r n_sites` design points were selected via k-means clustering of environmental covariates (temperature, precipitation, solar radiation, vapor pressure deficit, soil clay content, organic carbon density, topographic wetness index) to capture the range of conditions across California croplands. `r n_parameters` SIPNET parameters spanning photosynthesis, allocation, turnover, and phenology were analyzed for `r length(sa_variables)` response variables (`r paste(label_variable(sa_variables), collapse = ", ")`).

------------------------------------------------------------------------

# Results {#sec-results}

## Parameter rankings {#sec-rankings}

Which parameters does SIPNET respond to most strongly? The figure below shows the top 15 parameters for each output variable, ranked by median absolute elasticity across sites. Points show median signed elasticity and whiskers show the interquartile range across sites. Absolute elasticity is used for ordering; the sign is preserved to indicate direction.

```{r}
#| label: fig-ranking
#| fig-cap: "Top 15 parameters by median absolute elasticity per response variable. Points: median signed elasticity across sites. Whiskers: interquartile range. Positive (blue) means increasing the parameter increases the output."
#| fig-height: 10
#| fig-width: 10

rank_data <- aggregated_results |>
  summarize(
    median_elast = median(elasticity, na.rm = TRUE),
    q25 = quantile(elasticity, 0.25, na.rm = TRUE),
    q75 = quantile(elasticity, 0.75, na.rm = TRUE),
    median_abs = median(abs(elasticity), na.rm = TRUE),
    param_label = first(param_label),
    var_label = first(var_label),
    .by = c("response_var", "parameter")
  ) |>
  slice_max(order_by = median_abs, n = 15, by = "response_var")

ggplot(rank_data, aes(x = reorder(param_label, median_abs), y = median_elast)) +
  geom_hline(yintercept = 0, color = "grey70", linewidth = 0.3) +
  geom_errorbar(
    aes(ymin = q25, ymax = q75),
    width = 0.3, color = "grey50"
  ) +
  geom_point(aes(color = median_elast > 0), size = 2.5) +
  scale_color_manual(
    values = c("TRUE" = "#2166AC", "FALSE" = "#B2182B"),
    labels = c("TRUE" = "Positive", "FALSE" = "Negative"),
    name = "Direction"
  ) +
  coord_flip() +
  facet_wrap(~var_label, scales = "free_x") +
  labs(
    x = NULL,
    y = "Elasticity (median; whiskers = IQR across sites)"
  )
```

Parameters with wide whiskers have site-dependent sensitivity and may require spatially explicit calibration. Parameters with narrow whiskers are consistently important (or unimportant) everywhere.

## Cross-site variability {#sec-variability}

A key question is whether parameter importance changes with local conditions. The boxplots below show the full distribution of elasticity across all sites for the top 10 parameters, faceted by response variable.

```{r}
#| label: fig-boxplot
#| fig-cap: "Distribution of signed elasticity across sites for the top 10 parameters. Each box shows the median, IQR, and outliers."
#| fig-height: 8
#| fig-width: 10

top10 <- aggregated_results |>
  summarize(
    median_abs = median(abs(elasticity), na.rm = TRUE),
    .by = "parameter"
  ) |>
  slice_max(order_by = median_abs, n = 10) |>
  pull(parameter)

aggregated_results |>
  filter(parameter %in% top10) |>
  ggplot(aes(
    x = reorder(param_label, abs(elasticity), FUN = median),
    y = elasticity
  )) +
  geom_hline(yintercept = 0, color = "grey70", linewidth = 0.3) +
  geom_boxplot(aes(fill = var_label), alpha = 0.6, outlier.size = 0.8) +
  coord_flip() +
  facet_wrap(~var_label, scales = "free_x") +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  labs(
    x = NULL,
    y = "Elasticity (across sites)"
  )
```

## Overall sensitivity summary {#sec-summary}

```{r}
#| label: fig-summary-stats
#| fig-cap: "Summary of sensitivity metrics by response variable. Left: distribution of absolute elasticity. Right: distribution of variance explained (%)."
#| fig-height: 5
#| fig-width: 10

p_elast <- aggregated_results |>
  ggplot(aes(x = var_label, y = abs(elasticity), fill = var_label)) +
  geom_boxplot(alpha = 0.7, outlier.size = 0.5) +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  labs(x = NULL, y = "|Elasticity|") +
  coord_flip()

p_var <- aggregated_results |>
  ggplot(aes(x = var_label, y = variance_explained, fill = var_label)) +
  geom_boxplot(alpha = 0.7, outlier.size = 0.5) +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  labs(x = NULL, y = "Variance explained (%)") +
  coord_flip()

p_elast + p_var +
  plot_annotation(title = "Distribution of sensitivity metrics across all parameters and sites")
```

## Calibration priorities {#sec-priorities}

Which parameters would benefit most from new measurements? The priority score $|\bar{\varepsilon}| \times \text{CV}$ identifies parameters that are both highly sensitive and poorly constrained by existing data, where field campaigns or trait synthesis would most efficiently reduce forecast uncertainty.

```{r}
#| label: fig-priority
#| fig-cap: "Calibration priority: parameters ranked by |elasticity| x CV. Higher values indicate greater return on investment from targeted data collection."
#| fig-height: 8
#| fig-width: 10

priority_data <- param_rankings |>
  filter(!is.na(constraint_priority)) |>
  slice_max(order_by = constraint_priority, n = 10, by = "response_var")

ggplot(priority_data, aes(
  x = reorder(param_label, constraint_priority),
  y = constraint_priority
)) +
  geom_col(fill = "steelblue", alpha = 0.8) +
  coord_flip() +
  facet_wrap(~var_label, scales = "free") +
  labs(
    x = NULL,
    y = "Priority score (|elasticity| x CV)"
  )
```

## PFT comparisons {#sec-pft}

Parameter sensitivity may differ between woody perennial and annual herbaceous plant functional types, reflecting distinct physiological strategies. The figure below compares median signed elasticity for the top 10 parameters across PFTs, faceted by output variable.

```{r}
#| label: fig-pft
#| fig-cap: "Top 10 parameters compared across PFTs. Bars show median signed elasticity within each PFT."
#| fig-height: 8
#| fig-width: 10

if (dplyr::n_distinct(aggregated_results$pft) >= 2) {
  pft_data <- aggregated_results |>
    summarize(
      median_elast = median(elasticity, na.rm = TRUE),
      median_abs = median(abs(elasticity), na.rm = TRUE),
      param_label = first(param_label),
      var_label = first(var_label),
      .by = c("response_var", "parameter", "pft")
    )

  top10_pft <- pft_data |>
    summarize(overall_abs = mean(median_abs), .by = "parameter") |>
    slice_max(order_by = overall_abs, n = 10) |>
    pull(parameter)

  pft_data |>
    filter(parameter %in% top10_pft) |>
    ggplot(aes(
      x = reorder(param_label, median_abs),
      y = median_elast,
      fill = pft
    )) +
    geom_col(position = "dodge", alpha = 0.8) +
    coord_flip() +
    facet_wrap(~var_label, scales = "free_x") +
    scale_fill_brewer(palette = "Set1", name = "PFT") +
    labs(x = NULL, y = "Median elasticity")
} else {
  cat("Only one PFT present; PFT comparison not applicable.\n")
}
```

------------------------------------------------------------------------

# Environmental Gradient Analysis {#sec-gradients}

Does parameter sensitivity vary systematically with climate or soil conditions? If so, calibration strategies should be tailored to local environments rather than applied uniformly.

For each parameter-response-gradient combination, we fit an OLS regression of signed elasticity against the environmental covariate and retain relationships with $R^2 > 0.10$. We report $R^2$ rather than p-values, since the number of tests is large and p-values from this many regressions require correction and are better suited for filtering than for inference.

The scatter plots below show the strongest gradient relationships. Each panel shows a single parameter-response-gradient combination with a linear fit.

```{r}
#| label: fig-gradient-scatter
#| fig-cap: "Parameter sensitivity along environmental gradients. Each panel: x = covariate, y = signed elasticity. Lines: OLS fit with 95% CI. Filtered to R-squared > 0.10."
#| fig-height: 10
#| fig-width: 12

sig_gradients <- regression_results |>
  filter(target == "elasticity", r_squared > 0.10) |>
  slice_max(order_by = r_squared, n = 12)

if (nrow(sig_gradients) > 0) {
  gradient_vars_found <- unique(sig_gradients$gradient_var)

  # map regression gradient names to site_covariates column names
  grad_col_map <- c("MAT" = "temp", "MAP" = "precip")
  site_cov_renamed <- site_covariates
  for (gn in names(grad_col_map)) {
    if (grad_col_map[gn] %in% names(site_cov_renamed) && !gn %in% names(site_cov_renamed)) {
      site_cov_renamed[[gn]] <- site_cov_renamed[[grad_col_map[gn]]]
    }
  }

  site_cov_long <- site_cov_renamed |>
    tidyr::pivot_longer(
      cols = any_of(gradient_vars_found),
      names_to = "gradient_var",
      values_to = "gradient_value"
    ) |>
    select(site_id, gradient_var, gradient_value)

  scatter_data <- aggregated_results |>
    inner_join(
      sig_gradients |> select(parameter, response_var, gradient_var, r_squared),
      by = c("parameter", "response_var"),
      relationship = "many-to-many"
    ) |>
    inner_join(site_cov_long, by = c("site_id", "gradient_var")) |>
    mutate(
      grad_label = dplyr::case_when(
        gradient_var == "MAT" ~ "Mean Annual Temp.",
        gradient_var == "MAP" ~ "Mean Annual Precip.",
        gradient_var == "clay" ~ "Clay Content",
        gradient_var == "ocd" ~ "Organic C Density",
        gradient_var == "twi" ~ "Topographic Wetness",
        TRUE ~ gradient_var
      ),
      facet_label = paste0(label_param(parameter), " \u2192 ", label_variable(response_var), "\n(", grad_label, ", R\u00B2=", round(r_squared, 2), ")")
    )

  if (nrow(scatter_data) > 0) {
    ggplot(scatter_data, aes(x = gradient_value, y = elasticity)) +
      geom_point(alpha = 0.6, size = 1.5) +
      geom_smooth(method = "lm", se = TRUE, color = "steelblue", linewidth = 0.8) +
      facet_wrap(~facet_label, scales = "free", ncol = 3) +
      labs(
        x = "Environmental covariate value",
        y = "Elasticity (signed)"
      )
  } else {
    cat("No gradient relationships had matching site covariate data.\n")
  }
} else {
  cat("No gradient relationships exceeded the R-squared > 0.10 threshold.\n")
}
```

- **Positive slope**: sensitivity increases along the gradient (e.g., hotter sites are more sensitive to a temperature-related parameter)
- **Negative slope**: sensitivity decreases along the gradient
- **Wide scatter**: the gradient only partially explains variation; other factors also matter

Note: these regressions assume a linear relationship. Non-linear relationships would be better captured with a global sensitivity approach (see the Global Sensitivity Analysis report).

------------------------------------------------------------------------

# Implications {#sec-insights}

Parameter sensitivity patterns reveal underlying model structure and suggest specific strategies for forecast improvement:

- **Net Primary Productivity** is dominated by photosynthesis (max photosynthesis rate, temperature optimum) and allocation processes. Leaf-level ecophysiology is the primary control on productivity in SIPNET, consistent with findings from ED model analyses across North American biomes.
- **Soil carbon** is controlled by turnover rates (litter turnover, soil organic matter respiration). SIPNET represents soil carbon through first-order decomposition kinetics, so changes to turnover rates propagate directly to pool sizes.
- **Aboveground biomass** is sensitive to allocation (wood allocation fraction) and max photosynthesis rate, reflecting that woody biomass depends on both the rate of carbon fixation and how that carbon is partitioned.
- **Spatial heterogeneity** in parameter importance across sites confirms that a single global calibration is insufficient. Climate and soil gradients systematically modulate which parameters matter, supporting regionally stratified calibration.

------------------------------------------------------------------------

# Appendix: Technical Details {#sec-technical}

## Sensitivity metrics

- **Elasticity** ($\varepsilon$): normalized sensitivity = $\frac{\partial Y}{\partial X} \cdot \frac{X}{Y}$. The sign indicates direction. Absolute elasticity is used for ranking importance.
- **Variance explained**: $\frac{\text{Partial Variance}_i}{\sum \text{Partial Variances}} \times 100\%$. Combines model sensitivity with prior parameter uncertainty.
- **Coefficient of variation** (CV): $\frac{\sigma}{\mu}$ of the posterior parameter distribution from the Bayesian meta-analysis. Quantifies remaining uncertainty after data constraint.
- **Calibration priority**: $|\bar{\varepsilon}| \times \text{CV}$. Parameters scoring highest have both high sensitivity and high uncertainty, making them the best targets for new data.

## Regression caveats

The gradient regressions assume linearity and constant variance. With `r n_sites` sites, statistical power is limited. We use $R^2$ as a filter for ecologically meaningful signals rather than relying on p-values, which would require multiple-testing correction across the `r n_gradient_tests` parameter-gradient combinations tested.

## References

1. Dietze, M. C. et al. (2014). A quantitative assessment of a terrestrial biosphere model's data needs across North American biomes. *J. Geophys. Res. Biogeosci.*, 119, 286-300.
2. LeBauer, D. S. et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. *Ecological Monographs*, 83(2), 133-154.
3. Gelman, A. & Dodhia, R. (2002). Let's practice what we preach: Turning tables into graphs.

## Software environment

```{r}
#| label: session-info
#| code-fold: false

sessionInfo()
```

## Configuration

```{r}
#| label: config-summary
#| code-fold: false

cat("Config file:", here::here("000-config.yml"), "\n")
cat("Analysis date:", as.character(Sys.Date()), "\n")
cat("Number of sites:", n_sites, "\n")
cat("Number of parameters:", n_parameters, "\n")
cat("Response variables:", paste(label_variable(sa_variables), collapse = ", "), "\n")
grad_labels <- c("MAT" = "Mean Annual Temp.", "MAP" = "Mean Annual Precip.", "clay" = "Clay Content", "ocd" = "Organic C Density", "twi" = "Topographic Wetness")
cat("Gradient variables:", paste(ifelse(gradient_vars %in% names(grad_labels), grad_labels[gradient_vars], gradient_vars), collapse = ", "), "\n")
```

------------------------------------------------------------------------

::: callout-tip
## Data Availability

All datasets are available in `data/aggregated_sensitivity.csv` and related files. A downloadable CSV of the full parameter rankings is available in `data/parameter_rankings.csv`. Analysis code is in `R/local_sensitivity.R`.
:::