Variance Decomposition

Author

Akash B V

Published

April 7, 2026

1 Key Findings

This report decomposes forecast uncertainty of the SIPNET ecosystem model across 17 California cropland sites into its component sources. Building on the local (OAT) and global (Sobol) sensitivity analyses, this step answers the central question: what drives our predictive uncertainty, and what can we do about it?

Dominant uncertainty sources differ by output variable. Fast processes (net primary productivity, carbon fluxes) tend to be driven by meteorological uncertainty, while slow processes (soil carbon pools) are driven by initial conditions, consistent with the ecological forecasting framework.
Specific biological traits dominate the parameter uncertainty component. The hybrid decomposition identifies the individual parameters with the highest value of information for calibration (see Section 3.3).
Interaction effects account for ~57% of total variance on average, indicating that the contribution of one source (for example, biological parameters) depends strongly on the values of others (for example, meteorology).

2 Background

2.1 The uncertainty problem in ecological forecasting

Ecological forecasts are uncertain. When we predict how cropland soil carbon will respond to a change in management (say, switching from synthetic fertilizer to compost), that prediction carries error from multiple sources: imprecise knowledge of biological parameters, uncertain weather forecasts, unknown initial soil carbon stocks, and the structural limitations of the model itself.

The goal of uncertainty analysis is not to eliminate these errors (some are irreducible) but to diagnose which sources dominate so that limited resources can be directed where they will have the most impact. This is the central idea of the ecological forecasting framework (Dietze, 2017, Ch. 11). If most of our forecast uncertainty comes from poorly known biological parameters, we should invest in field campaigns and trait synthesis. If it comes from weather uncertainty, better meteorological data or ensemble weather forecasts will help most. If it comes from unknown initial conditions, data assimilation (updating model states with observations) is the path forward.

Without this decomposition, uncertainty reduction efforts are untargeted. We might spend years collecting trait data only to discover that driver uncertainty was the bottleneck all along.

2.2 The variance budget

Formally, the variance of a model forecast $Y$ can be decomposed as:

\[\text{Var}(Y) \approx V_{\text{Parameter}} + V_{\text{Driver}} + V_{\text{IC}} + V_{\text{Management}} + V_{\text{Interaction}}\]

Each term represents a distinct lever for reducing forecast uncertainty:

$V_{\text{Parameter}}$: Uncertainty in biological traits, including photosynthetic capacity ($A_{\max}$), allocation ratios, turnover rates, and temperature response curves. These are properties of the organism and ecosystem that we estimate from field measurements and literature synthesis. Reduced by collecting more trait data, running meta-analyses, or calibrating the model against eddy covariance (flux tower) observations.
$V_{\text{Driver}}$: Uncertainty in meteorological forcing (temperature, precipitation, solar radiation, humidity). For hindcasts (running the model over a historical period), driver uncertainty is small because we have observational weather data. For forecasts, it grows with lead time as weather prediction skill degrades. Reduced by improving weather forecasts or adding weather stations.
$V_{\text{IC}}$: Uncertainty in initial conditions. How much carbon is in the soil when the simulation starts? How much standing biomass is already present? These quantities are hard to measure at every site and are often estimated from regional averages. Reduced by soil inventories, remote sensing products, or data assimilation (updating model states with observations as they become available during the simulation).
$V_{\text{Management}}$: Uncertainty in agricultural inputs, including fertilization rates, compost application timing, tillage practices, planting and harvest dates. These are typically known to the grower but often unavailable to the modeler at scale. Reduced by better data sharing between agricultural operators and researchers.
$V_{\text{Interaction}}$: Non-additive effects where the contribution of one source depends on the value of another. For example, parameter sensitivity may differ under wet vs. dry weather conditions, so that the uncertainty from biology and the uncertainty from weather cannot be simply added. Large interaction terms suggest that parameter calibration and driver uncertainty need to be addressed jointly rather than independently.
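To make the budget concrete, the following is a minimal sketch of how an interaction term appears in a toy two-input model. The model, the inputs `p` and `d`, and the coefficients are all hypothetical and have nothing to do with SIPNET; the point is only that the main-effect variances do not add up to the total when the inputs interact.

```r
# Toy illustration only: a two-input model with a multiplicative interaction.
set.seed(42)
n <- 200000
p <- rnorm(n)  # stand-in for a biological parameter (hypothetical)
d <- rnorm(n)  # stand-in for a meteorological driver (hypothetical)

y <- p + 2 * d + p * d

# In this toy, E[Y | p] = p and E[Y | d] = 2 * d because the other input has
# mean zero, so the main-effect variances are simply var(p) and var(2 * d).
v_parameter <- var(p)
v_driver <- var(2 * d)
v_total <- var(y)

# Whatever the main effects leave unexplained is the interaction term.
v_interaction <- v_total - v_parameter - v_driver

budget <- c(
  parameter = v_parameter,
  driver = v_driver,
  interaction = v_interaction
) / v_total
print(round(budget, 2))
```

Analytically the three shares are 1/6, 4/6, and 1/6 for this toy, so about one sixth of the variance cannot be assigned to either input alone.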

One important source is deliberately excluded from this analysis: process error (model structural uncertainty). This is the variance that arises because the model’s equations are an imperfect representation of reality. For example, SIPNET may not capture a soil process that matters in practice. In managed agricultural systems, process error reflects systematic biases in model structure rather than random noise, and it will be assessed separately through model-data comparison. In that comparison, any model-data residual variance that the input sources above cannot explain will be attributed to model structural error.

2.3 Why a hybrid approach?

This report sits at the end of a three-step analysis pipeline:

Local (OAT) sensitivity analysis: identifies which individual biological parameters the model is most sensitive to, one at a time
Global (Sobol) sensitivity analysis: quantifies how much total variance comes from parameters vs. drivers vs. initial conditions, including interactions
Variance decomposition (this report): combines both to attribute forecast uncertainty to individual sources and individual parameters
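As an illustration of step 2, first-order Sobol indices for groups of inputs can be estimated with a pick-freeze scheme. The sketch below uses the first-order estimator from Saltelli et al. (2010) on a hypothetical toy model in which columns 1 and 2 play the role of "parameters" and column 3 the role of a "driver". It is not the pipeline's implementation, and the interaction share is assumed here to be the remainder after the first-order terms.

```r
set.seed(1)
n <- 100000

# Hypothetical toy model: columns 1-2 are "parameters", column 3 is a "driver".
toy_model <- function(x) x[, 1] + 0.5 * x[, 2] + 2 * x[, 3] + x[, 1] * x[, 3]

# Two independent sample matrices, as in the Saltelli sampling design.
a <- matrix(rnorm(n * 3), ncol = 3)
b <- matrix(rnorm(n * 3), ncol = 3)
groups <- list(parameter = 1:2, driver = 3)

y_a <- toy_model(a)
y_b <- toy_model(b)
v_total <- var(c(y_a, y_b))

first_order <- sapply(groups, function(cols) {
  ab <- a
  ab[, cols] <- b[, cols]  # the group's columns come from B, the rest from A
  y_ab <- toy_model(ab)
  mean(y_b * (y_ab - y_a)) / v_total  # first-order index for the whole group
})

# Assumption: variance not explained by first-order terms is interaction.
interaction_share <- 1 - sum(first_order)
print(round(c(first_order, interaction = interaction_share), 2))
```

For this toy the exact values are 0.20 (parameters), 0.64 (driver), and 0.16 (interaction), which the estimates should approach as `n` grows.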

The Sobol analysis gives us the overall partition: how much of the total uncertainty comes from biological parameters as a category vs. meteorological drivers vs. initial conditions. But it treats all biological parameters as a single group. To identify which specific traits to target for calibration, we need to go deeper.

The local (OAT) analysis provides exactly this parameter-by-parameter sensitivity, but it captures only main effects within the parameter category, not the full partition across all uncertainty sources.
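A one-at-a-time analysis can be sketched as follows: each parameter is moved across quantiles of its prior while the others stay at their medians, and the variance of the resulting outputs is recorded. The toy response, the parameter names (`amax`, `turnover`, `q10`), and the priors are hypothetical, and the equal-weight variance over a handful of quantiles is a simplification rather than the method used in the upstream OAT report.

```r
# Hypothetical toy response; stands in for a single model output.
toy_model <- function(theta) {
  theta[["amax"]]^2 + 3 * theta[["turnover"]] + 0.1 * theta[["q10"]]
}

# Hypothetical priors, expressed as quantile functions.
priors <- list(
  amax = function(q) qnorm(q, mean = 10, sd = 2),
  turnover = function(q) qunif(q, min = 0, max = 1),
  q10 = function(q) qnorm(q, mean = 2, sd = 0.3)
)

probs <- seq(0.05, 0.95, by = 0.15)
medians <- sapply(priors, function(qf) qf(0.5))

# Vary one parameter at a time; hold the others at their prior medians.
oat_variance <- sapply(names(priors), function(nm) {
  outputs <- sapply(probs, function(q) {
    theta <- medians
    theta[[nm]] <- priors[[nm]](q)
    toy_model(theta)
  })
  var(outputs)
})

# Local partial-variance fractions: each parameter's share of the OAT total.
pv_fraction <- oat_variance / sum(oat_variance)
print(round(pv_fraction, 3))
```

Because every other input is frozen, these fractions say nothing about drivers, initial conditions, or interactions, which is why they are only used to subdivide the parameter category.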

Our hybrid method combines both:

Source partitioning (from global Sobol analysis): We multiply each source category’s first-order Sobol index (the sum of $S_i$ over the inputs in that category) by the total ensemble variance to get absolute variance contributions in physical units: \[V_{\text{category}} = \left(\sum_{i \in \text{category}} S_i\right) \times \text{Var}(Y_{\text{ensemble}})\]
Parameter attribution (from local OAT analysis): We subdivide the biological parameter variance using local partial variance fractions, where the sum over $k$ runs over all biological parameters: \[V_{\text{param}_j} = \frac{\text{PV}_j^{\text{local}}}{\sum_k \text{PV}_k^{\text{local}}} \times V_{\text{Parameter}}^{\text{global}}\]

This gives each individual parameter’s absolute contribution to total forecast uncertainty, which we use as a proxy for the “value of information” of that trait. Parameters with the highest contributions are where new measurements will most efficiently reduce forecast error.
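The two steps reduce to a few lines of arithmetic. The sketch below uses made-up numbers throughout (the Sobol indices, the ensemble variance, and the local partial variances are not results from this report) and assumes that the variance left over after the first-order terms is counted as interaction.

```r
# All numbers below are made up for illustration.
ensemble_variance <- 4.0  # Var(Y_ensemble), in squared output units
sobol_first_order <- c(parameter = 0.25, driver = 0.10, IC = 0.06, management = 0.02)

# Step 1: absolute variance per source category.
v_category <- sobol_first_order * ensemble_variance

# Assumption: what the first-order terms leave unexplained is interaction.
v_interaction <- ensemble_variance - sum(v_category)

# Step 2: split the parameter variance by local (OAT) partial-variance shares.
pv_local <- c(amax = 6, turnover = 3, q10 = 1)  # hypothetical parameter names
v_param <- pv_local / sum(pv_local) * v_category[["parameter"]]

print(v_category)
print(v_interaction)
print(v_param)
```

By construction the per-parameter contributions sum to the parameter category's variance, so the hybrid step redistributes variance within that category without changing the category totals.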

2.4 Study design

Figure 1: Design points across California croplands.

The analysis uses 17 design points across California croplands. Variance decomposition combines the Sobol ensemble (Aboveground Biomass, CH₄ Flux, N₂O Flux, Net Primary Productivity, Latent Heat Flux, Soil Moisture, Soil Carbon) with OAT sensitivity results to attribute forecast uncertainty to individual sources and parameters.

3 Results

3.1 Variance budget by site

The stacked bars below show how total forecast variance partitions into high-level categories at each site. This answers the fundamental question: is our uncertainty dominated by what we know about the plants (parameters), the weather (drivers), or the system’s starting state (initial conditions)?

Figure 2: Variance decomposition by site. Each bar represents one site; color segments show the fraction of total variance attributable to each uncertainty source.

3.2 Summary across sites

The bar chart below shows the mean contribution of each source averaged across all sites, with error bars showing ±1 SD.

Figure 3: Mean variance partition across all sites by output variable. Error bars show ±1 SD across sites.
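The cross-site summary amounts to a grouped mean and standard deviation. A minimal base R sketch follows; the data frame is invented, and the column names (`site_id`, `variable`, `category`, `frac_of_total`) are assumptions about the layout of the site-level partition file.

```r
# Hypothetical site-level partition; column names are assumed, values invented.
site_partition <- data.frame(
  site_id = rep(c("s1", "s2", "s3"), each = 2),
  variable = "NPP",
  category = rep(c("parameter", "driver"), times = 3),
  frac_of_total = c(0.20, 0.30, 0.10, 0.50, 0.30, 0.40)
)

# Mean and SD of each source's fraction across sites, per output variable.
mean_frac <- aggregate(frac_of_total ~ variable + category,
                       data = site_partition, FUN = mean)
sd_frac <- aggregate(frac_of_total ~ variable + category,
                     data = site_partition, FUN = sd)
summary_data <- merge(mean_frac, sd_frac,
                      by = c("variable", "category"),
                      suffixes = c("_mean", "_sd"))

# Error-bar bounds; the lower bound is clipped at zero because a variance
# fraction cannot be negative.
summary_data$lower <- pmax(0, summary_data$frac_of_total_mean - summary_data$frac_of_total_sd)
summary_data$upper <- summary_data$frac_of_total_mean + summary_data$frac_of_total_sd
print(summary_data)
```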

3.3 Parameter breakdown (hybrid analysis)

While the global SA gives the total variance attributable to all parameters collectively ($V_{\text{Parameter}}$), we need to know which individual traits to target for calibration. The hybrid method multiplies the Sobol parameter variance by each parameter’s local OAT share, identifying the specific biological mechanisms driving uncertainty.

Figure 4: Top 10 parameters contributing to total forecast variance (hybrid method: Sobol parameter variance × local OAT fractions). Error bars show ±1 SD across sites.

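The parameters above have the highest estimated value of information. Reducing their prior uncertainty via targeted field campaigns, literature meta-analysis, or Bayesian calibration with flux tower data is expected to yield the largest reduction in forecast error. Because the parameter shares come from local (OAT) main effects, this ranking does not account for interactions among parameters or with other uncertainty sources.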
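The ranking itself is a grouped mean followed by a top-N selection within each output variable. The sketch below uses an invented table; the column names (`response_var`, `parameter`, `frac_of_total`) and the parameter names are assumptions, not values from the analysis.

```r
# Hypothetical per-site parameter contributions (fractions of total variance).
param_partition <- data.frame(
  site_id = rep(c("s1", "s2"), each = 3),
  response_var = "NPP",
  parameter = rep(c("amax", "turnover", "q10"), times = 2),
  frac_of_total = c(0.12, 0.05, 0.01, 0.08, 0.07, 0.03)
)

# Average each parameter's contribution across sites.
ranked <- aggregate(frac_of_total ~ response_var + parameter,
                    data = param_partition, FUN = mean)

# Sort within each response variable, largest contribution first.
ranked <- ranked[order(ranked$response_var, -ranked$frac_of_total), ]

# Keep the top 2 per response variable (the report's figure keeps the top 10).
top_n <- do.call(rbind, lapply(split(ranked, ranked$response_var), head, n = 2))
print(top_n)
```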

3.4 Absolute uncertainty magnitude

The variance budget above shows proportions, but not absolute magnitude. A variable with 80% driver variance but very low total variance may not need attention, while one with 40% parameter variance and very high total variance demands action.

Figure 5: Absolute magnitude of forecast uncertainty (expressed as standard deviation in variable units).
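The contrast between proportions and magnitudes can be checked with two invented variables that mirror the 80% and 40% example above. The variable names and variances are hypothetical and chosen only to make the arithmetic visible.

```r
# Two invented output variables: one with a large driver share but tiny total
# variance, one with a smaller parameter share but large total variance.
budget <- data.frame(
  variable = c("flux_a", "pool_b"),
  dominant_source = c("driver", "parameter"),
  frac_of_total = c(0.80, 0.40),
  ensemble_variance = c(0.01, 25)  # squared units of each variable
)

# Absolute variance attributable to the dominant source.
budget$abs_variance <- budget$frac_of_total * budget$ensemble_variance

# Standard deviation equivalent, in the variable's own units.
budget$abs_sd <- sqrt(budget$abs_variance)
print(budget)
```

Here the 40% share corresponds to a far larger absolute variance than the 80% share. Comparing across variables still requires care, since each variable has its own units.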

4 Environmental Gradient Analysis

An important next step is to test whether the dominant source of uncertainty changes with environmental conditions (temperature, precipitation, soil properties). If so, uncertainty reduction strategies should be spatially targeted rather than applied uniformly.

Future Work

Gradient analysis requires linking GSA run IDs to environmental covariates. The current variance partition uses PEcAn workflow run IDs that do not directly map to the site covariate table. This linkage is planned as part of Phase 4 (site-level covariate integration).
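Once run IDs can be linked to sites, the gradient test could be as simple as a rank correlation between a source's variance fraction and a covariate. The sketch below assumes a lookup table from `runid` to `site_id`, which does not exist yet, and uses invented covariate and fraction values. Every table and column name is hypothetical.

```r
# Hypothetical lookup linking workflow run IDs to sites (not yet available).
run_lookup <- data.frame(
  runid = c("r1", "r2", "r3", "r4", "r5"),
  site_id = c("s1", "s2", "s3", "s4", "s5")
)

# Hypothetical site covariate table.
site_covariates <- data.frame(
  site_id = c("s1", "s2", "s3", "s4", "s5"),
  mean_annual_precip_mm = c(150, 300, 450, 600, 900)
)

# Hypothetical driver fractions for one output variable, keyed by run ID.
partition <- data.frame(
  runid = c("r1", "r2", "r3", "r4", "r5"),
  category = "driver",
  frac_of_total = c(0.10, 0.15, 0.22, 0.30, 0.41)
)

# Link runs to sites, then sites to covariates.
linked <- merge(merge(partition, run_lookup, by = "runid"),
                site_covariates, by = "site_id")

# Rank correlation between the covariate and the driver share.
rho <- cor(linked$mean_annual_precip_mm, linked$frac_of_total, method = "spearman")
print(rho)
```

With only 17 design points, any such correlation would be exploratory rather than conclusive.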

5 Implications

The variance decomposition partitions ensemble uncertainty into internal (biological parameters) and external (meteorology, initial conditions, management) sources. The practical implications depend on which source dominates:

If meteorology dominates (typical for fast carbon flux variables): improving weather forecasts or adding weather stations will have the largest impact on reducing forecast uncertainty. Weather predictability degrades with lead time, so part of this uncertainty is a fundamental limit that cannot be removed; propagating ensemble weather forecasts through the model is the standard way to represent it.
If biological parameters dominate: target the top parameters from Section 3.3 for field campaigns, literature synthesis, or model calibration using flux tower observations.
If initial conditions dominate (typical for slow pool variables like soil carbon): data assimilation workflows that update soil carbon and biomass pools using remote sensing or inventory data will yield the largest gains.
If interactions are large: parameter effects are non-additive and depend on weather conditions. Joint calibration or scenario-based approaches may be needed rather than independent parameter fitting.
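These rules can be applied mechanically to a table of mean variance fractions. The sketch below uses invented fractions, assumed column names, and an arbitrary 0.5 threshold for flagging "large" interactions. The recommendation strings paraphrase the rules above and the management lever from Section 2.2.

```r
# Lookup from dominant source to suggested action.
recommendation <- c(
  driver = "improve or ensemble the meteorological forcing",
  parameter = "target top parameters for field campaigns or calibration",
  IC = "assimilate observations of soil carbon and biomass pools",
  management = "obtain better management records"
)

# Hypothetical mean fractions per output variable and source.
mean_partition <- data.frame(
  variable = rep(c("NPP", "soil_carbon"), each = 4),
  category = rep(c("parameter", "driver", "IC", "interaction"), times = 2),
  mean_frac = c(0.10, 0.30, 0.05, 0.55, 0.08, 0.02, 0.50, 0.40)
)

interaction_threshold <- 0.5  # arbitrary cut-off, for illustration only

advise <- function(df) {
  main <- df[df$category != "interaction", ]
  dominant <- as.character(main$category[which.max(main$mean_frac)])
  inter <- df$mean_frac[df$category == "interaction"]
  data.frame(
    variable = df$variable[1],
    dominant = dominant,
    action = recommendation[[dominant]],
    joint_calibration = inter > interaction_threshold
  )
}

advice <- do.call(rbind, lapply(split(mean_partition, mean_partition$variable), advise))
print(advice)
```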

Management uncertainty contributes a small but nonzero fraction of total variance for N₂O flux (~2%), where fertilization rate and compost inputs directly control substrate availability. For carbon pool variables (soil carbon, biomass), management’s contribution is negligible in the current design, consistent with the finding from the downscaling phase that tillage’s effect on soil carbon operates through the same decomposition parameters already captured by the parameter category.

Note on reduced tillage: SIPNET applies a single tillage multiplier to both soil respiration and litter breakdown, so reduced tillage simultaneously slows C loss from soil and slows litter incorporation into soil. Whether the net effect is positive or negative depends on the balance of these opposing fluxes and the simulation timescale. Over 8 years, the model projects a small net decrease (-2%) in soil carbon under reduced tillage – consistent with meta-analyses showing that short-term no-till effects on total SOC are often small or negligible (Powlson et al. 2014, Nature Climate Change). Longer simulation periods and site-level validation (workplan Phase 3b) will test whether this result holds.
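The dependence on timescale can be illustrated with a toy two-pool model in which one multiplier scales both litter breakdown and soil respiration. This is not SIPNET: the pool structure, rate constants, initial stocks, and the 0.8 multiplier are all hypothetical, chosen so that the soil pool starts below its equilibrium.

```r
# Toy litter -> soil model with a single tillage multiplier on both fluxes.
# All parameter values are hypothetical; this is not SIPNET.
simulate_soil_c <- function(tillage_mult, years, dt = 0.05) {
  litter_input <- 4    # carbon input to litter per year
  k_litter <- 0.1      # litter breakdown rate (1/yr)
  k_soil <- 0.02       # soil respiration rate (1/yr)
  frac_to_soil <- 0.3  # fraction of broken-down litter entering soil
  litter <- 40         # initial litter pool
  soil <- 30           # initial soil pool (below equilibrium)
  for (step in seq_len(round(years / dt))) {
    breakdown <- tillage_mult * k_litter * litter
    respiration <- tillage_mult * k_soil * soil
    # Forward Euler update of both pools.
    litter <- litter + dt * (litter_input - breakdown)
    soil <- soil + dt * (frac_to_soil * breakdown - respiration)
  }
  soil
}

# Difference in soil carbon: reduced tillage (0.8) minus conventional (1.0).
short_term <- simulate_soil_c(0.8, years = 8) - simulate_soil_c(1.0, years = 8)
long_term <- simulate_soil_c(0.8, years = 200) - simulate_soil_c(1.0, years = 200)
print(c(short_term = short_term, long_term = long_term))
```

In this toy, reduced tillage gives lower soil carbon at 8 years because slower litter incorporation outweighs slower respiration, while the sign reverses at long horizons as the litter pool builds up. The crossover time depends entirely on the assumed rates.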

Two gaps remain. Process error (model structural uncertainty) is not yet quantified; comparison of model predictions against observational data will reveal residual variance not explained by input uncertainties. Management uncertainty is currently limited to N fertilization rate, compost application rate, and compost C:N; as the monitoring framework delivers tillage timing, planting, and harvest data products, these will be incorporated.

6 Appendix: Technical Details

6.1 References

Dietze, M. C. (2017). Ecological Forecasting. Princeton University Press. Chapter 11: Propagating, Analyzing, and Reducing Uncertainty.
LeBauer, D. S. et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs, 83(2), 133-154.
Dietze, M. C. et al. (2014). A quantitative assessment of a terrestrial biosphere model’s data needs across North American biomes. J. Geophys. Res. Biogeosci., 119, 286-300.
Saltelli, A. et al. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications, 181(2), 259-270.
Powlson, D. S. et al. (2014). Limited potential of no-till agriculture for climate change mitigation. Nature Climate Change, 4, 678-683.

6.2 Data availability

All datasets: data/variance_partition_site_level.csv, data/variance_partition_parameters.csv, data/ensemble_variance.csv. Analysis code: R/variance_decomposition.R. Pipeline script: scripts/031_partition_variance.R.

6.3 Software environment

R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: AlmaLinux 8.10 (Cerulean Leopard)

Matrix products: default
BLAS/LAPACK: FlexiBLAS NETLIB;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sf_1.1-0        patchwork_1.3.2 scales_1.4.0    knitr_1.50     
 [5] config_0.3.2    here_1.0.2      stringr_1.6.0   ggplot2_4.0.1  
 [9] tidyr_1.3.1     dplyr_1.1.4     readr_2.1.6    

loaded via a namespace (and not attached):
 [1] generics_0.1.4     class_7.3-23       KernSmooth_2.23-26 stringi_1.8.7     
 [5] hms_1.1.4          digest_0.6.38      magrittr_2.0.4     evaluate_1.0.5    
 [9] grid_4.5.2         RColorBrewer_1.1-3 maps_3.4.3         fastmap_1.2.0     
[13] rprojroot_2.1.1    jsonlite_2.0.0     e1071_1.7-16       DBI_1.2.3         
[17] purrr_1.2.0        cli_3.6.5          crayon_1.5.3       rlang_1.1.6       
[21] units_1.0-0        bit64_4.6.0-1      withr_3.0.2        yaml_2.3.10       
[25] parallel_4.5.2     tools_4.5.2        tzdb_0.5.0         vctrs_0.6.5       
[29] R6_2.6.1           proxy_0.4-27       lifecycle_1.0.4    classInt_0.4-11   
[33] bit_4.6.0          htmlwidgets_1.6.4  vroom_1.6.6        pkgconfig_2.0.3   
[37] pillar_1.11.1      gtable_0.3.6       glue_1.8.0         Rcpp_1.1.0        
[41] xfun_0.54          tibble_3.3.0       tidyselect_1.2.1   dichromat_2.0-0.1 
[45] farver_2.1.2       htmltools_0.5.8.1  labeling_0.4.3     rmarkdown_2.30    
[49] compiler_4.5.2     S7_0.2.1

--- title: "Variance Decomposition" author: "Akash B V" date: today format: html: self-contained: true df-print: paged toc: true toc-depth: 3 code-fold: true code-tools: true theme: cosmo number-sections: true execute: echo: false warning: false message: false cache: false --- ```{r} #| label: setup #| include: false library(readr) library(dplyr) library(tidyr) library(ggplot2) library(stringr) library(here) library(config) library(knitr) library(scales) library(patchwork) library(sf) here::i_am("analysis/variance_decomposition.qmd") source(here::here("R", "labels.R")) cfg <- config::get(file = here::here("000-config.yml")) data_dir <- here::here(cfg$paths$data_dir) # load datasets from 031_partition_variance.R site_partition <- readr::read_csv( file.path(data_dir, "variance_partition_site_level.csv"), show_col_types = FALSE ) param_partition <- readr::read_csv( file.path(data_dir, "variance_partition_parameters.csv"), show_col_types = FALSE ) ensemble_var <- readr::read_csv( file.path(data_dir, "ensemble_variance.csv"), show_col_types = FALSE ) # GSA design point coordinates from OAT results (shared sites) local_sa <- readr::read_csv( file.path(data_dir, "aggregated_sensitivity.csv"), show_col_types = FALSE ) site_coords <- local_sa |> dplyr::distinct(site_id, lat, lon) n_sites <- n_distinct(site_partition$site_id) variables <- unique(site_partition$variable) # consistent theme theme_set( theme_minimal(base_size = 12) + theme( panel.grid.minor = element_blank(), strip.text = element_text(size = 10, face = "bold"), legend.position = "bottom" ) ) ``` ```{r} #| label: compute-findings #| include: false # dominant source per variable (averaged across sites) dominant_sources <- site_partition |> filter(!category %in% c("dummy", "interaction")) |> summarize( mean_frac = mean(frac_of_total, na.rm = TRUE), .by = c("variable", "category") ) |> slice_max(mean_frac, n = 1, by = "variable") # top parameter per variable (from hybrid decomposition) top_params <- param_partition 
|> filter(!is.na(frac_of_total)) |> summarize( mean_frac = mean(frac_of_total, na.rm = TRUE), .by = c("response_var", "parameter") ) |> slice_max(mean_frac, n = 1, by = "response_var") # interaction fraction int_frac <- site_partition |> filter(category == "interaction") |> summarize(mean_int = mean(frac_of_total, na.rm = TRUE)) |> pull(mean_int) ``` # Key Findings This report decomposes forecast uncertainty of the SIPNET ecosystem model across `r n_sites` California cropland sites into its component sources. Building on the local (OAT) and global (Sobol) sensitivity analyses, this step answers the central question: what drives our predictive uncertainty, and what can we do about it? - **Dominant uncertainty sources differ by output variable.** Fast processes (net primary productivity, carbon fluxes) tend to be driven by meteorological uncertainty, while slow processes (soil carbon pools) are driven by initial conditions, consistent with the ecological forecasting framework. - **Specific biological traits** dominate the parameter uncertainty component. The hybrid decomposition identifies the individual parameters with the highest value of information for calibration (see @sec-param-breakdown). - **Interaction effects** account for ~`r round(int_frac * 100)`% of total variance on average, indicating that parameter effects are partially dependent on meteorological context. ------------------------------------------------------------------------ # Background {#sec-background} ## The uncertainty problem in ecological forecasting Ecological forecasts are uncertain. When we predict how cropland soil carbon will respond to a change in management (say, switching from synthetic fertilizer to compost), that prediction carries error from multiple sources: imprecise knowledge of biological parameters, uncertain weather forecasts, unknown initial soil carbon stocks, and the structural limitations of the model itself. 
The goal of uncertainty analysis is not to eliminate these errors (some are irreducible) but to **diagnose which sources dominate** so that limited resources can be directed where they will have the most impact. This is the central idea of the ecological forecasting framework (Dietze, 2017, Ch. 11). If most of our forecast uncertainty comes from poorly known biological parameters, we should invest in field campaigns and trait synthesis. If it comes from weather uncertainty, better meteorological data or ensemble weather forecasts will help most. If it comes from unknown initial conditions, data assimilation (updating model states with observations) is the path forward. Without this decomposition, uncertainty reduction efforts are untargeted. We might spend years collecting trait data only to discover that driver uncertainty was the bottleneck all along. ## The variance budget Formally, the variance of a model forecast $Y$ can be decomposed as: $$\text{Var}(Y) \approx V_{\text{Parameter}} + V_{\text{Driver}} + V_{\text{IC}} + V_{\text{Management}} + V_{\text{Interaction}}$$ Each term represents a distinct lever for reducing forecast uncertainty: - $V_{\text{Parameter}}$: Uncertainty in biological traits, including photosynthetic capacity ($A_{\max}$), allocation ratios, turnover rates, and temperature response curves. These are properties of the organism and ecosystem that we estimate from field measurements and literature synthesis. Reduced by collecting more trait data, running meta-analyses, or calibrating the model against flux tower or eddy covariance observations. - $V_{\text{Driver}}$: Uncertainty in meteorological forcing (temperature, precipitation, solar radiation, humidity). For hindcasts (running the model over a historical period), driver uncertainty is small because we have observational weather data. For forecasts, it grows with lead time as weather prediction skill degrades. Reduced by improving weather forecasts or adding weather stations. 
- $V_{\text{IC}}$: Uncertainty in initial conditions. How much carbon is in the soil when the simulation starts? How much standing biomass is already present? These quantities are hard to measure at every site and are often estimated from regional averages. Reduced by soil inventories, remote sensing products, or data assimilation (updating model states with observations as they become available during the simulation). - $V_{\text{Management}}$: Uncertainty in agricultural inputs, including fertilization rates, compost application timing, tillage practices, planting and harvest dates. These are typically known to the grower but often unavailable to the modeler at scale. Reduced by better data sharing between agricultural operators and researchers. - $V_{\text{Interaction}}$: Non-additive effects where the contribution of one source depends on the value of another. For example, parameter sensitivity may differ under wet vs. dry weather conditions, so that the uncertainty from biology and the uncertainty from weather cannot be simply added. Large interaction terms suggest that parameter calibration and driver uncertainty need to be addressed jointly rather than independently. One important source is deliberately excluded from this analysis: **process error** (model structural uncertainty). This is the variance that arises because the model's equations are an imperfect representation of reality. For example, SIPNET may not capture a soil process that matters in practice. In managed agricultural systems, process error reflects systematic biases in model structure rather than random noise, and it will be assessed separately through model-data comparison. Any residual variance after accounting for the input sources above is attributed to model structural error. ## Why a hybrid approach? This report sits at the end of a three-step analysis pipeline: 1. 
**Local (OAT) sensitivity analysis**: identifies which individual biological parameters the model is most sensitive to, one at a time 2. **Global (Sobol) sensitivity analysis**: quantifies how much total variance comes from parameters vs. drivers vs. initial conditions, including interactions 3. **Variance decomposition (this report)**: combines both to attribute forecast uncertainty to individual sources *and* individual parameters The Sobol analysis gives us the overall partition: how much of the total uncertainty comes from biological parameters as a category vs. meteorological drivers vs. initial conditions. But it treats all biological parameters as a single group. To identify *which specific traits* to target for calibration, we need to go deeper. The local (OAT) analysis provides exactly this (parameter-by-parameter sensitivity) but it only captures main effects within the parameter category, not the full partition across all uncertainty sources. Our **hybrid method** combines both: 1. **Source partitioning** (from global Sobol analysis): We multiply each source category's first-order Sobol index ($S_i$) by the total ensemble variance to get absolute variance contributions in physical units: $$V_{\text{category}} = \sum_{i \in \text{category}} S_i \times \text{Var}(Y_{\text{ensemble}})$$ 2. **Parameter attribution** (from local OAT analysis): We subdivide the biological parameter variance using local partial variance fractions: $$V_{\text{param}_j} = \frac{\text{PV}_j^{\text{local}}}{\sum_j \text{PV}_j^{\text{local}}} \times V_{\text{Parameter}}^{\text{global}}$$ This gives each individual parameter's absolute contribution to total forecast uncertainty -- the "value of information" for that trait. Parameters with the highest contributions are where new measurements will most efficiently reduce forecast error. ## Study design ```{r} #| label: fig-site-map #| fig-cap: "Design points across California croplands." 
#| fig-height: 5 #| fig-width: 6 if (nrow(site_coords) > 0) { ca <- tryCatch( sf::st_as_sf(maps::map("state", "california", plot = FALSE, fill = TRUE)), error = function(e) NULL ) if (!is.null(ca)) { ggplot() + geom_sf(data = ca, fill = "grey95", color = "grey40") + geom_point( data = site_coords, aes(x = lon, y = lat), size = 2.5, alpha = 0.8, color = "steelblue" ) + labs(x = "Longitude", y = "Latitude") + coord_sf(xlim = c(-125, -114), ylim = c(32.5, 42)) } else { ggplot(site_coords, aes(x = lon, y = lat)) + geom_point(size = 2.5, alpha = 0.8, color = "steelblue") + labs(x = "Longitude", y = "Latitude") } } else { cat("Site coordinates not available for mapping.\n") } ``` `r n_sites` design points across California croplands. Variance decomposition combines the Sobol ensemble (`r paste(label_variable(variables), collapse = ", ")`) with OAT sensitivity results to attribute forecast uncertainty to individual sources and parameters. ------------------------------------------------------------------------ # Results {#sec-results} ## Variance budget by site {#sec-budget} The stacked bars below show how total forecast variance partitions into high-level categories at each site. This answers the fundamental question: is our uncertainty dominated by what we know about the plants (parameters), the weather (drivers), or the system's starting state (initial conditions)? ```{r} #| label: fig-variance-budget #| fig-cap: "Variance decomposition by site. Each bar represents one site; color segments show the fraction of total variance attributable to each uncertainty source." 
#| fig-height: 8 #| fig-width: 12 plot_data <- site_partition |> filter(!category %in% c("dummy")) |> mutate( var_label = label_variable(variable), category_label = dplyr::case_when( category == "parameter" ~ "Model Parameters", category == "driver" ~ "Meteorology", category == "IC" ~ "Initial Conditions", category == "management" ~ "Management", category == "interaction" ~ "Interactions", TRUE ~ category ), category_label = factor( category_label, levels = c("Interactions", "Management", "Initial Conditions", "Meteorology", "Model Parameters") ) ) ggplot(plot_data, aes(x = as.factor(runid), y = frac_of_total, fill = category_label)) + geom_col(position = "fill", width = 0.85) + facet_wrap(~var_label, scales = "free", ncol = 1) + scale_y_continuous(labels = percent) + scale_fill_brewer(palette = "Set2", name = "Source") + labs( x = "Site", y = "Fraction of total variance" ) + theme( axis.text.x = element_blank(), axis.ticks.x = element_blank() ) ``` ## Summary across sites {#sec-summary} The bar chart below shows the mean contribution of each source averaged across all sites, with error bars showing +/- 1 SD. ```{r} #| label: fig-mean-partition #| fig-cap: "Mean variance partition across all sites by output variable. Error bars show +/- 1 SD across sites." 
#| fig-height: 6 #| fig-width: 10 summary_data <- site_partition |> filter(!category %in% c("dummy", "interaction")) |> mutate( var_label = label_variable(variable), category_label = dplyr::case_when( category == "parameter" ~ "Model Parameters", category == "driver" ~ "Meteorology", category == "IC" ~ "Initial Conditions", category == "management" ~ "Management", TRUE ~ category ) ) |> summarize( mean_frac = mean(frac_of_total, na.rm = TRUE), sd_frac = sd(frac_of_total, na.rm = TRUE), .by = c("var_label", "category_label") ) ggplot(summary_data, aes(x = category_label, y = mean_frac, fill = category_label)) + geom_col(alpha = 0.8) + geom_errorbar( aes(ymin = pmax(0, mean_frac - sd_frac), ymax = mean_frac + sd_frac), width = 0.2 ) + facet_wrap(~var_label, scales = "free_y") + scale_y_continuous(labels = percent) + scale_fill_brewer(palette = "Set2", guide = "none") + labs(x = NULL, y = "Mean fraction of total variance") + theme(axis.text.x = element_text(angle = 30, hjust = 1)) ``` ## Parameter breakdown (hybrid analysis) {#sec-param-breakdown} While the global SA gives the total variance attributable to all parameters collectively ($V_{\text{Parameter}}$), we need to know which individual traits to target for calibration. The hybrid method multiplies the Sobol parameter variance by each parameter's local OAT share, identifying the specific biological mechanisms driving uncertainty. ```{r} #| label: fig-param-breakdown #| fig-cap: "Top 10 parameters contributing to total forecast variance (hybrid method: Sobol parameter variance x local OAT fractions). Error bars show +/- 1 SD across sites." 
#| fig-height: 10 #| fig-width: 10 param_ranks <- param_partition |> filter(!is.na(frac_of_total)) |> mutate(param_label = label_param(parameter)) |> summarize( mean_contribution = mean(frac_of_total, na.rm = TRUE), sd_contribution = sd(frac_of_total, na.rm = TRUE), param_label = first(param_label), .by = c("response_var", "parameter") ) |> mutate(var_label = label_variable(response_var)) |> slice_max(order_by = mean_contribution, n = 10, by = "response_var") ggplot(param_ranks, aes(x = reorder(param_label, mean_contribution), y = mean_contribution)) + geom_col(fill = "steelblue", alpha = 0.8) + geom_errorbar( aes(ymin = pmax(0, mean_contribution - sd_contribution), ymax = mean_contribution + sd_contribution), width = 0.2 ) + coord_flip() + facet_wrap(~var_label, scales = "free") + scale_y_continuous(labels = percent) + labs( x = NULL, y = "Contribution to total forecast variance" ) ``` The parameters above have the highest value of information. Reducing their prior uncertainty via targeted field campaigns, literature meta-analysis, or Bayesian calibration with flux tower data will yield the largest reduction in forecast error. ## Absolute uncertainty magnitude {#sec-magnitude} The variance budget above shows proportions, but not absolute magnitude. A variable with 80% driver variance but very low total variance may not need attention, while one with 40% parameter variance and very high total variance demands action. ```{r} #| label: fig-absolute-variance #| fig-cap: "Absolute magnitude of forecast uncertainty (expressed as standard deviation in variable units)." 
#| fig-height: 5 #| fig-width: 8 abs_var <- ensemble_var |> mutate(var_label = label_variable(variable)) |> summarize( mean_sd = mean(sqrt(ensemble_variance), na.rm = TRUE), sd_sd = sd(sqrt(ensemble_variance), na.rm = TRUE), .by = "var_label" ) ggplot(abs_var, aes(x = reorder(var_label, mean_sd), y = mean_sd)) + geom_col(fill = "steelblue", alpha = 0.8) + geom_errorbar( aes(ymin = pmax(0, mean_sd - sd_sd), ymax = mean_sd + sd_sd), width = 0.2 ) + coord_flip() + labs( x = NULL, y = "Mean standard deviation (variable units)" ) ``` ------------------------------------------------------------------------ # Environmental Gradient Analysis {#sec-gradients} An important next step is to test whether the dominant source of uncertainty changes with environmental conditions (temperature, precipitation, soil properties). If so, uncertainty reduction strategies should be spatially targeted rather than applied uniformly. ::: {.callout-note} ## Future Work Gradient analysis requires linking GSA run IDs to environmental covariates. The current variance partition uses PEcAn workflow run IDs that do not directly map to the site covariate table. This linkage is planned as part of Phase 4 (site-level covariate integration). ::: ------------------------------------------------------------------------ # Implications {#sec-implications} The variance decomposition partitions ensemble uncertainty into internal (biological parameters) and external (meteorology, initial conditions) drivers. The key findings: - **If meteorology dominates** (typical for fast carbon flux variables): improving weather forecasts or adding weather stations will have the largest impact on reducing forecast uncertainty. This represents a fundamental limit; ensemble weather forecasts are the standard approach. - **If biological parameters dominate**: target the top parameters from @sec-param-breakdown for field campaigns, literature synthesis, or model calibration using flux tower observations. 
- **If initial conditions dominate** (typical for slow pool variables like soil carbon): data assimilation workflows that update soil carbon and biomass pools using remote sensing or inventory data will yield the largest gains.
- **If interactions are large**: parameter effects are non-additive and depend on weather conditions. Joint calibration or scenario-based approaches may be needed rather than independent parameter fitting.

**Management uncertainty** contributes a small but nonzero fraction of total variance for N~2~O flux (~2%), where fertilization rate and compost inputs directly control substrate availability. For carbon pool variables (soil carbon, biomass), management's contribution is negligible in the current design, consistent with the finding from the downscaling phase that tillage's effect on soil carbon operates through the same decomposition parameters already captured by the parameter category.

Note on reduced tillage: SIPNET applies a single tillage multiplier to both soil respiration and litter breakdown, so reduced tillage simultaneously slows C loss from soil *and* slows litter incorporation into soil. Whether the net effect is positive or negative depends on the balance of these opposing fluxes and the simulation timescale. Over 8 years, the model projects a small net decrease (-2%) in soil carbon under reduced tillage -- consistent with meta-analyses showing that short-term no-till effects on total SOC are often small or negligible (Powlson et al. 2014, Nature Climate Change). Longer simulation periods and site-level validation (workplan Phase 3b) will test whether this result holds.

Two gaps remain. Process error (model structural uncertainty) is not yet quantified; comparison of model predictions against observational data will reveal residual variance not explained by input uncertainties.
Management uncertainty is currently limited to N fertilization rate, compost application rate, and compost C:N; as the monitoring framework delivers tillage timing, planting, and harvest data products, these will be incorporated.

------------------------------------------------------------------------

# Appendix: Technical Details {#sec-technical}

## References

1. Dietze, M. C. (2017). *Ecological Forecasting*. Princeton University Press. Chapter 11: Propagating, Analyzing, and Reducing Uncertainty.
2. LeBauer, D. S. et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. *Ecological Monographs*, 83(2), 133-154.
3. Dietze, M. C. et al. (2014). A quantitative assessment of a terrestrial biosphere model's data needs across North American biomes. *J. Geophys. Res. Biogeosci.*, 119, 286-300.
4. Saltelli, A. et al. (2010). Variance based sensitivity analysis of model output. *Computer Physics Communications*.
5. Powlson, D. S. et al. (2014). Limited potential of no-till agriculture for climate change mitigation. *Nature Climate Change*, 4, 678-683.

## Data availability

All datasets: `data/variance_partition_site_level.csv`, `data/variance_partition_parameters.csv`, `data/ensemble_variance.csv`. Analysis code: `R/variance_decomposition.R`. Pipeline script: `scripts/031_partition_variance.R`.

## Software environment

```{r}
#| label: session-info
#| code-fold: false

sessionInfo()
```
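
## Worked example: proportion versus absolute magnitude

As a rough illustration of the point made in @sec-magnitude, the sketch below converts a source's share of variance into an absolute contribution by multiplying it by the total ensemble variance. The data frame, its column names other than `frac_of_total`, and all numbers are hypothetical and chosen only to mirror the 80% versus 40% example in the text; they are not results from this analysis. The comparison also assumes the two variables share units (or have been normalized), since absolute variances in different units cannot be ranked against each other.

```r
# Hypothetical values for illustration only; not output from the pipeline.
budget <- data.frame(
  variable       = c("low_variance_var", "high_variance_var"),
  source         = c("drivers", "parameters"),
  frac_of_total  = c(0.80, 0.40),   # share of total variance from this source
  total_variance = c(0.5, 50)       # total ensemble variance (same units assumed)
)

# Absolute variance attributable to the source = fraction x total variance.
budget$abs_variance <- budget$frac_of_total * budget$total_variance

# Square root puts the contribution back in the variable's own units.
budget$abs_sd <- sqrt(budget$abs_variance)

# Rank by absolute contribution rather than by proportion.
budget <- budget[order(-budget$abs_variance), ]
print(budget)
```

With these made-up numbers, the 40% parameter share ranks first because it applies to a much larger total variance, which is the situation the text describes as demanding action.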