Method testing with spesim • spesim

spesim is designed for teaching and method testing.

Method testing means: you keep the underlying truth fixed (the simulated community + spatial process), and vary the sampling design or analysis choices, then check how your conclusions change.

This vignette shows a simple, reproducible workflow:

Run a baseline simulation.
Re-run the same truth under different sampling schemes.
Compare (i) what you would conclude from the usual plots/tables and (ii) what the audit layer says you actually got.

1. Start from a baseline parameter set

We start from the packaged init file and then modify only a few parameters.

`%||%` <- function(a, b) if (!is.null(a)) a else b

init_file <- system.file("examples/spesim_init_basic.txt", package = "spesim")
if (identical(init_file, "")) {
  # Useful when rendering from the source tree without a prior install.
  # During local rendering, the working directory is typically `vignettes/`.
  cand <- c(
    file.path("..", "inst", "examples", "spesim_init_basic.txt"),
    file.path("..", "spesim_init_basic.txt")
  )
  init_file <- cand[file.exists(cand)][1]
}

P0 <- load_config(init_file)
#> ========== INITIALISING SPATIAL SAMPLING SIMULATION ==========

# Keep this vignette reasonably fast.
P0$N_INDIVIDUALS <- 1500L
P0$N_QUADRATS <- 25L

# Ensure we have filtering species (default init already does), and keep the report light.
P0$ADVANCED_ANALYSIS <- FALSE

# Fix seed so the *truth* is reproducible.
P0$SEED <- 123L

2. A helper to run the same truth under different sampling schemes

spesim_method_test() is a convenience wrapper that returns: - res: the simulation result - audit: a lightweight audit of the realised regime - optional standard plots and tables

run_scheme <- function(P, scheme) {
  P2 <- P
  P2$SAMPLING_SCHEME <- scheme

  # Keep audit deterministic and dependency-minimal for vignettes.
  spesim_method_test(P = P2, diagnostics = "nn", make_tables = TRUE, make_plots = FALSE)
}

schemes <- c("random", "systematic", "tiled", "transect", "voronoi")
outs <- setNames(lapply(schemes, function(s) run_scheme(P0, s)), schemes)
#> Note: no non-1 interactions for species: A, B, C, D, E, F, G, H, I, J.
#> ---- Interactions Summary ----
#> Species: 10 (A..J)
#> Radius : 0
#> Non-1 entries: 0 (0.0% of 100)
#> 
#> Matrix ('.' = 1):
#>   A B C D E F G H I J
#> A . . . . . . . . . .
#> B . . . . . . . . . .
#> C . . . . . . . . . .
#> D . . . . . . . . . .
#> E . . . . . . . . . .
#> F . . . . . . . . . .
#> G . . . . . . . . . .
#> H . . . . . . . . . .
#> I . . . . . . . . . .
#> J . . . . . . . . . .
#> ------------------------------
#> Note: no non-1 interactions for species: A, B, C, D, E, F, G, H, I, J.
#> ---- Interactions Summary ----
#> Species: 10 (A..J)
#> Radius : 0
#> Non-1 entries: 0 (0.0% of 100)
#> 
#> Matrix ('.' = 1):
#>   A B C D E F G H I J
#> A . . . . . . . . . .
#> B . . . . . . . . . .
#> C . . . . . . . . . .
#> D . . . . . . . . . .
#> E . . . . . . . . . .
#> F . . . . . . . . . .
#> G . . . . . . . . . .
#> H . . . . . . . . . .
#> I . . . . . . . . . .
#> J . . . . . . . . . .
#> ------------------------------
#> Note: no non-1 interactions for species: A, B, C, D, E, F, G, H, I, J.
#> ---- Interactions Summary ----
#> Species: 10 (A..J)
#> Radius : 0
#> Non-1 entries: 0 (0.0% of 100)
#> 
#> Matrix ('.' = 1):
#>   A B C D E F G H I J
#> A . . . . . . . . . .
#> B . . . . . . . . . .
#> C . . . . . . . . . .
#> D . . . . . . . . . .
#> E . . . . . . . . . .
#> F . . . . . . . . . .
#> G . . . . . . . . . .
#> H . . . . . . . . . .
#> I . . . . . . . . . .
#> J . . . . . . . . . .
#> ------------------------------
#> Note: no non-1 interactions for species: A, B, C, D, E, F, G, H, I, J.
#> ---- Interactions Summary ----
#> Species: 10 (A..J)
#> Radius : 0
#> Non-1 entries: 0 (0.0% of 100)
#> 
#> Matrix ('.' = 1):
#>   A B C D E F G H I J
#> A . . . . . . . . . .
#> B . . . . . . . . . .
#> C . . . . . . . . . .
#> D . . . . . . . . . .
#> E . . . . . . . . . .
#> F . . . . . . . . . .
#> G . . . . . . . . . .
#> H . . . . . . . . . .
#> I . . . . . . . . . .
#> J . . . . . . . . . .
#> ------------------------------
#> Note: no non-1 interactions for species: A, B, C, D, E, F, G, H, I, J.
#> ---- Interactions Summary ----
#> Species: 10 (A..J)
#> Radius : 0
#> Non-1 entries: 0 (0.0% of 100)
#> 
#> Matrix ('.' = 1):
#>   A B C D E F G H I J
#> A . . . . . . . . . .
#> B . . . . . . . . . .
#> C . . . . . . . . . .
#> D . . . . . . . . . .
#> E . . . . . . . . . .
#> F . . . . . . . . . .
#> G . . . . . . . . . .
#> H . . . . . . . . . .
#> I . . . . . . . . . .
#> J . . . . . . . . . .
#> ------------------------------
#> Warning in place_quadrats_voronoi(domain, P$N_QUADRATS, P$QUADRAT_SIZE, :
#> Voronoi placement found 17 suitable locations; using all.

3. What did the sampling design actually do?

Sampling schemes differ in how much they are constrained by the boundary and by the non-overlap rule.

sampling_summary <- bind_rows(lapply(names(outs), function(s) {
  a <- outs[[s]]$audit
  data.frame(
    scheme = s,
    n_requested = a$sampling$n_requested %||% NA_integer_,
    n_returned  = a$sampling$n_returned  %||% NA_integer_,
    reject_boundary = a$sampling$reject_boundary %||% NA_integer_,
    reject_overlap  = a$sampling$reject_overlap  %||% NA_integer_,
    safe_area_fraction = a$sampling$safe_area_fraction %||% NA_real_
  )
}))

sampling_summary
#>       scheme n_requested n_returned reject_boundary reject_overlap
#> 1     random          25         25              45             22
#> 2 systematic          25         12              NA             NA
#> 3      tiled          25         25              NA             NA
#> 4   transect           8          8              NA             NA
#> 5    voronoi          17         17              NA             NA
#>   safe_area_fraction
#> 1          0.6908764
#> 2          0.6908764
#> 3          0.6908764
#> 4          0.6908764
#> 5          0.6908764

Interpretation tips:

safe_area_fraction is a conservative estimate of how much of the domain is a feasible sampling frame for quadrat centres.
High reject_boundary or low safe_area_fraction means your design is effectively sampling the interior only.

4. Compare how the same truth looks under different schemes

A small, practical comparison is to look at a few common summaries.

4.1 Mean richness and abundance per quadrat

quadrat_summaries <- bind_rows(lapply(names(outs), function(s) {
  res <- outs[[s]]$res
  abund <- res$abund_matrix[, setdiff(colnames(res$abund_matrix), "site"), drop = FALSE]

  data.frame(
    scheme = s,
    mean_richness = mean(rowSums(abund > 0)),
    mean_abundance = mean(rowSums(abund))
  )
}))

quadrat_summaries
#>       scheme mean_richness mean_abundance
#> 1     random      6.040000       15.56000
#> 2 systematic      5.333333       13.41667
#> 3      tiled      5.680000       13.48000
#> 4   transect      5.000000       12.50000
#> 5    voronoi      5.705882       12.05882

4.2 Distance-decay (beta diversity vs distance)

We compute a distance-decay table for each scheme and compare the fitted slope.

slopes <- bind_rows(lapply(names(outs), function(s) {
  res <- outs[[s]]$res
  dd <- calculate_distance_decay(res$abund_matrix, res$site_coords)

  # Simple linear trend (teaching-level summary): dissimilarity ~ distance
  if (nrow(dd) == 0) {
    return(data.frame(scheme = s, slope = NA_real_, r2 = NA_real_))
  }
  fit <- lm(Dissimilarity ~ Distance, data = dd)

  data.frame(
    scheme = s,
    slope = unname(coef(fit)["Distance"]),
    r2 = summary(fit)$r.squared
  )
}))

slopes
#>       scheme        slope           r2
#> 1     random 0.0085537650 3.892777e-02
#> 2 systematic 0.0061305691 1.544999e-02
#> 3      tiled 0.0004024751 9.814643e-05
#> 4   transect 0.0040633095 9.139229e-03
#> 5    voronoi 0.0026702129 5.297278e-03

Method-testing point: If different sampling schemes change the slope materially, your inference about spatial turnover may be design-sensitive.

5. Use the audit layer to prevent over-interpretation

The audit tells you whether the regime you think you requested is visible in the sampled data.

For example, environmental filtering checks can be noisy when a species is sparse.

filtering_summary <- bind_rows(lapply(names(outs), function(s) {
  f <- outs[[s]]$audit$filtering
  if (nrow(f) == 0) return(data.frame())

  # Keep a compact view for the vignette
  f %>%
    mutate(scheme = s) %>%
    select(scheme, species, gradient, n_sites, occupancy_sites, rho, slope, qualitative)
}))

filtering_summary
#>        scheme species    gradient n_sites occupancy_sites        rho      slope
#> 1      random       A temperature      25              23 0.67494289 -2.6004465
#> 2      random       B temperature      25              22 0.54232166 -1.4947431
#> 3      random       C   elevation      25              16 0.71794538 -1.2127847
#> 4      random       D    rainfall      25              16 0.51133167 -0.9813902
#> 5  systematic       A temperature      12              11 0.83953882 -4.1361412
#> 6  systematic       B temperature      12               9 0.49664085 -1.0053614
#> 7  systematic       C   elevation      12               8 0.65746497 -0.8662085
#> 8  systematic       D    rainfall      12               8 0.78889253 -1.4317988
#> 9       tiled       A temperature      25              20 0.87447845 -2.7603278
#> 10      tiled       B temperature      25              19 0.42187079 -1.0752248
#> 11      tiled       C   elevation      25              19 0.67288195 -1.4106395
#> 12      tiled       D    rainfall      25              21 0.53907536 -0.8543505
#> 13   transect       A temperature       8               7 0.86753285 -3.4907501
#> 14   transect       B temperature       8               8 0.04941662 -0.0423553
#> 15   transect       C   elevation       8               4 0.93222724 -1.2215126
#> 16   transect       D    rainfall       8               7 0.28234145 -1.5044391
#> 17    voronoi       A temperature      17              15 0.85011483 -2.0097544
#> 18    voronoi       B temperature      17              13 0.43647549 -1.1226363
#> 19    voronoi       C   elevation      17              13 0.74032507 -1.3365611
#> 20    voronoi       D    rainfall      17              12 0.59144715 -0.6964973
#>                     qualitative
#> 1  abundance_peaks_near_optimum
#> 2  abundance_peaks_near_optimum
#> 3  abundance_peaks_near_optimum
#> 4  abundance_peaks_near_optimum
#> 5  abundance_peaks_near_optimum
#> 6  abundance_peaks_near_optimum
#> 7  abundance_peaks_near_optimum
#> 8  abundance_peaks_near_optimum
#> 9  abundance_peaks_near_optimum
#> 10 abundance_peaks_near_optimum
#> 11 abundance_peaks_near_optimum
#> 12 abundance_peaks_near_optimum
#> 13 abundance_peaks_near_optimum
#> 14            weak_or_no_signal
#> 15         too_sparse_to_assess
#> 16 abundance_peaks_near_optimum
#> 17 abundance_peaks_near_optimum
#> 18 abundance_peaks_near_optimum
#> 19 abundance_peaks_near_optimum
#> 20 abundance_peaks_near_optimum

Interpretation tips:

Prefer filtering interpretations when occupancy_sites is not tiny.
Treat too_sparse_to_assess as a guardrail: it means your sampled data don’t support a strong statement.

6. What to do next

To turn this into a real method-testing exercise:

Repeat the comparison under different domain shapes (e.g., very concave polygons).
Vary quadrat size and number of quadrats.
Hold sampling fixed and vary the spatial process (Poisson vs Strauss vs Thomas).
Use spesim_audit(..., diagnostics = c("nn", "spatstat")) when spatstat is installed for edge-corrected point-process diagnostics.