Performance notes • spesim

spesim is intended for teaching and method testing, where you usually want runs to be interactive (seconds, not minutes).

This vignette gives conservative rules of thumb for keeping runtime and memory use reasonable.

What tends to dominate runtime

Neutral-model simulation loop (when MODEL_FAMILY = "hybrid" or "neutral").
- The birth–death–dispersal loop runs for NEUTRAL_MAX_STEPS iterations. The inner loop is written in R; cost scales with N_INDIVIDUALS and the number of steps to convergence. As of v0.5.2, the main per-step costs are the C++ point-in-polygon check (pip_cpp) and the R interpreter overhead of the loop body itself. The environmental acceptance and displacement sampling steps are now fully vectorised over the candidate batch.
Spatial intersections (sf::st_intersects) between individuals and quadrats.
- Roughly scales with the number of individuals × number of quadrats.
Environmental grid resolution (the env_gradients raster-like table).
- Higher resolution increases memory and slows operations that summarise environment per quadrat.
Optional analyses (reports, panels, diversity calculations).
- ADVANCED_ANALYSIS = TRUE is intentionally heavier.
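
The scaling notes above can be turned into a rough back-of-envelope comparison between two parameter sets. The sketch below is not part of spesim: estimate_workload() is a hypothetical helper, the pair count is a naive worst case (spatial indexing usually does better), and the memory figure assumes x, y plus n_env_vars numeric columns at 8 bytes per value.

```r
# Hypothetical helper (not part of spesim): rough workload figures
# derived from the scaling notes above.
estimate_workload <- function(n_individuals, n_quadrats,
                              sampling_resolution, n_env_vars = 3) {
  # Naive worst case: every individual tested against every quadrat.
  pairs <- n_individuals * n_quadrats
  # Grid cells grow with the square of the per-side resolution.
  cells <- sampling_resolution^2
  # Assumption: x, y plus n_env_vars numeric columns, 8 bytes each.
  grid_mb <- cells * (2 + n_env_vars) * 8 / 1024^2
  list(pairs = pairs, grid_cells = cells, grid_mb = grid_mb)
}

small <- estimate_workload(2000, 20, 50)
large <- estimate_workload(20000, 200, 150)

large$pairs / small$pairs            # relative intersection work
large$grid_cells / small$grid_cells  # relative grid size
```

Only the ratios are meant to be informative here; absolute runtimes depend on the machine and on how much work the spatial index avoids.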

Safe interactive ranges (rules of thumb)

These are not hard limits, just settings that tend to work well on a laptop.

Individuals

Good default: N_INDIVIDUALS ≈ 1,000–5,000
Still workable for method testing: up to ~20,000 (expect slower plotting/intersections)

Quadrat sampling

Good default: N_QUADRATS ≈ 10–50
Very large N_QUADRATS increases intersection work and can make some analyses noisy.

Environmental grid

SAMPLING_RESOLUTION is the number of grid cells per side for the environmental surface.
Good default: 30–80
Above ~150 you should expect larger memory use and slower quadrat-environment summaries.
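
As a convenience, the rules of thumb above could be checked before a run. The following is a minimal sketch, assuming P behaves like a plain named list; check_interactive_ranges() is a hypothetical helper, not a spesim function, and the thresholds simply restate the upper ends of the ranges in this section.

```r
# Hypothetical helper (not part of spesim): flag parameters that fall
# outside the rule-of-thumb ranges listed in this vignette.
check_interactive_ranges <- function(P) {
  notes <- character(0)
  if (!is.null(P$N_INDIVIDUALS) && P$N_INDIVIDUALS > 20000) {
    notes <- c(notes, "N_INDIVIDUALS > 20,000: expect slow plotting/intersections")
  }
  if (!is.null(P$N_QUADRATS) && P$N_QUADRATS > 50) {
    notes <- c(notes, "N_QUADRATS > 50: more intersection work")
  }
  if (!is.null(P$SAMPLING_RESOLUTION) && P$SAMPLING_RESOLUTION > 150) {
    notes <- c(notes, "SAMPLING_RESOLUTION > 150: larger memory use")
  }
  notes
}

# Illustrative parameter list; only the grid resolution is out of range.
P_demo <- list(N_INDIVIDUALS = 3000, N_QUADRATS = 20, SAMPLING_RESOLUTION = 200)
check_interactive_ranges(P_demo)
```

The helper returns an empty character vector when nothing is flagged, so it can be used quietly in scripts.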

Advanced outputs

For interactive work, start with:

P$ADVANCED_ANALYSIS <- FALSE
x <- spesim_method_test(P = P, make_plots = TRUE)

Enable advanced reporting once your parameters are stable:

P$ADVANCED_ANALYSIS <- TRUE
res <- spesim_run(P, write_outputs = TRUE, seed = P$SEED)
cat(generate_full_report(res, include_audit = TRUE), sep = "\n")

Practical tips

When comparing sampling schemes, keep the truth fixed (same seed) and change only sampling parameters.
Prefer running many small replicates over one enormous run.
If a sampling scheme returns very few quadrats, some downstream summaries may be undefined or uninformative (distance–decay requires ≥2 sites, etc.).
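
The first two tips can be illustrated with a toy example in base R. This is a sketch only: simulate_truth() and sample_richness() are hypothetical stand-ins that scatter species labels uniformly on the unit square, not spesim's model or sampling code. The point is the pattern: fix the seed for the truth, then run many small replicates that vary only the sampling.

```r
# Toy stand-in for a simulated landscape (NOT spesim's model):
# species labels scattered uniformly on the unit square.
simulate_truth <- function(n, n_species, seed) {
  set.seed(seed)
  data.frame(x = runif(n), y = runif(n),
             species = sample.int(n_species, n, replace = TRUE))
}

# Species richness in each of n_quadrats random square quadrats.
sample_richness <- function(truth, n_quadrats, size = 0.1) {
  x0 <- runif(n_quadrats, 0, 1 - size)
  y0 <- runif(n_quadrats, 0, 1 - size)
  vapply(seq_len(n_quadrats), function(i) {
    inside <- truth$x >= x0[i] & truth$x <= x0[i] + size &
      truth$y >= y0[i] & truth$y <= y0[i] + size
    length(unique(truth$species[inside]))
  }, integer(1))
}

# Same seed gives the same truth, so comparisons isolate the sampling.
truth_a <- simulate_truth(1000, 15, seed = 42)
truth_b <- simulate_truth(1000, 15, seed = 42)
identical(truth_a, truth_b)

# Many small replicates of two sampling schemes on the same truth.
set.seed(1)
few  <- replicate(20, mean(sample_richness(truth_a, n_quadrats = 10)))
many <- replicate(20, mean(sample_richness(truth_a, n_quadrats = 40)))
c(few = mean(few), many = mean(many))
```

Because the truth is held fixed, any difference between the two schemes comes from the sampling design and replicate-to-replicate noise, not from a different simulated landscape.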

If you need to go bigger

Reduce plotting.
Increase N_INDIVIDUALS gradually.
Consider using the fast point-process engines when available.

Engine performance history

Internal profiling (N = 1,500, S = 15, hybrid model) shows the following cumulative improvements since v0.5.1:

Release	Change	Approx. wall time
v0.5.1	C++ PIP engine + vectorised domain checks	17,690 ms
v0.5.2	Vectorised env-acceptance + displacement sampling	2,600 ms

The residual cost is now dominated by the C++ PIP engine and the R interpreter overhead of the simulation loop itself. Both are at or near their practical floor; further gains would likely require a full Rcpp port of the inner loop.
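
For reference, the relative improvement implied by the two wall times in the table can be worked out directly. The snippet uses only the figures reported above.

```r
# Wall times taken from the table above (milliseconds).
wall_ms <- c("v0.5.1" = 17690, "v0.5.2" = 2600)

# Speed-up factor and percentage reduction from v0.5.1 to v0.5.2.
speedup <- wall_ms[["v0.5.1"]] / wall_ms[["v0.5.2"]]
reduction_pct <- 100 * (1 - wall_ms[["v0.5.2"]] / wall_ms[["v0.5.1"]])

round(speedup, 1)        # 6.8
round(reduction_pct, 1)  # 85.3
```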