Benchmarks

kuva uses Criterion for statistical micro-benchmarks. All numbers on this page were collected on a release build (opt-level = 3) on AMD64 Linux. Timing is median wall-clock; HTML reports with per-sample distributions live in target/criterion/ after running.

Running the benchmarks

# All benchmark groups (requires the png + pdf backends to compile cleanly)
cargo bench --features full

# A single group
cargo bench --features full -- render

# HTML report (opens in browser; `open` is the macOS command, use xdg-open on Linux)
open target/criterion/report/index.html

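Criterion runs a 3-second warm-up, then collects 100 samples per benchmark. The reported measurement is the median time. Outlier detection uses Tukey's fences: samples more than 1.5 IQRs outside the first or third quartile are flagged as mild outliers, and samples more than 3 IQRs outside as severe.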
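To make the warm-up, sample, and median steps concrete, here is a minimal std-only sketch. It is not how Criterion is implemented (Criterion runs many iterations per sample and applies further statistics); the names `median` and `bench_median` are hypothetical and exist only for this illustration.

```rust
use std::time::{Duration, Instant};

/// Median of a set of timings (mean of the two middle values for even counts).
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    let mid = samples.len() / 2;
    if samples.len() % 2 == 1 {
        samples[mid]
    } else {
        (samples[mid - 1] + samples[mid]) / 2
    }
}

/// Hypothetical helper: call `f` repeatedly until `warm_up` has elapsed,
/// then time `n_samples` individual calls and return their median.
fn bench_median<F: FnMut()>(mut f: F, warm_up: Duration, n_samples: usize) -> Duration {
    let start = Instant::now();
    while start.elapsed() < warm_up {
        f();
    }
    let mut samples: Vec<Duration> = (0..n_samples)
        .map(|_| {
            let t = Instant::now();
            f();
            t.elapsed()
        })
        .collect();
    median(&mut samples)
}
```

The median is used rather than the mean because a few slow samples (scheduler preemption, page faults) shift the mean but barely move the median.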

Benchmark files

| file | what it measures |
|---|---|
| benches/render.rs | Full pipeline (scene build + SVG emit) and SVG-only for scatter, scatter+errorbars, line, violin, manhattan, heatmap |
| benches/kde.rs | simple_kde in isolation at 100 → 100k samples |
| benches/svg.rs | SvgBackend::render_scene alone, N Circle primitives, no rendering pipeline |

Results

kde — Gaussian KDE (simple_kde, 256 evaluation points)

The truncated kernel sorts the input once, then binary-searches for the window [x ± 4·bw] around each evaluation point. Only values inside that window contribute: at 4σ the Gaussian kernel has fallen to exp(−8) ≈ 0.03% of its peak, and less than 0.01% of its mass lies beyond.

| n samples | time |
|---|---|
| 100 | 85.7 µs |
| 1,000 | 621 µs |
| 10,000 | 4.07 ms |
| 100,000 | 28.0 ms |

Scaling: ~6.8× per decade of n (6.6–7.2× between adjacent rows), i.e. an empirical exponent of log10(6.8) ≈ 0.83. For uniformly distributed data the window captures ~10% of points, giving roughly O(n^0.8 × m) total work for m evaluation points instead of the naive O(n × m). On bounded data (e.g. sin-shaped violin groups) the window covers ~30% of points, yielding a ~3× improvement over naive.
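The sort-then-binary-search idea can be sketched as follows. This is an illustration under assumptions, not kuva's actual simple_kde: the function names, the explicit `bw` parameter, and the caller-supplied evaluation grid are all hypothetical. A naive version is included so the two can be compared.

```rust
use std::f64::consts::PI;

/// Unnormalised Gaussian kernel contribution of sample `v` at position `x`.
fn gauss(x: f64, v: f64, bw: f64) -> f64 {
    let u = (x - v) / bw;
    (-0.5 * u * u).exp()
}

/// Naive KDE: every sample contributes to every evaluation point.
fn naive_kde(samples: &[f64], bw: f64, grid: &[f64]) -> Vec<f64> {
    let norm = 1.0 / (samples.len() as f64 * bw * (2.0 * PI).sqrt());
    grid.iter()
        .map(|&x| norm * samples.iter().map(|&v| gauss(x, v, bw)).sum::<f64>())
        .collect()
}

/// Truncated KDE (hypothetical sketch): sort once, then for each evaluation
/// point binary-search the window [x - 4*bw, x + 4*bw] and sum only inside it.
fn truncated_kde(samples: &[f64], bw: f64, grid: &[f64]) -> Vec<f64> {
    assert!(bw > 0.0 && !samples.is_empty());
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let norm = 1.0 / (sorted.len() as f64 * bw * (2.0 * PI).sqrt());
    grid.iter()
        .map(|&x| {
            // partition_point is a binary search over the sorted slice.
            let lo = sorted.partition_point(|&v| v < x - 4.0 * bw);
            let hi = sorted.partition_point(|&v| v <= x + 4.0 * bw);
            norm * sorted[lo..hi].iter().map(|&v| gauss(x, v, bw)).sum::<f64>()
        })
        .collect()
}
```

Per evaluation point the cost is two O(log n) searches plus one kernel evaluation per sample inside the window, so the saving depends directly on how small the window is relative to the data range.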

render — scatter and line

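| | n=100 | n=1k | n=10k | n=100k | n=1M |
|---|---|---|---|---|---|
| scatter scene+svg | 41 µs | 314 µs | 3.10 ms | 34.5 ms | 414 ms |
| scatter svg-only | 34 µs | 294 µs | 3.07 ms | 31.2 ms | 317 ms |
| scatter scene build | ~7 µs | ~20 µs | ~30 µs | ~3 ms | ~97 ms |
| line scene+svg | 39 µs | 262 µs | 2.49 ms | 28.6 ms | 308 ms |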

At 1M points the SVG emit accounts for 317 ms (77% of total). The scene build (coordinate mapping, Vec allocation) costs ~97 ms. SVG generation rate in this benchmark: ~317 ns/element at 1M points, above the ~200 ns/element measured by the isolated svg bench.

render — scatter with error bars

Each error bar adds 3 Line primitives (cap–shaft–cap), so n=100k scatter+errorbars emits 400k primitives vs 100k for plain scatter.

| n | plain scatter | with y_err | overhead |
|---|---|---|---|
| 100 | 41 µs | 175 µs | 4.3× |
| 1,000 | 314 µs | 1.70 ms | 5.4× |
| 10,000 | 3.10 ms | 18.8 ms | 6.1× |
| 100,000 | 34.5 ms | 222 ms | 6.5× |
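The 4× primitive count follows directly from how an error bar is drawn. A minimal sketch, assuming a hypothetical `Primitive` enum and a hypothetical `scatter_with_y_err` builder (kuva's real scene types may look different):

```rust
/// Hypothetical primitive type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Primitive {
    Circle { cx: f64, cy: f64, r: f64 },
    Line { x1: f64, y1: f64, x2: f64, y2: f64 },
}

/// Build one marker plus three lines (cap, shaft, cap) per point.
/// Each point is (x, y, y_err); `cap_half_width` is half the cap length.
fn scatter_with_y_err(points: &[(f64, f64, f64)], cap_half_width: f64) -> Vec<Primitive> {
    // Four primitives per point, so allocate once up front.
    let mut out = Vec::with_capacity(points.len() * 4);
    for &(x, y, err) in points {
        let (lo, hi) = (y - err, y + err);
        out.push(Primitive::Circle { cx: x, cy: y, r: 3.0 });
        // Lower cap.
        out.push(Primitive::Line { x1: x - cap_half_width, y1: lo, x2: x + cap_half_width, y2: lo });
        // Shaft.
        out.push(Primitive::Line { x1: x, y1: lo, x2: x, y2: hi });
        // Upper cap.
        out.push(Primitive::Line { x1: x - cap_half_width, y1: hi, x2: x + cap_half_width, y2: hi });
    }
    out
}
```

With 100k input points this produces 400k primitives, which matches the count quoted above.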

render — violin (3 groups)

Violin SVG is cheap — three KDE curves emit ~10 path primitives regardless of n. All time is in KDE.

| n per group | scene+svg | svg-only | KDE cost |
|---|---|---|---|
| 100 | 557 µs | 20 µs | ~537 µs |
| 1,000 | 2.00 ms | 19 µs | ~1.98 ms |
| 10,000 | 12.7 ms | 25 µs | ~12.7 ms |
| 100,000 | 89.0 ms | 34 µs | ~89 ms |

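The violin SVG time is roughly flat at 19–34 µs across all scales. For n ≥ 1,000 the KDE cost is close to 3× the kde bench values (e.g. 3 × 4.07 ms ≈ 12.2 ms against ~12.7 ms at 10k per group); at n = 100 it is about twice that (~537 µs against 3 × 85.7 µs ≈ 257 µs). Practical advice: violin plots are fast at the scales bioinformatics data actually produces (100–5000 points per group); 100k/group is an extreme stress test.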

render — manhattan (22 chromosomes)

Pre-bucketing builds a HashMap<chr → Vec<idx>> once before the span loop, reducing chromosome lookups from 22 passes over the data, O(22 × n), to a single pass, O(n).

| n SNPs | time |
|---|---|
| 1,000 | 372 µs |
| 10,000 | 3.88 ms |
| 100,000 | 42.0 ms |
| 1,000,000 | 501 ms |

Scales roughly linearly: each 10× increase in n costs ~10–12× more time. At 1M SNPs (a full GWAS): 501 ms including scene build and SVG emit. Note: this is render-only; file I/O for a 1M-row TSV adds read time on top.
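The pre-bucketing step mentioned at the top of this section can be sketched as below. The `u8` chromosome key and the function name `bucket_by_chr` are assumptions for illustration; the real code may key on strings or another type.

```rust
use std::collections::HashMap;

/// Group row indices by chromosome in a single pass over the data.
/// `chrs[i]` is the chromosome of SNP `i` (hypothetical u8 encoding).
fn bucket_by_chr(chrs: &[u8]) -> HashMap<u8, Vec<usize>> {
    let mut buckets: HashMap<u8, Vec<usize>> = HashMap::new();
    for (idx, &chr) in chrs.iter().enumerate() {
        buckets.entry(chr).or_default().push(idx);
    }
    buckets
}
```

A per-chromosome loop can then look up `buckets.get(&chr)` and touch only that chromosome's rows, instead of filtering all n rows once per chromosome.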

render — heatmap (n×n matrix)

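| n | cells | no values | with values | text overhead |
|---|---|---|---|---|
| 10 | 100 | 61.7 µs | 114 µs | 1.85× |
| 50 | 2,500 | 1.18 ms | 2.54 ms | 2.15× |
| 100 | 10,000 | 4.86 ms | 10.4 ms | 2.14× |
| 200 | 40,000 | 24.6 ms | | |
| 500 | 250,000 | 154 ms | | |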

Scales with n² (doubling n quadruples cells). The single-loop merge means show_values costs roughly 2× no-values (1.85–2.15× measured): one extra format! and Text primitive per cell.
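A single-loop merge of this kind might look like the following sketch. The `Primitive` enum, the `heatmap_primitives` function, and the two-decimal label format are hypothetical; the point is only that the optional Text is emitted in the same pass as the Rect.

```rust
/// Hypothetical primitive type for this sketch only.
#[derive(Debug, PartialEq)]
enum Primitive {
    Rect { row: usize, col: usize, value: f64 },
    Text { row: usize, col: usize, label: String },
}

/// Emit one Rect per cell and, if requested, one Text per cell,
/// in a single traversal of the matrix.
fn heatmap_primitives(matrix: &[Vec<f64>], show_values: bool) -> Vec<Primitive> {
    let cells: usize = matrix.iter().map(Vec::len).sum();
    let mut out = Vec::with_capacity(if show_values { cells * 2 } else { cells });
    for (row, values) in matrix.iter().enumerate() {
        for (col, &value) in values.iter().enumerate() {
            out.push(Primitive::Rect { row, col, value });
            if show_values {
                // The extra per-cell cost: one format! and one Text primitive.
                out.push(Primitive::Text { row, col, label: format!("{value:.2}") });
            }
        }
    }
    out
}
```

Because each cell does a fixed amount of extra work when show_values is set, the overhead stays close to a constant factor rather than growing with n.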

svg — raw string generation

Measures SvgBackend::render_scene on a scene containing only Circle primitives. Isolates the string formatting cost with no rendering pipeline.

| n circles | time | ns/element |
|---|---|---|
| 1,000 | 198 µs | 198 |
| 10,000 | 1.99 ms | 199 |
| 100,000 | 19.9 ms | 199 |
| 1,000,000 | 213 ms | 213 |

Linear to within ~8%: 198–213 ns/element across three decades of n. String::with_capacity pre-allocation eliminates reallocation variance. ~200 ns/element is the baseline cost of format! through the std::fmt machinery.
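As an illustration of the pre-allocation pattern, here is a minimal sketch that writes circles into one pre-sized String. The function name `circles_to_svg`, the tuple layout, and the 64-bytes-per-element capacity guess are assumptions; SvgBackend::render_scene itself is not shown in this page.

```rust
use std::fmt::Write;

/// Serialise circles given as (cx, cy, r) into a single SVG string.
fn circles_to_svg(circles: &[(f64, f64, f64)], width: u32, height: u32) -> String {
    // Reserve the buffer once; 64 bytes per element is only an estimate.
    let mut svg = String::with_capacity(128 + circles.len() * 64);
    write!(
        svg,
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"{width}\" height=\"{height}\">"
    )
    .unwrap();
    for &(cx, cy, r) in circles {
        // write! appends straight into the buffer (no temporary String),
        // but the float formatting still goes through std::fmt.
        write!(svg, "<circle cx=\"{cx:.2}\" cy=\"{cy:.2}\" r=\"{r:.2}\"/>").unwrap();
    }
    svg.push_str("</svg>");
    svg
}
```

If the capacity estimate is too low the String still grows correctly; it just reallocates a few times, which is the variance the pre-allocation is meant to remove.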

Interpretation

SVG string generation dominates at scale. For scatter at 1M points, 77% of total time (317 of 414 ms) is in SvgBackend::render_scene. Improving the SVG backend (e.g. write-to-file streaming, avoiding format! for simple floats) would have the most impact at extreme scales.
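One possible shape for the write-to-file streaming idea is sketched below. It is not an existing kuva API: `stream_circles` and its tuple input are hypothetical, and no measurement on this page shows how much it would save.

```rust
use std::io::{self, BufWriter, Write};

/// Stream circles given as (cx, cy, r) to any writer (file, socket, Vec<u8>)
/// without first building the whole document in memory.
fn stream_circles<W: Write>(out: W, circles: &[(f64, f64, f64)]) -> io::Result<()> {
    // BufWriter batches the many small writes into larger ones.
    let mut w = BufWriter::new(out);
    w.write_all(b"<svg xmlns=\"http://www.w3.org/2000/svg\">")?;
    for &(cx, cy, r) in circles {
        write!(w, "<circle cx=\"{cx:.2}\" cy=\"{cy:.2}\" r=\"{r:.2}\"/>")?;
    }
    w.write_all(b"</svg>")?;
    w.flush()
}
```

Passing a `std::fs::File` as `out` would write straight to disk, so peak memory stays at the buffer size rather than the full SVG size. The per-element formatting cost through std::fmt is unchanged by this approach.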

Violin is all KDE. The SVG stage for a violin scene is ~25 µs at any n — it's a handful of path curves. For n > 10k/group, consider whether that resolution is actually needed; downsampling to 10k rarely changes the visible shape.
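A simple way to cap the per-group sample count before running KDE is evenly spaced (stride) selection, sketched here. The function name `downsample` is hypothetical, and stride selection assumes the input order carries no periodic pattern; random sampling is an alternative when that assumption is doubtful.

```rust
/// Return at most `max_len` values, picked at evenly spaced positions.
/// Inputs that are already short enough are returned unchanged.
fn downsample(values: &[f64], max_len: usize) -> Vec<f64> {
    if max_len == 0 {
        return Vec::new();
    }
    if values.len() <= max_len {
        return values.to_vec();
    }
    let n = values.len();
    // i < max_len, so i * n / max_len is always a valid index below n.
    (0..max_len).map(|i| values[i * n / max_len]).collect()
}
```

For example, `downsample(&group, 10_000)` applied to each group would bound the KDE input at the 10k level discussed above.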

Manhattan is now O(n). Pre-bucketing reduced chromosome filtering from O(22n) to O(n). A 1M-SNP GWAS renders in ~500 ms end-to-end (render only, excluding file I/O).

Error bars are expensive at scale. 3 extra primitives per point means 4× the primitive count and 4.3–6.5× the measured cost of plain scatter. For large n with error bars, consider downsampling or rendering only bars for significant points.

Heatmap with values costs about 2× (1.85–2.15× measured). The single-pass loop keeps the overhead proportional to the cell count, with no second traversal.

Optimisations applied

These were identified by profiling before benchmarks existed and confirmed by the benchmark numbers above:

| change | file | expected gain |
|---|---|---|
| Truncated KDE kernel (sort + binary search) | render_utils.rs | ~8× at 100k uniform data |
| Manhattan pre-bucketing (HashMap) | render.rs | ~22× at 1M SNPs with 22 chr |
| Heatmap single-loop + no flat Vec | render.rs | ~2× for show_values; -1 alloc |
| String::with_capacity in SVG backend | svg.rs | eliminates O(log n) reallocs |