Benchmarks

kuva uses Criterion for statistical micro-benchmarks. All numbers on this page were collected on a release build (opt-level = 3) on AMD64 Linux. Timing is median wall-clock; HTML reports with per-sample distributions live in target/criterion/ after running.

Running the benchmarks

# All benchmark groups (requires the png + pdf backends to compile cleanly)
cargo bench --features full

# A single group
cargo bench --features full -- render

# HTML report (opens in browser; on Linux use xdg-open instead of open)
open target/criterion/report/index.html

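Criterion runs a 3-second warm-up, then collects 100 samples per benchmark (its defaults). The reported figure is the median time. Outliers are flagged with Tukey's fences: samples more than 1.5 IQRs outside the quartiles count as mild outliers, and more than 3 IQRs as severe.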
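To make the IQR-based outlier flagging concrete, here is a minimal std-only sketch of Tukey's fences applied to a set of timings. It is an illustration, not Criterion's own code: the names `Outlier`, `percentile` and `classify` are hypothetical, and the linear-interpolated percentile is an assumption of this sketch.

```rust
/// Classification of one timing sample relative to Tukey's fences.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Outlier {
    Normal,
    Mild,
    Severe,
}

/// Linear-interpolated percentile of a sorted, non-empty slice (p in 0..=1).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Returns (median, per-sample label) for a non-empty set of timings.
pub fn classify(samples: &[f64]) -> (f64, Vec<Outlier>) {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;
    let labels = samples
        .iter()
        .map(|&x| {
            // Distance outside the [q1, q3] box; zero for samples inside it.
            let dist = if x < q1 { q1 - x } else if x > q3 { x - q3 } else { 0.0 };
            if dist > 3.0 * iqr {
                Outlier::Severe
            } else if dist > 1.5 * iqr {
                Outlier::Mild
            } else {
                Outlier::Normal
            }
        })
        .collect();
    (percentile(&sorted, 0.5), labels)
}
```

The median is unaffected by a few extreme samples, which is why it is a reasonable headline figure even when some samples are flagged.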

Benchmark files

| file | what it measures |
|---|---|
| benches/render.rs | Full pipeline (scene build + SVG emit) and SVG-only for scatter, scatter+errorbars, line, violin, manhattan, heatmap |
| benches/kde.rs | simple_kde in isolation at 100 → 100k samples |
| benches/svg.rs | SvgBackend::render_scene alone, N Circle primitives, no rendering pipeline |

Results

kde — Gaussian KDE (simple_kde, 256 evaluation points)

The truncated kernel sorts the input once, then binary-searches for the window [x ± 4·bw] around each evaluation point. Only values inside that window contribute (under 0.01% of a Gaussian's mass lies beyond 4σ, so the truncation error is negligible).
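The sort-then-binary-search idea can be sketched as follows. This is not kuva's `simple_kde`: `truncated_kde` and `naive_kde` are hypothetical names, the bandwidth is passed in explicitly, and the 4·bw cutoff is taken from the description above.

```rust
use std::f64::consts::PI;

/// Gaussian KDE evaluated at `grid`, ignoring samples farther than 4*bw away.
/// Hypothetical sketch; assumes `samples` is non-empty and `bw > 0`.
pub fn truncated_kde(samples: &[f64], grid: &[f64], bw: f64) -> Vec<f64> {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b)); // sort once, up front
    let norm = 1.0 / (sorted.len() as f64 * bw * (2.0 * PI).sqrt());
    grid.iter()
        .map(|&x| {
            // Binary-search both ends of the window [x - 4bw, x + 4bw].
            let lo = sorted.partition_point(|&v| v < x - 4.0 * bw);
            let hi = sorted.partition_point(|&v| v <= x + 4.0 * bw);
            let sum: f64 = sorted[lo..hi]
                .iter()
                .map(|&v| {
                    let u = (x - v) / bw;
                    (-0.5 * u * u).exp()
                })
                .sum();
            sum * norm
        })
        .collect()
}

/// Naive reference: every sample contributes to every grid point.
pub fn naive_kde(samples: &[f64], grid: &[f64], bw: f64) -> Vec<f64> {
    let norm = 1.0 / (samples.len() as f64 * bw * (2.0 * PI).sqrt());
    grid.iter()
        .map(|&x| {
            let sum: f64 = samples
                .iter()
                .map(|&v| {
                    let u = (x - v) / bw;
                    (-0.5 * u * u).exp()
                })
                .sum();
            sum * norm
        })
        .collect()
}
```

Per evaluation point the truncated version costs two binary searches plus one pass over the window, so the saving depends on how much of the data falls inside [x ± 4·bw].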

| n samples | time |
|---|---|
| 100 | 85.7 µs |
| 1,000 | 621 µs |
| 10,000 | 4.07 ms |
| 100,000 | 28.0 ms |

Scaling: ~6.8× per decade of n. For uniformly distributed data the window captures ~10% of points, giving roughly O(n^0.8 × m) total work for m evaluation points instead of the naive O(n × m). On bounded data (e.g. sin-shaped violin groups) the window covers ~30% of points, yielding a ~3× improvement over naive.
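The exponent can be recovered from the first and last rows of the table by fitting t ∝ n^k between two measurements. The helper names below are hypothetical and exist only for this worked example.

```rust
/// Empirical scaling exponent k in t ∝ n^k, from two (n, time) measurements.
pub fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

/// Average time multiplier for each 10x increase in n over the same range.
pub fn factor_per_decade(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    10f64.powf(scaling_exponent(n1, t1, n2, t2))
}
```

With the table's endpoints (n = 100 at 85.7 µs and n = 100,000 at 28.0 ms = 28,000 µs), `scaling_exponent(100.0, 85.7, 100_000.0, 28_000.0)` comes out at roughly 0.84 and `factor_per_decade` at just under 7, in line with the per-decade figure quoted above.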

render — scatter and line

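| benchmark | n=100 | n=1k | n=10k | n=100k | n=1M |
|---|---|---|---|---|---|
| scatter scene+svg | 41 µs | 314 µs | 3.10 ms | 34.5 ms | 414 ms |
| scatter svg-only | 34 µs | 294 µs | 3.07 ms | 31.2 ms | 317 ms |
| scatter scene build | ~7 µs | ~20 µs | ~30 µs | ~3 ms | ~97 ms |
| line scene+svg | 39 µs | 262 µs | 2.49 ms | 28.6 ms | 308 ms |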

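At 1M points the SVG emit accounts for 317 ms (77% of total). The scene build (coordinate mapping, Vec allocation) costs ~97 ms. SVG generation rate in this pipeline: ~320 ns/element (317 ms / 1M points), above the ~200 ns/element measured for bare Circle primitives in the svg bench.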

render — scatter with error bars

Each error bar adds 3 Line primitives (cap–shaft–cap), so n=100k scatter+errorbars emits 400k primitives vs 100k for plain scatter.
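The primitive count can be illustrated with a small sketch. The `Primitive` enum and `scatter_with_y_err` function are hypothetical stand-ins, not kuva's actual types; the fixed marker radius and the cap half-width parameter are likewise assumptions.

```rust
/// Hypothetical primitive type for this sketch (not kuva's real enum).
#[derive(Debug, Clone, PartialEq)]
pub enum Primitive {
    Circle { cx: f64, cy: f64, r: f64 },
    Line { x1: f64, y1: f64, x2: f64, y2: f64 },
}

/// Emits one marker plus three lines (cap, shaft, cap) per (x, y, y_err) point.
pub fn scatter_with_y_err(points: &[(f64, f64, f64)], cap_half_width: f64) -> Vec<Primitive> {
    let w = cap_half_width;
    let mut out = Vec::with_capacity(points.len() * 4);
    for &(x, y, err) in points {
        out.push(Primitive::Circle { cx: x, cy: y, r: 3.0 });
        // Lower cap.
        out.push(Primitive::Line { x1: x - w, y1: y - err, x2: x + w, y2: y - err });
        // Vertical shaft.
        out.push(Primitive::Line { x1: x, y1: y - err, x2: x, y2: y + err });
        // Upper cap.
        out.push(Primitive::Line { x1: x - w, y1: y + err, x2: x + w, y2: y + err });
    }
    out
}
```

For 100 input points this yields 400 primitives, 300 of them lines, which is the 4× primitive count described above.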

| n | plain scatter | with y_err | overhead |
|---|---|---|---|
| 100 | 41 µs | 175 µs | 4.3× |
| 1,000 | 314 µs | 1.70 ms | 5.4× |
| 10,000 | 3.10 ms | 18.8 ms | 6.1× |
| 100,000 | 34.5 ms | 222 ms | 6.4× |

render — violin (3 groups)

Violin SVG is cheap: three KDE curves emit ~10 path primitives regardless of n. Nearly all of the time is in KDE.

| n per group | scene+svg | svg-only | KDE cost |
|---|---|---|---|
| 100 | 557 µs | 20 µs | ~537 µs |
| 1,000 | 2.00 ms | 19 µs | ~1.98 ms |
| 10,000 | 12.7 ms | 25 µs | ~12.7 ms |
| 100,000 | 89.0 ms | 34 µs | ~89 ms |

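The violin SVG time stays between 19 and 34 µs across all scales. From 1k points per group upward, the KDE cost is close to 3× the kde bench values (e.g. 3 × 4.07 ms = 12.2 ms vs ~12.7 ms measured); at n=100 it is about twice that (537 µs vs 3 × 85.7 µs = 257 µs), which suggests fixed per-group costs are visible at small n. Practical advice: violin plots are fast at the scales bioinformatics data actually produces (100–5000 points per group); 100k/group is an extreme stress test.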

render — manhattan (22 chromosomes)

Pre-bucketing builds a HashMap<chr → Vec<idx>> once before the span loop, reducing chromosome lookups from O(22 × n) to O(n).
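A minimal sketch of the pre-bucketing step, assuming chromosome labels are available as string slices. The function name `bucket_by_chromosome` is hypothetical and the real code may key the map differently.

```rust
use std::collections::HashMap;

/// Groups SNP indices by chromosome label in a single pass over the input.
pub fn bucket_by_chromosome<'a>(chroms: &[&'a str]) -> HashMap<&'a str, Vec<usize>> {
    let mut buckets: HashMap<&'a str, Vec<usize>> = HashMap::new();
    for (idx, &chr) in chroms.iter().enumerate() {
        // One hash lookup per SNP instead of one full scan per chromosome.
        buckets.entry(chr).or_default().push(idx);
    }
    buckets
}
```

A later per-chromosome loop can then iterate `buckets[chr]` directly rather than filtering all n SNPs once for each of the 22 chromosomes.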

| n SNPs | time |
|---|---|
| 1,000 | 372 µs |
| 10,000 | 3.88 ms |
| 100,000 | 42.0 ms |
| 1,000,000 | 501 ms |

Scales approximately linearly: each 10× increase in n costs ~10–12× more time. At 1M SNPs (a full GWAS), rendering takes 501 ms including scene build and SVG emit. Note: this is render-only; file I/O for a 1M-row TSV adds read time on top.

render — heatmap (n×n matrix)

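| n | cells | no values | with values | text overhead |
|---|---|---|---|---|
| 10 | 100 | 61.7 µs | 114 µs | 1.85× |
| 50 | 2,500 | 1.18 ms | 2.54 ms | 2.15× |
| 100 | 10,000 | 4.86 ms | 10.4 ms | 2.14× |
| 200 | 40,000 | 24.6 ms | - | - |
| 500 | 250,000 | 154 ms | - | - |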

Scales with n² (doubling n quadruples cells). The single-loop merge means show_values costs ~2× no-values (1.85–2.15× measured): one extra format! and Text primitive per cell.
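The single-loop structure might look like the sketch below, where the optional value label is emitted in the same pass as the cell rectangle. `CellPrim` and `heatmap_primitives` are hypothetical names, and the two-decimal label format is an assumption.

```rust
/// Hypothetical primitive type for this sketch (not kuva's real enum).
#[derive(Debug, Clone, PartialEq)]
pub enum CellPrim {
    Rect { x: f64, y: f64, w: f64, h: f64, value: f64 },
    Text { x: f64, y: f64, content: String },
}

/// Walks the matrix once, emitting a Rect per cell and, optionally, a Text label.
pub fn heatmap_primitives(matrix: &[Vec<f64>], cell: f64, show_values: bool) -> Vec<CellPrim> {
    let n_cells: usize = matrix.iter().map(Vec::len).sum();
    let mut out = Vec::with_capacity(if show_values { 2 * n_cells } else { n_cells });
    for (row, values) in matrix.iter().enumerate() {
        for (col, &v) in values.iter().enumerate() {
            let (x, y) = (col as f64 * cell, row as f64 * cell);
            out.push(CellPrim::Rect { x, y, w: cell, h: cell, value: v });
            if show_values {
                // The only extra work per cell: one format! and one Text primitive.
                out.push(CellPrim::Text {
                    x: x + cell / 2.0,
                    y: y + cell / 2.0,
                    content: format!("{v:.2}"),
                });
            }
        }
    }
    out
}
```

Because labels are produced inside the same loop, enabling them doubles the primitive count without adding a second traversal of the matrix.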

svg — raw string generation

Measures SvgBackend::render_scene on a scene containing only Circle primitives. Isolates the string formatting cost with no rendering pipeline.

| n circles | time | ns/element |
|---|---|---|
| 1,000 | 198 µs | 198 |
| 10,000 | 1.99 ms | 199 |
| 100,000 | 19.9 ms | 199 |
| 1,000,000 | 213 ms | 213 |

Close to linear: per-element cost holds at 198–199 ns up to 100k circles and rises to 213 ns (~7%) at 1M. String::with_capacity pre-allocation eliminates reallocation variance. ~200 ns/element is the baseline cost of format! through the std::fmt machinery.
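As an illustration of the pre-allocation pattern, the sketch below builds an SVG string for a list of circles. It is not `SvgBackend::render_scene`: the function name, the 48-bytes-per-element estimate and the two-decimal coordinate format are all assumptions of this example.

```rust
use std::fmt::Write;

/// Serialises (cx, cy, r) triples into a minimal SVG document.
pub fn circles_to_svg(circles: &[(f64, f64, f64)]) -> String {
    // Reserve the buffer up front; 48 bytes per element is a rough guess.
    let mut out = String::with_capacity(64 + circles.len() * 48);
    out.push_str("<svg xmlns=\"http://www.w3.org/2000/svg\">\n");
    for &(cx, cy, r) in circles {
        // Formatting into a String cannot fail, so the Result is ignored.
        let _ = writeln!(out, "<circle cx=\"{cx:.2}\" cy=\"{cy:.2}\" r=\"{r:.2}\"/>");
    }
    out.push_str("</svg>\n");
    out
}
```

Writing into one pre-sized buffer avoids a temporary String per element; each element still goes through std::fmt for float formatting, which is the per-element cost the benchmark isolates.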

Interpretation

SVG string generation dominates at scale. For scatter at 1M points, 77% of total time is in SvgBackend::render_scene; line has no SVG-only measurement, but its total is of the same order. Improving the SVG backend (e.g. write-to-file streaming, avoiding format! for simple floats) would have the most impact at extreme scales.

Violin is all KDE. The SVG stage for a violin scene is ~25 µs at any n — it's a handful of path curves. For n > 10k/group, consider whether that resolution is actually needed; downsampling to 10k rarely changes the visible shape.
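One simple way to cap the KDE input size is even-stride downsampling, sketched below. The `downsample` helper is hypothetical and not part of kuva; it assumes the input order carries no structure (or that the data is sorted), since a stride over ordered-by-time data could alias with periodic patterns.

```rust
/// Returns at most `max_n` values, taken at evenly spaced indices.
pub fn downsample(values: &[f64], max_n: usize) -> Vec<f64> {
    if values.len() <= max_n {
        return values.to_vec();
    }
    // i * len / max_n is strictly less than len for every i < max_n.
    (0..max_n).map(|i| values[i * values.len() / max_n]).collect()
}
```

For example, `downsample(&group, 10_000)` applied to each group before building the violin would bound the KDE input at 10k points per group.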

Manhattan is now O(n). Pre-bucketing reduced chromosome filtering from O(22n) to O(n). A 1M-SNP GWAS renders in ~500 ms end-to-end (render only, excluding file I/O).

Error bars are expensive at scale. 3 extra primitives per point means 4–6× the cost of plain scatter. For large n with error bars, consider downsampling or rendering only bars for significant points.

Heatmap with values is ~2×. The single-pass loop keeps the overhead roughly proportional to the cell count, with no wasted traversal.

PNG rendering

These benchmarks measure end-to-end wall time for producing PNG output — scene build, rasterisation, and encode. They cover three libraries on the same data at the same canvas size (1600×1200 @2× = 3200×2400 px effective).

Numbers below are means over 3 runs on AMD64 Linux, release build. Charton uses resvg for PNG; plotters draws directly into a pixel buffer; kuva uses its own direct rasteriser (added in v0.2.0).

Line plot

| library | 85k pts | 170k pts | 850k pts |
|---|---|---|---|
| kuva v0.2.0 baseline (via resvg) | 0.94 s | 2.29 s | 14.0 s |
| kuva v0.2.0 new-raster | 0.72 s | 1.39 s | 6.77 s |
| kuva v0.2.0 polyline-opt | 0.58 s | 1.09 s | 5.59 s |
| kuva v0.2.0 stadium | 0.63 s | 1.21 s | 6.19 s |
| charton (resvg) | 1.32 s | 3.26 s | 23.3 s |
| plotters | 0.035 s | 0.072 s | 0.33 s |

[Figure: line benchmark]

The stadium renderer is the current default for thick polylines. Each segment is rasterized as the exact set of pixels within hw of that segment (a capsule/stadium shape); adjacent stadiums overlap at joins to form correct round joins with no polygon construction, no self-intersection artifacts, and no join arc overshoot. It is ~30% faster than the AET polygon approach, which has no row in the table above; compared with the polyline-opt row it is about 10% slower (6.19 s vs 5.59 s at 850k pts). Plotters is faster still because it draws hairlines with no anti-aliasing; kuva renders AA thick strokes.
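The per-segment stadium idea can be sketched with a point-to-segment distance test. This is an illustration only, not the code in backend/raster.rs: it assumes hw is the stroke half-width, uses a one-pixel linear ramp for anti-aliasing, scans the whole buffer instead of a clipped bounding box, and combines overlapping stadiums with `max`. All function names are hypothetical.

```rust
/// Euclidean distance from point `p` to the segment `a`-`b`.
pub fn dist_to_segment(p: (f64, f64), a: (f64, f64), b: (f64, f64)) -> f64 {
    let (abx, aby) = (b.0 - a.0, b.1 - a.1);
    let (apx, apy) = (p.0 - a.0, p.1 - a.1);
    let len2 = abx * abx + aby * aby;
    // Projection parameter, clamped so the end caps become semicircles.
    let t = if len2 == 0.0 {
        0.0
    } else {
        ((apx * abx + apy * aby) / len2).clamp(0.0, 1.0)
    };
    let (dx, dy) = (apx - t * abx, apy - t * aby);
    (dx * dx + dy * dy).sqrt()
}

/// Coverage in [0, 1] of a point against a stadium of half-width `hw`.
pub fn stadium_coverage(p: (f64, f64), a: (f64, f64), b: (f64, f64), hw: f64) -> f64 {
    (hw + 0.5 - dist_to_segment(p, a, b)).clamp(0.0, 1.0)
}

/// Rasterises one stadium into a row-major coverage buffer (`width` > 0).
pub fn fill_stadium(buf: &mut [f64], width: usize, a: (f64, f64), b: (f64, f64), hw: f64) {
    let height = buf.len() / width;
    for y in 0..height {
        for x in 0..width {
            // Sample at the pixel centre.
            let c = stadium_coverage((x as f64 + 0.5, y as f64 + 0.5), a, b, hw);
            let px = &mut buf[y * width + x];
            // max() keeps overlapping stadiums at a join from double-counting.
            *px = px.max(c);
        }
    }
}
```

Calling `fill_stadium` once per segment of a polyline gives round joins for free: the semicircular caps of adjacent segments coincide at the shared vertex, so no join geometry has to be constructed.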

Scatter plot

| library | 500k pts | 5M pts |
|---|---|---|
| kuva v0.2.0 baseline (via resvg) | 1.12 s | 11.2 s |
| kuva v0.2.0 stadium | 0.11 s | 0.93 s |
| charton (resvg) | 1.79 s | 17.2 s |
| plotters | 0.060 s | 0.53 s |

[Figure: scatter benchmark]

Scatter improved 10–12× vs the baseline (1.12 s → 0.11 s at 500k pts, 11.2 s → 0.93 s at 5M pts). Most of the gain is from direct SDF circle rasterisation replacing the resvg SVG circle path. The stadium renderer does not affect scatter (scatter uses CircleBatch, not polylines).
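Signed-distance circle rasterisation can be sketched as below. This is a hypothetical illustration rather than kuva's CircleBatch code: the function names, the one-pixel linear anti-aliasing ramp and the `max` blend are assumptions.

```rust
/// Coverage in [0, 1] of a sample point against a filled circle.
pub fn circle_coverage(px: f64, py: f64, cx: f64, cy: f64, r: f64) -> f64 {
    let (dx, dy) = (px - cx, py - cy);
    // Signed distance to the circle edge: negative inside, positive outside.
    let d = (dx * dx + dy * dy).sqrt() - r;
    (0.5 - d).clamp(0.0, 1.0)
}

/// Draws one circle into a row-major coverage buffer (`width` > 0),
/// touching only the pixels in the circle's bounding box.
pub fn draw_circle(buf: &mut [f64], width: usize, cx: f64, cy: f64, r: f64) {
    let height = buf.len() / width;
    let x0 = (cx - r - 1.0).floor().max(0.0) as usize;
    let y0 = (cy - r - 1.0).floor().max(0.0) as usize;
    let x1 = ((cx + r + 1.0).ceil().max(0.0) as usize).min(width);
    let y1 = ((cy + r + 1.0).ceil().max(0.0) as usize).min(height);
    for y in y0..y1 {
        for x in x0..x1 {
            // Sample at the pixel centre.
            let c = circle_coverage(x as f64 + 0.5, y as f64 + 0.5, cx, cy, r);
            let px = &mut buf[y * width + x];
            *px = px.max(c);
        }
    }
}
```

Each marker costs one distance evaluation per pixel in its bounding box, with no path construction or SVG parsing in between.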

Histogram

| library | 1M bins | 10M bins |
|---|---|---|
| kuva v0.2.0 baseline (via resvg) | 0.023 s | 0.183 s |
| kuva v0.2.0 stadium | 0.020 s | 0.199 s |
| charton (resvg) | 0.108 s | 1.045 s |
| plotters | 0.004 s | 0.005 s |

[Figure: histogram benchmark]

Histogram is dominated by bar rect primitives, which are cheap for all backends. plotters is essentially flat because it skips anti-aliasing on rects entirely; kuva renders AA-edged rects.

Running the comparison benchmarks

cd benches/rust_bench
# Run with a custom label (appends to benchmark_results.csv)
RUN_ID="my-tag" ITERATIONS=5 bash run_benchmarks.sh

Once benchmark_results.csv exists, bash scripts/gen_docs.sh (run from the repo root) will automatically rebuild plot_results, regenerate the SVGs, and copy them into docs/src/assets/bench/.

Optimisations applied

These were identified by profiling before benchmarks existed and confirmed by the benchmark numbers above:

| change | file | expected gain |
|---|---|---|
| Truncated KDE kernel (sort + binary search) | render_utils.rs | ~8× at 100k uniform data |
| Manhattan pre-bucketing (HashMap) | render.rs | ~22× at 1M SNPs with 22 chr |
| Heatmap single-loop + no flat Vec | render.rs | ~2× for show_values; -1 alloc |
| String::with_capacity in SVG backend | svg.rs | eliminates O(log n) reallocs |
| Direct raster backend (Wu lines, SDF circles, AET fill) | backend/raster.rs | 12× scatter, 2.5× line vs resvg |
| Primitive::PolyLine (bypass SVG round-trip) | render/render.rs, backend/raster.rs | ~17% improvement vs per-segment SVG path |
| Per-segment stadium fill for thick polylines | backend/raster.rs | ~30% faster than AET polygon; eliminates join artifacts |