Benchmarks
kuva uses Criterion for statistical micro-benchmarks. All numbers on this page were collected on a release build (opt-level = 3) on AMD64 Linux. Timing is median wall-clock; HTML reports with per-sample distributions live in target/criterion/ after running.
Running the benchmarks
# All benchmark groups (requires the png + pdf backends to compile cleanly)
cargo bench --features full
# A single group
cargo bench --features full -- render
# HTML report (`open` is the macOS launcher; on Linux use `xdg-open`)
open target/criterion/report/index.html
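Criterion runs a 3-second warm-up, then collects 100 samples per benchmark (its defaults). The reported measurement is the median time. Outlier detection follows Tukey's method: samples more than 1.5 IQRs beyond the first or third quartile are flagged as mild outliers, and samples more than 3 IQRs beyond as severe.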
Benchmark files
| file | what it measures |
|---|---|
| benches/render.rs | Full pipeline (scene build + SVG emit) and SVG-only for scatter, scatter+errorbars, line, violin, manhattan, heatmap |
| benches/kde.rs | simple_kde in isolation at 100 → 100k samples |
| benches/svg.rs | SvgBackend::render_scene alone, N Circle primitives, no rendering pipeline |
Results
kde — Gaussian KDE (simple_kde, 256 evaluation points)
The truncated kernel sorts the input once, then binary-searches for the window [x ± 4·bw] around each evaluation point. Only values inside that window contribute: at 4σ the Gaussian kernel weight has already fallen to exp(−8) ≈ 0.03% of its peak.
| n samples | time |
|---|---|
| 100 | 85.7 µs |
| 1,000 | 621 µs |
| 10,000 | 4.07 ms |
| 100,000 | 28.0 ms |
Scaling: ~6.8× per decade of n on average (6.5–7.3× between adjacent rows). For uniformly distributed data the window captures ~10% of points, giving empirically roughly O(n^0.8 × m) total work for m evaluation points instead of the naive O(n × m). On bounded data (e.g. sin-shaped violin groups) the window covers ~30% of points, yielding a ~3× improvement over naive.
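The sort-then-binary-search idea described above can be sketched as follows. This is a minimal illustration, not kuva's actual `simple_kde`: the function name `truncated_kde`, its signature, and the normalisation are assumptions made for the example.

```rust
/// Truncated Gaussian KDE (illustrative sketch; names are hypothetical).
/// `bw` is the bandwidth and is assumed to be > 0.
fn truncated_kde(samples: &[f64], eval_points: &[f64], bw: f64) -> Vec<f64> {
    if samples.is_empty() {
        return vec![0.0; eval_points.len()];
    }
    // Sort once so each window can be located by binary search.
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let norm = 1.0 / (sorted.len() as f64 * bw * (2.0 * std::f64::consts::PI).sqrt());
    let cutoff = 4.0 * bw;

    eval_points
        .iter()
        .map(|&x| {
            // Index range of samples inside [x - 4*bw, x + 4*bw].
            let lo = sorted.partition_point(|&s| s < x - cutoff);
            let hi = sorted.partition_point(|&s| s <= x + cutoff);
            let sum: f64 = sorted[lo..hi]
                .iter()
                .map(|&s| {
                    let z = (x - s) / bw;
                    (-0.5 * z * z).exp()
                })
                .sum();
            sum * norm
        })
        .collect()
}
```

Each evaluation point costs two binary searches plus one pass over the samples in its window, so the work per point shrinks with the fraction of data the window covers.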
render — scatter and line
| | n=100 | n=1k | n=10k | n=100k | n=1M |
|---|---|---|---|---|---|
| scatter scene+svg | 41 µs | 314 µs | 3.10 ms | 34.5 ms | 414 ms |
| scatter svg-only | 34 µs | 294 µs | 3.07 ms | 31.2 ms | 317 ms |
| scatter scene build | ~7 µs | ~20 µs | ~30 µs | ~3 ms | ~97 ms |
| line scene+svg | 39 µs | 262 µs | 2.49 ms | 28.6 ms | 308 ms |
At 1M points the SVG emit accounts for 317 ms (77% of total). The scene build (coordinate mapping, Vec allocation) costs ~97 ms. SVG generation rate at 1M points: ~317 ns/element (317 ms / 1M elements).
render — scatter with error bars
Each error bar adds 3 Line primitives (cap–shaft–cap), so n=100k scatter+errorbars emits 400k primitives vs 100k for plain scatter.
| n | plain scatter | with y_err | overhead |
|---|---|---|---|
| 100 | 41 µs | 175 µs | 4.3× |
| 1,000 | 314 µs | 1.70 ms | 5.4× |
| 10,000 | 3.10 ms | 18.8 ms | 6.1× |
| 100,000 | 34.5 ms | 222 ms | 6.4× |
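The primitive count behind these numbers can be illustrated with a small sketch. The `Line` struct, `error_bar_lines`, and `primitive_count` are hypothetical names for this example and are not taken from kuva's source; coordinates are in data space.

```rust
/// Hypothetical line primitive, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Line {
    x1: f64,
    y1: f64,
    x2: f64,
    y2: f64,
}

/// One symmetric y error bar as cap, shaft, cap.
fn error_bar_lines(x: f64, y: f64, y_err: f64, cap_half_width: f64) -> [Line; 3] {
    let (lower, upper) = (y - y_err, y + y_err);
    [
        Line { x1: x - cap_half_width, y1: upper, x2: x + cap_half_width, y2: upper },
        Line { x1: x, y1: lower, x2: x, y2: upper },
        Line { x1: x - cap_half_width, y1: lower, x2: x + cap_half_width, y2: lower },
    ]
}

/// Primitives emitted for n points: 1 marker each, plus 3 lines with error bars.
fn primitive_count(n_points: usize, with_error_bars: bool) -> usize {
    if with_error_bars { n_points * 4 } else { n_points }
}
```

With this accounting, 100k points with error bars produce 400k primitives, which matches the 4× primitive count quoted above.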
render — violin (3 groups)
Violin SVG is cheap — three KDE curves emit ~10 path primitives regardless of n. All time is in KDE.
| n per group | scene+svg | svg-only | KDE cost |
|---|---|---|---|
| 100 | 557 µs | 20 µs | ~537 µs |
| 1,000 | 2.00 ms | 19 µs | ~1.98 ms |
| 10,000 | 12.7 ms | 25 µs | ~12.7 ms |
| 100,000 | 89.0 ms | 34 µs | ~89 ms |
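The violin SVG time stays within ~19–34 µs across all scales. For n ≥ 1,000 the KDE cost is close to 3× the kde bench values (one KDE per group): 3 × 621 µs ≈ 1.86 ms vs ~1.98 ms, 3 × 4.07 ms ≈ 12.2 ms vs ~12.7 ms, and 3 × 28.0 ms = 84 ms vs ~89 ms. At n = 100 the measured ~537 µs is about twice 3 × 85.7 µs ≈ 257 µs. Practical advice: violin plots are fast at the scales bioinformatics data actually produces (100–5000 points per group); 100k/group is an extreme stress test.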
render — manhattan (22 chromosomes)
Pre-bucketing builds a HashMap<chr → Vec<idx>> once before the span loop, reducing chromosome lookups from O(22 × n) to O(n).
| n SNPs | time |
|---|---|
| 1,000 | 372 µs |
| 10,000 | 3.88 ms |
| 100,000 | 42.0 ms |
| 1,000,000 | 501 ms |
Scales linearly: each 10× increase in n costs ~10–12× more time. At 1M SNPs (a full GWAS): 501 ms including scene build and SVG emit. Note: this is render-only; file I/O for a 1M-row TSV adds read time on top.
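A minimal sketch of the pre-bucketing step, assuming chromosome labels arrive as one string per SNP. The function name `bucket_by_chromosome` is hypothetical and the real implementation may use different key and index types.

```rust
use std::collections::HashMap;

/// Group row indices by chromosome label in a single pass over the data.
/// Illustrative only: kuva's actual types and names may differ.
fn bucket_by_chromosome<'a>(chroms: &[&'a str]) -> HashMap<&'a str, Vec<usize>> {
    let mut buckets: HashMap<&'a str, Vec<usize>> = HashMap::new();
    for (idx, &chr) in chroms.iter().enumerate() {
        buckets.entry(chr).or_default().push(idx);
    }
    buckets
}
```

A per-chromosome loop can then read `buckets[chr]` directly instead of filtering all n rows once per chromosome, which is where the 22 × n comparisons came from.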
render — heatmap (n×n matrix)
| n | cells | no values | with values | text overhead |
|---|---|---|---|---|
| 10 | 100 | 61.7 µs | 114 µs | 1.85× |
| 50 | 2,500 | 1.18 ms | 2.54 ms | 2.15× |
| 100 | 10,000 | 4.86 ms | 10.4 ms | 2.14× |
| 200 | 40,000 | 24.6 ms | — | — |
| 500 | 250,000 | 154 ms | — | — |
Scales with n² (doubling n quadruples cells). The single-loop merge means show_values costs about 2× no-values (1.85–2.15× measured): one extra format! and Text primitive per cell.
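The single-loop structure can be sketched like this. The `Primitive` enum and `heatmap_primitives` function are stand-ins invented for the example (colour mapping is omitted), so treat it as an outline of the idea and not as kuva's code.

```rust
/// Hypothetical scene primitives, for illustration only.
#[derive(Debug, PartialEq)]
enum Primitive {
    Rect { x: f64, y: f64, w: f64, h: f64 },
    Text { x: f64, y: f64, content: String },
}

/// Emit one Rect per cell and, if requested, one Text in the same iteration.
fn heatmap_primitives(matrix: &[Vec<f64>], cell: f64, show_values: bool) -> Vec<Primitive> {
    let n_cells: usize = matrix.iter().map(Vec::len).sum();
    let per_cell = if show_values { 2 } else { 1 };
    let mut out = Vec::with_capacity(n_cells * per_cell);
    for (r, row) in matrix.iter().enumerate() {
        for (c, &v) in row.iter().enumerate() {
            let (x, y) = (c as f64 * cell, r as f64 * cell);
            out.push(Primitive::Rect { x, y, w: cell, h: cell });
            if show_values {
                // The only extra work per cell: one format! and one Text.
                out.push(Primitive::Text {
                    x: x + cell / 2.0,
                    y: y + cell / 2.0,
                    content: format!("{v:.2}"),
                });
            }
        }
    }
    out
}
```

Because labels are produced in the same pass as the rectangles, enabling values doubles the primitive count without adding a second traversal of the matrix.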
svg — raw string generation
Measures SvgBackend::render_scene on a scene containing only Circle primitives. Isolates the string formatting cost with no rendering pipeline.
| n circles | time | ns/element |
|---|---|---|
| 1,000 | 198 µs | 198 |
| 10,000 | 1.99 ms | 199 |
| 100,000 | 19.9 ms | 199 |
| 1,000,000 | 213 ms | 213 |
Essentially linear: 198–199 ns/element up to 100k circles, rising to 213 ns/element at 1M. String::with_capacity pre-allocation eliminates reallocation variance. ~200 ns/element is the baseline cost of format! through the std::fmt machinery.
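As a rough illustration of the pattern being measured, the sketch below pre-sizes a `String` and formats each circle into it with `write!`. The function name `circles_to_svg` and the 48-bytes-per-element estimate are assumptions for the example, not values from kuva's SVG backend.

```rust
use std::fmt::Write;

/// Serialise (cx, cy, r) circles into one SVG string.
/// Illustrative sketch; the per-element size estimate is a guess.
fn circles_to_svg(circles: &[(f64, f64, f64)]) -> String {
    // Reserve the whole buffer up front so the loop rarely, if ever, reallocates.
    let mut svg = String::with_capacity(64 + circles.len() * 48);
    svg.push_str("<svg xmlns=\"http://www.w3.org/2000/svg\">");
    for &(cx, cy, r) in circles {
        // Writing into a String cannot fail, so the Result is ignored.
        let _ = write!(svg, "<circle cx=\"{cx:.2}\" cy=\"{cy:.2}\" r=\"{r:.2}\"/>");
    }
    svg.push_str("</svg>");
    svg
}
```

`write!` appends directly to the existing buffer, whereas `format!` followed by `push_str` would allocate a temporary `String` per element.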
Interpretation
SVG string generation dominates at scale. For scatter/line at 1M points, 77% of total time is in SvgBackend::render_scene. Improving the SVG backend (e.g. write-to-file streaming, avoiding format! for simple floats) would have the most impact at extreme scales.
Violin is all KDE. The SVG stage for a violin scene is ~25 µs at any n — it's a handful of path curves. For n > 10k/group, consider whether that resolution is actually needed; downsampling to 10k rarely changes the visible shape.
Manhattan is now O(n). Pre-bucketing reduced chromosome filtering from O(22n) to O(n). A 1M-SNP GWAS renders in ~500 ms end-to-end (render only, excluding file I/O).
Error bars are expensive at scale. 3 extra primitives per point means 4–6× the cost of plain scatter. For large n with error bars, consider downsampling or rendering only bars for significant points.
Heatmap with values is ~2×. The single-pass loop keeps the overhead proportional to the cell count, with no wasted traversal.
PNG rendering
These benchmarks measure end-to-end wall time for producing PNG output — scene build, rasterisation, and encode. They cover three libraries on the same data at the same canvas size (1600×1200 @2× = 3200×2400 px effective).
Numbers below are means over 3 runs on AMD64 Linux, release build. Charton uses resvg for PNG; plotters draws directly into a pixel buffer; kuva uses its own direct rasteriser (added in v0.2.0).
Line plot
| library | 85k pts | 170k pts | 850k pts |
|---|---|---|---|
| kuva v0.2.0 baseline (via resvg) | 0.94 s | 2.29 s | 14.0 s |
| kuva v0.2.0 new-raster | 0.72 s | 1.39 s | 6.77 s |
| kuva v0.2.0 polyline-opt | 0.58 s | 1.09 s | 5.59 s |
| kuva v0.2.0 stadium | 0.63 s | 1.21 s | 6.19 s |
| charton (resvg) | 1.32 s | 3.26 s | 23.3 s |
| plotters | 0.035 s | 0.072 s | 0.33 s |
The stadium renderer is the current default for thick polylines. Each segment is rasterised as the exact set of pixels within hw (half the stroke width) of that segment, which is a capsule/stadium shape; adjacent stadiums overlap at joins to form correct round joins with no polygon construction, no self-intersection artifacts, and no join arc overshoot. It is ~30% faster than the AET polygon approach. Plotters is faster still because it draws hairlines with no anti-aliasing; kuva renders AA thick strokes.
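The per-segment stadium test reduces to a point-to-segment distance. The sketch below is an assumption-laden illustration, not kuva's rasteriser: `dist_to_segment` and `fill_stadium` are hypothetical names, `hw` is taken to be the half stroke width, the buffer is a single-channel coverage map, and the one-pixel linear ramp at the edge is a simple stand-in for whatever anti-aliasing the real backend uses.

```rust
/// Euclidean distance from point `p` to the segment `a`-`b`.
fn dist_to_segment(p: (f64, f64), a: (f64, f64), b: (f64, f64)) -> f64 {
    let (dx, dy) = (b.0 - a.0, b.1 - a.1);
    let len2 = dx * dx + dy * dy;
    // Parameter of the closest point on the segment, clamped to its ends.
    let t = if len2 == 0.0 {
        0.0
    } else {
        (((p.0 - a.0) * dx + (p.1 - a.1) * dy) / len2).clamp(0.0, 1.0)
    };
    let (cx, cy) = (a.0 + t * dx, a.1 + t * dy);
    ((p.0 - cx).powi(2) + (p.1 - cy).powi(2)).sqrt()
}

/// Fill one stadium into a row-major coverage buffer of `width * height` values.
fn fill_stadium(
    buf: &mut [f64],
    width: usize,
    height: usize,
    a: (f64, f64),
    b: (f64, f64),
    hw: f64,
) {
    if width == 0 || height == 0 {
        return;
    }
    // Only visit pixels inside the segment's padded bounding box.
    let pad = hw + 1.0;
    let x0 = (a.0.min(b.0) - pad).floor().max(0.0) as usize;
    let y0 = (a.1.min(b.1) - pad).floor().max(0.0) as usize;
    let x1 = (a.0.max(b.0) + pad).ceil().min(width as f64 - 1.0).max(0.0) as usize;
    let y1 = (a.1.max(b.1) + pad).ceil().min(height as f64 - 1.0).max(0.0) as usize;
    for y in y0..=y1 {
        for x in x0..=x1 {
            let centre = (x as f64 + 0.5, y as f64 + 0.5);
            // 1.0 well inside the stadium, 0.0 outside, linear ramp across the edge.
            let cov = (hw + 0.5 - dist_to_segment(centre, a, b)).clamp(0.0, 1.0);
            // Taking the max keeps overlapping stadiums at joins from double-counting.
            let px = &mut buf[y * width + x];
            *px = px.max(cov);
        }
    }
}
```

Calling `fill_stadium` once per polyline segment produces round joins for free, since the end caps of neighbouring stadiums cover the same disc around the shared vertex.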
Scatter plot
| library | 500k pts | 5M pts |
|---|---|---|
| kuva v0.2.0 baseline (via resvg) | 1.12 s | 11.2 s |
| kuva v0.2.0 stadium | 0.11 s | 0.93 s |
| charton (resvg) | 1.79 s | 17.2 s |
| plotters | 0.060 s | 0.53 s |
Scatter improved ~10× at 500k points and ~12× at 5M points vs the baseline. Most of the gain is from direct SDF circle rasterisation replacing the resvg SVG circle path. The stadium renderer does not affect scatter (scatter uses CircleBatch, not polylines).
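For readers unfamiliar with signed-distance-field (SDF) rasterisation, the core of it for a circle is a one-line distance test per pixel. The sketch below is illustrative only: `circle_coverage` and `rasterise_circles` are hypothetical names, the buffer is a single-channel coverage map, and the one-pixel edge ramp is an assumed anti-aliasing scheme that may differ from kuva's CircleBatch path.

```rust
/// Coverage in [0, 1] of a pixel centred at (px, py) by a circle (cx, cy, r).
fn circle_coverage(px: f64, py: f64, cx: f64, cy: f64, r: f64) -> f64 {
    // Signed distance to the circle outline: negative inside, positive outside.
    let sdf = ((px - cx).powi(2) + (py - cy).powi(2)).sqrt() - r;
    (0.5 - sdf).clamp(0.0, 1.0)
}

/// Rasterise a batch of (cx, cy, r) circles into a row-major coverage buffer.
fn rasterise_circles(buf: &mut [f64], width: usize, height: usize, circles: &[(f64, f64, f64)]) {
    for &(cx, cy, r) in circles {
        for y in 0..height {
            for x in 0..width {
                // A real implementation would restrict this to the circle's bounding box.
                let cov = circle_coverage(x as f64 + 0.5, y as f64 + 0.5, cx, cy, r);
                let px = &mut buf[y * width + x];
                *px = px.max(cov);
            }
        }
    }
}
```

No path construction or tessellation is involved, which is consistent with the gain over routing every marker through an SVG circle path.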
Histogram
| library | 1M bins | 10M bins |
|---|---|---|
| kuva v0.2.0 baseline (via resvg) | 0.023 s | 0.183 s |
| kuva v0.2.0 stadium | 0.020 s | 0.199 s |
| charton (resvg) | 0.108 s | 1.045 s |
| plotters | 0.004 s | 0.005 s |
Histogram is dominated by bar rect primitives, which are cheap for all backends. plotters is essentially flat because it skips anti-aliasing on rects entirely; kuva renders AA-edged rects.
Running the comparison benchmarks
cd benches/rust_bench
# Run with a custom label (appends to benchmark_results.csv)
RUN_ID="my-tag" ITERATIONS=5 bash run_benchmarks.sh
Once benchmark_results.csv exists, bash scripts/gen_docs.sh (run from the repo root) will automatically rebuild plot_results, regenerate the SVGs, and copy them into docs/src/assets/bench/.
Optimisations applied
These were identified by profiling before benchmarks existed and confirmed by the benchmark numbers above:
| change | file | expected gain |
|---|---|---|
| Truncated KDE kernel (sort + binary search) | render_utils.rs | ~8× at 100k uniform data |
| Manhattan pre-bucketing (HashMap) | render.rs | ~22× at 1M SNPs with 22 chr |
| Heatmap single-loop + no flat Vec | render.rs | ~2× for show_values; -1 alloc |
| String::with_capacity in SVG backend | svg.rs | eliminates O(log n) reallocs |
| Direct raster backend (Wu lines, SDF circles, AET fill) | backend/raster.rs | 12× scatter, 2.5× line vs resvg |
| Primitive::PolyLine: bypass SVG round-trip | render/render.rs, backend/raster.rs | ~17% improvement vs per-segment SVG path |
| Per-segment stadium fill for thick polylines | backend/raster.rs | ~30% faster than AET polygon; eliminates join artifacts |