How we benchmarked: methodology + caveats

rjest Team · May 26, 2026

benchmarks · methodology

The README on the rjest repo says: cold start 1.9s for rjest vs 1.4s for Jest; warm run ~14ms vs 1.4s, about 100x. Those are real numbers from a real run. They are also a single data point. This post is about what surrounds that data point: how we measured, what suite shape produces it, which suites produce smaller multiples, and what would have been misleading to report.

If you came here to verify a benchmark before adopting a runner, that is the right instinct. We hope this post makes the verification easier.

What we measured

The benchmark suite for the headline number is a JS/TS package with about 40 test files, predominantly TypeScript, using SWC for transforms (in the Jest config, via @swc/jest). The tests are unit-shaped — fast assertions over pure functions and small classes, no database, no network. Total test count is in the low hundreds.

We measured two things, both wall-clock:

Cold start: a fresh shell, no rjest daemon running, no Jest worker cache. We ran each command under the shell’s time (time npx jest, time npx rjest) and measured wall-clock time from the moment the command was issued until the runner exited successfully.
Warm run: the same command issued a second time, immediately after the first. For Jest, this benefits from any disk caches left behind by the OS or by Jest’s own transform cache. For rjest, it benefits from the running daemon, the populated transform cache, and the warm worker pool.

Both numbers are reported as the median of five trials.
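The protocol above (reset to a cold state, time one run as the cold sample, time an immediate second run as the warm sample, repeat, report medians) can be sketched as a small TypeScript harness. This is a sketch under stated assumptions, not the script we used: resetToCold and invoke are hypothetical callbacks standing in for "stop the daemon and clear caches" and "run npx jest or npx rjest to completion", and the clock is injectable so the logic can be checked without a real suite.

```typescript
// Sketch of the cold/warm protocol. `resetToCold` and `invoke` are
// hypothetical stand-ins, not rjest or Jest APIs.

type Clock = () => number;

// Median of a non-empty sample; for an even count, the mean of the two middle values.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median of an empty sample");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Wall-clock duration of one invocation, in whatever unit `now` returns.
function timeOnce(invoke: () => void, now: Clock): number {
  const start = now();
  invoke();
  return now() - start;
}

interface ColdWarm {
  coldMs: number;
  warmMs: number;
}

function measureColdWarm(
  resetToCold: () => void, // hypothetical: stop the daemon, clear caches
  invoke: () => void,      // hypothetical: run the test command to completion
  trials = 5,
  now: Clock = Date.now,
): ColdWarm {
  const cold: number[] = [];
  const warm: number[] = [];
  for (let i = 0; i < trials; i++) {
    resetToCold();
    cold.push(timeOnce(invoke, now)); // first invocation after a reset is cold
    warm.push(timeOnce(invoke, now)); // immediate second invocation is warm
  }
  return { coldMs: median(cold), warmMs: median(warm) };
}
```

The median rather than the mean keeps a single noisy trial (a background process waking up, say) from dragging the reported number.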

Hardware: an M-series Mac with 16GB RAM. The absolute numbers shift on x86 Linux, but the qualitative shape (cold is similar, warm is dramatically different) holds across the platforms we tested.

Why 14ms

A warm rjest run skips:

Config parsing (held in the daemon)
Transformer setup (SWC is already loaded in the daemon)
Worker boot (the pool is warm)
Most transform work (sled cache hit on blake3-hashed source)
Most module loading (workers retain VM caches across runs)

What is left is the actual execution of test functions, the I/O between worker and daemon, and the rendering of output. For a small unit suite this lands in the 10-20ms range. On the benchmark suite specifically, it is ~14ms median.
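The "most transform work" item relies on a content-addressed cache: the key is a hash of the source text, so an unchanged file maps to the same entry on every run. Below is a toy TypeScript sketch under explicit assumptions: an in-memory Map stands in for the on-disk sled store, and 32-bit FNV-1a stands in for blake3 purely to keep the example dependency-free. It illustrates the idea and is not rjest’s implementation.

```typescript
// Toy content-addressed transform cache. Assumptions: Map stands in for sled,
// and 32-bit FNV-1a stands in for blake3. FNV-1a is NOT collision-resistant.

// 32-bit FNV-1a over UTF-16 code units, returned as hex.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 unsigned bits
  }
  return hash.toString(16);
}

class TransformCache {
  private readonly entries = new Map<string, string>();
  private readonly transform: (source: string) => string;
  hits = 0;
  misses = 0;

  constructor(transform: (source: string) => string) {
    this.transform = transform;
  }

  // Returns transformed output, running the transformer only when this
  // exact source text has not been seen before.
  get(source: string): string {
    const key = fnv1a32(source);
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const output = this.transform(source);
    this.entries.set(key, output);
    return output;
  }
}
```

Because the key depends only on the source text, an unchanged file is a hit however many runs ago it was transformed, and a one-character edit forces a fresh transform. A production cache would also need the key to change when the transformer’s version or configuration changes; this post does not describe how rjest handles that.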

This number is not a property of rjest. It is a property of the suite — its test count, its I/O, its assertion complexity. On a suite ten times larger, the warm number scales roughly with the work that actually has to happen.

Why 100x

The 100x figure is the ratio of warm Jest (~1.4s) to warm rjest (~14ms) on the same suite. Jest’s warm number is 1.4s, very close to its cold number, because Jest re-pays most of the invocation overhead every time. Jest does not maintain a persistent daemon across invocations: each one reloads the config, restarts workers, and redoes the transformer setup.

So the multiple is not really “rjest is 100x faster than Jest.” It is “rjest’s incremental cost is 100x smaller than Jest’s incremental cost, on this suite, in warm conditions.”
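Spelled out with the numbers above, the headline multiple is Jest’s warm time divided by rjest’s warm time:

```latex
\frac{T_{\text{Jest, warm}}}{T_{\text{rjest, warm}}}
  \approx \frac{1.4\ \text{s}}{14\ \text{ms}}
  = \frac{1400\ \text{ms}}{14\ \text{ms}}
  = 100
```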

That distinction matters for thinking about your own repo.

Where the speedup shrinks

The 100x number is the high end. The realistic range is 10-100x on warm runs. Three suite properties pull it down:

Heavy I/O in tests. If your tests open a SQLite database, hit a localhost server, or read big fixtures from disk, that work has to happen regardless of runner, and it dominates the runtime. The more of your runtime is real work rather than runner overhead, the closer the speedup gets to 1x.

Genuinely expensive assertions. Snapshots over large structures, deep-equal on big objects, or property-based tests with many shrinks are real CPU work. Same situation: rjest does not make them faster, it just does not add overhead on top.

Many tests per invocation. If you always run the full suite, Jest’s absolute time is larger, but real work makes up a bigger share of it, so the multiple shrinks somewhat. Running a single test file at a time (because you are working on one feature) has the opposite effect: overhead dominates, and the ratio favors rjest more.

We have not seen a realistic suite where the warm multiplier dropped below ~10x, but it is plausible — for example, a suite that is mostly Selenium-driven end-to-end tests against a real browser would see almost no difference.
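All three effects fall out of a simple toy model: a warm run costs a fixed per-invocation overhead plus the real work the tests do, and only the overhead differs between runners. The sketch below assumes that model; the overhead split (1390ms for Jest, 4ms for rjest, 10ms of work) is hypothetical, chosen only so that the small-work case reproduces the headline 1.4s vs 14ms. We have not measured such a breakdown.

```typescript
// Toy model, not measured data: warm run time = per-invocation overhead + real work.
// The work term (I/O, heavy assertions, browser automation) is identical for both runners.
function warmSpeedup(jestOverheadMs: number, rjestOverheadMs: number, workMs: number): number {
  return (jestOverheadMs + workMs) / (rjestOverheadMs + workMs);
}

// Hypothetical overhead split; only the work term varies.
for (const workMs of [10, 100, 1000, 10000]) {
  console.log(`work ${workMs} ms: ${warmSpeedup(1390, 4, workMs).toFixed(1)}x`);
}
```

As the work term grows, both numerator and denominator are dominated by the same quantity and the multiple falls toward 1x, which is the end-to-end browser case described above.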

Where the speedup gets larger

Two properties push the multiplier toward the high end of the range:

Slow transform pipelines. Repos using ts-jest or complex Babel chains pay more startup overhead than the SWC-based benchmark suite. On warm runs rjest skips most of that work, so multipliers above 100x are possible here.

Watch mode. Watch mode is the workflow rjest is designed for. The daemon is always warm; file changes route through the FS watcher and dependency graph; only affected tests rerun. The watch-mode multiplier in normal use is probably the most useful number for a developer to measure, and it is larger than the static “warm run” multiplier because watch mode also gets to skip discovery and re-globbing.
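The "only affected tests rerun" step amounts to a reverse-dependency walk over the import graph. A minimal TypeScript sketch, assuming the graph is a plain map from each file to the files it imports and that a caller-supplied predicate identifies test files; neither assumption reflects how rjest actually stores or discovers its graph.

```typescript
// Hypothetical sketch: find every test file that depends, directly or
// transitively, on a changed file. Not rjest internals.

// file -> files it imports
type ImportGraph = Map<string, string[]>;

// Invert the graph: file -> files that import it.
function importersOf(graph: ImportGraph): Map<string, string[]> {
  const importers = new Map<string, string[]>();
  graph.forEach((deps, file) => {
    for (const dep of deps) {
      const list = importers.get(dep) ?? [];
      list.push(file);
      importers.set(dep, list);
    }
  });
  return importers;
}

// Breadth-first search over reverse edges, starting at the changed file.
function affectedTests(
  graph: ImportGraph,
  changed: string,
  isTest: (file: string) => boolean,
): string[] {
  const importers = importersOf(graph);
  const seen = new Set<string>([changed]);
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const file = queue.shift() as string;
    for (const importer of importers.get(file) ?? []) {
      if (!seen.has(importer)) {
        seen.add(importer);
        queue.push(importer);
      }
    }
  }
  return Array.from(seen).filter(isTest).sort();
}
```

A real watcher would keep the inverted graph up to date incrementally instead of rebuilding it on every save; it is rebuilt here only to keep the sketch short.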

What we did not report (and why)

We do not report a “single-run” or “average run” speedup because the spread between cold and warm is so large that the average is a misleading summary. Reporting an average implicitly answers the wrong question.
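A quick calculation with the headline numbers shows the problem. Assume, purely for illustration, a session of one cold run and one warm run, and take the per-run mean for each runner:

```latex
\bar{T}_{\text{rjest}} = \frac{1.9\ \text{s} + 0.014\ \text{s}}{2} = 0.957\ \text{s},
\qquad
\bar{T}_{\text{Jest}} = \frac{1.4\ \text{s} + 1.4\ \text{s}}{2} = 1.4\ \text{s},
\qquad
\frac{\bar{T}_{\text{Jest}}}{\bar{T}_{\text{rjest}}} \approx 1.46
```

The resulting ~1.5x describes neither run: cold times are in the same range (1.9s vs 1.4s), and the warm run is about 100x. A different cold/warm mix would move the "average" anywhere between those extremes.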

We do not report large-monorepo numbers because we want to do that work properly before publishing it. The shape we expect — bigger absolute Jest times, similar rjest times, similar multiples — is consistent with what we have seen on internal experiments, but until we have a clean test corpus we will not put a number on it.

We do not report comparisons against Vitest in the headline benchmark because the comparison is a category error. Vitest uses a different transform layer and a different invocation model. The honest comparison is qualitative, and lives on the compare page.

How to verify in your repo

The cheapest test, in roughly 15 minutes:

Install rjest in your repo (npm install -D rjest-install).
Time npx jest three times. Time npx rjest three times.
Look at the cold (first) and warm (second/third) numbers separately.
Run the same exercise inside watch mode — note the rerun latency on a single file save.
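The timing steps can be scripted in a few lines of TypeScript for Node. This is a sketch assuming Node’s built-in node:child_process module; timeRuns and summarize are hypothetical helpers, not part of rjest. For the first rjest run to count as cold, make sure no rjest daemon is already running from an earlier session.

```typescript
import { execSync } from "node:child_process";

// Hypothetical helper: runs a command several times in a row and records
// wall-clock milliseconds per run. execSync throws if the command exits
// non-zero, so a failing suite aborts the measurement instead of being timed.
function timeRuns(
  command: string,
  runs = 3,
  exec: (cmd: string) => void = (cmd) => {
    execSync(cmd, { stdio: "ignore" });
  },
): number[] {
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    exec(command);
    times.push(Date.now() - start);
  }
  return times;
}

// Keep the first (cold) run separate from the later (warm) runs.
function summarize(label: string, times: number[]): string {
  const [cold, ...warm] = times;
  return `${label}: cold ${cold} ms, warm ${warm.join(" / ")} ms`;
}

// Example (not run here):
// console.log(summarize("jest", timeRuns("npx jest")));
// console.log(summarize("rjest", timeRuns("npx rjest")));
```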

If the warm rjest number is uninterestingly small and the warm Jest number is the size you would expect from your suite’s overhead profile, the runner is doing what we claim. If it is not, and particularly if a test fails in rjest but passes in Jest, that is a bug we want to hear about.

The honest summary

The 100x number is real, on a real suite, with a real methodology. It is also the high end of the realistic range. The number that will matter to you is the one you measure in your own repo. We have designed the runner to make that measurement easy and to make the failure mode visible.

If you measure and the speedup is smaller than you hoped, we want to hear about that too. Suite shape matters; we would rather understand the distribution than ship a number that is right on average and wrong in detail.