First-party measurement — runtime and memory, not accuracy
We ran ESMFold2 on an NVIDIA H100 80GB via Modal serverless and measured what actually matters when you deploy it: cold start, warm latency, peak VRAM, and where it runs out of memory.
| Length (aa) | Bucket | Cold start | First (JIT) | Warm | Peak VRAM | Result |
|---|---|---|---|---|---|---|
| 200 | 256 | 53.7s | 25.9s | 0.26s | 52.1 GB | OK |
| 450 | 512 | — | 29.6s | 1.26s | 52.1 GB | OK |
| 700 | 768 | — | 31.6s | 2.66s | 52.1 GB | OK |
| 800 | 1024 | — | — | — | — | OOM |
Measured on June 8, 2026 · ESMFold2 (Biohub ESM) · NVIDIA H100 80GB · Modal serverless. All four sequence lengths were run in a single warm H100 container, which is why cold start is reported only for the first row.
These numbers come from running ESMFold2 on Modal serverless H100 GPUs with our own benchmark harness (bench/modal_benchmark.py). We are not the model authors; this measures runtime and memory, not prediction accuracy.
Cold start is the one-time cost of container initialization, model loading, and weight conversion, paid once per container. The first prediction at a new padded length also triggers a JIT compile (the "First (JIT)" column); subsequent predictions at the same padded length are pure inference (the "Warm" column).
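The split between the first call and warm calls can be measured with a small timing helper. The sketch below is illustrative and is not the code in bench/modal_benchmark.py: the name `time_first_and_warm` is hypothetical, and `predict` stands in for whatever callable runs a prediction. It assumes `predict` blocks until the result is actually ready, since a call that returns before the GPU work finishes would under-report latency.

```python
import statistics
import time
from typing import Callable, Tuple


def time_first_and_warm(
    predict: Callable[[str], object],
    sequence: str,
    warm_repeats: int = 5,
) -> Tuple[float, float]:
    """Return (first_call_seconds, median_warm_seconds) for one sequence.

    `predict` is a placeholder for the real inference call and is assumed
    to block until the prediction has finished.
    """
    if warm_repeats < 1:
        raise ValueError("warm_repeats must be at least 1")

    # First call at this length: includes any compile step for the new shape.
    start = time.perf_counter()
    predict(sequence)
    first_s = time.perf_counter() - start

    # Later calls at the same length: inference only.
    warm_times = []
    for _ in range(warm_repeats):
        start = time.perf_counter()
        predict(sequence)
        warm_times.append(time.perf_counter() - start)

    return first_s, statistics.median(warm_times)
```

Taking the median of several warm calls, rather than a single one, keeps one slow outlier from distorting the warm figure.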
Because the model recompiles for each new input shape, inputs are padded to fixed-length buckets. A 700-residue sequence runs at the 768 bucket; an 800-residue sequence runs at the 1024 bucket, which ran out of memory even on an 80 GB H100.
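The bucket rule can be sketched in a few lines. This is a minimal illustration, not the harness's actual code: the names `pad_to_bucket` and `BUCKETS` are hypothetical, and the bucket sizes are only the four that appear in the table above.

```python
import bisect

# Bucket sizes taken from the results table; the real harness may define others.
BUCKETS = (256, 512, 768, 1024)


def pad_to_bucket(length: int, buckets=BUCKETS) -> int:
    """Return the smallest bucket that can hold a sequence of `length` residues."""
    if length < 1:
        raise ValueError("sequence length must be positive")
    # Index of the first bucket that is >= length (buckets must be sorted).
    i = bisect.bisect_left(buckets, length)
    if i == len(buckets):
        raise ValueError(f"length {length} exceeds the largest bucket {buckets[-1]}")
    return buckets[i]
```

With these buckets, lengths 200, 450, 700, and 800 map to 256, 512, 768, and 1024, matching the Bucket column in the table.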
This dataset is intentionally small and is being expanded with more sequence lengths and GPU tiers. We only publish points we have actually measured.
In our run, a 700-residue prediction peaked near 52 GB of VRAM on an H100, so at that length ESMFold2 needs a high-memory datacenter GPU (A100/H100 class) rather than a consumer card.
A 700-residue sequence (768 bucket) completed successfully, while an 800-residue sequence (1024 bucket) ran out of memory on an 80 GB H100. The practical single-GPU ceiling in our setup therefore sits at the 768-length bucket.
After warm-up, latency grows faster than linearly with length: about 0.3 s at 200 residues, 1.3 s at 450, and 2.7 s at 700. The first call in a new container pays a cold start of about 54 s, and the first call at each new padded length pays a JIT compile of about 26 to 32 s.
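To see how these one-time costs amortize over a batch, the measured figures can be combined in a short calculation. This is a rough sketch under stated assumptions: the function names are hypothetical, the "First (JIT)" time is treated as the full wall time of the first prediction at a bucket, the cold start is treated as additive, and every later prediction is assumed to cost the measured warm time.

```python
# Figures copied from the table above (seconds).
COLD_START_S = 53.7
MEASURED = {
    # bucket: (first call incl. JIT compile, warm call)
    256: (25.9, 0.26),
    512: (29.6, 1.26),
    768: (31.6, 2.66),
}


def total_seconds(bucket: int, n_predictions: int, include_cold_start: bool = True) -> float:
    """Estimate wall time for n predictions at one bucket in one container."""
    if n_predictions < 1:
        raise ValueError("n_predictions must be at least 1")
    first_s, warm_s = MEASURED[bucket]
    # One first call (compile included), then n - 1 warm calls.
    total = first_s + (n_predictions - 1) * warm_s
    if include_cold_start:
        total += COLD_START_S
    return total


def mean_seconds_per_prediction(bucket: int, n_predictions: int) -> float:
    """Average cost per prediction, with cold start and compile amortized."""
    return total_seconds(bucket, n_predictions) / n_predictions
```

Under these assumptions, a single prediction at the 256 bucket is dominated by the roughly 80 s of setup, while the per-prediction average approaches the warm latency as the batch grows.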
This page does not measure accuracy. It covers only runtime, latency, and memory, and does not compare prediction accuracy against ESMFold, AlphaFold3, or any other model.