First-party measurement — runtime and memory, not accuracy

ESMFold2 Benchmark: H100 Runtime, VRAM, and Max Length

We ran ESMFold2 on an NVIDIA H100 80GB via Modal serverless and measured what matters when you deploy it: cold start, first-call JIT compile time, warm latency, peak VRAM, and the length at which it runs out of memory.

Length (aa)	Bucket	Cold start	First (JIT)	Warm	Peak VRAM	Result
200	256	53.7s	25.9s	0.26s	52.1 GB	OK
450	512	—	29.6s	1.26s	52.1 GB	OK
700	768	—	31.6s	2.66s	52.1 GB	OK
800	1024	—	—	—	—	OOM

Measured on June 8, 2026 · ESMFold2 (Biohub ESM) · NVIDIA H100 80GB · Modal serverless. All four sequence lengths ran in the same H100 container, so the cold start appears only on the first row.

What we measured

Once the container is warm, inference is fast and scales with length: ~0.3s at 200 residues, ~1.3s at 450, and ~2.7s at 700 — all under three seconds.
Peak VRAM read 52.1GB at every length from 200 to 700 residues. This suggests that fixed model allocations, not sequence length, dominate the reported peak within this range. Length still matters beyond it, because the 1024 bucket ran out of memory.
Cold start (container init, model load, and weight conversion) was ~54s and is paid once per container. The first prediction in each new length bucket adds a ~26–32s JIT compile.
Our 800-residue sequence pads to the 1024 bucket and ran out of memory even on an 80GB H100. The practical single-GPU ceiling is therefore the 768 bucket, where the longest sequence we measured was 700 residues.
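These numbers can be combined into a rough wall-clock estimate for a batch of sequences on one fresh container. The sketch below rests on several assumptions. It applies the times measured at 200, 450, and 700 residues to their whole buckets, on the reasoning that padding makes each bucket a single shape. It treats the "First (JIT)" time as the complete first prediction in a bucket, including that call's inference. The bucket list is inferred from the table, and `estimate_total_seconds` is a hypothetical helper.

```python
# Measured values from the table above; applying them per bucket is an assumption.
COLD_START_S = 53.7                               # container init + model load + weight conversion
FIRST_CALL_S = {256: 25.9, 512: 29.6, 768: 31.6}  # first prediction in a bucket (includes JIT compile)
WARM_S = {256: 0.26, 512: 1.26, 768: 2.66}        # later predictions in the same bucket
BUCKETS = (256, 512, 768, 1024)                   # assumed bucket list, inferred from the table


def estimate_total_seconds(lengths: list[int]) -> float:
    """Rough wall-clock estimate for running `lengths` in order on one fresh container."""
    total = COLD_START_S  # paid once, before the first prediction
    compiled: set[int] = set()
    for n in lengths:
        bucket = next((b for b in BUCKETS if n <= b), None)
        if bucket is None or bucket not in WARM_S:
            # 1024 ran out of memory in our run; longer buckets were not measured.
            raise ValueError(f"{n} residues falls outside the buckets that fit in our run")
        if bucket in compiled:
            total += WARM_S[bucket]
        else:
            total += FIRST_CALL_S[bucket]  # assumed to already include this call's inference
            compiled.add(bucket)
    return total
```

For ten 700-residue sequences, this estimate comes to roughly 109s. About 85s of that is the one-time cold start plus the 768-bucket compile, which is why keeping a container warm matters more than the per-sequence warm latency.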

How we measured this

These numbers come from running ESMFold2 on Modal serverless H100 GPUs with our own benchmark harness (bench/modal_benchmark.py). We are not the model authors; this measures runtime and memory, not prediction accuracy.

Cold start is the one-time cost of container init, model load, and weight conversion, paid each time a fresh container starts. The first prediction at a new padded length adds a JIT compile. Subsequent predictions at the same padded length are pure inference (the "Warm" column).
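The first-versus-warm split can be measured with a loop like the one below. This is a minimal sketch, not our bench/modal_benchmark.py harness. `predict` is a hypothetical callable that runs one prediction, and it must return only after the result is ready. If the framework dispatches GPU work asynchronously, `predict` has to block on its output first. Cold start is outside the sketch's scope because it happens before the model can be called.

```python
import statistics
import time
from typing import Callable


def time_first_and_warm(
    predict: Callable[[str], object], sequence: str, warm_runs: int = 5
) -> tuple[float, float]:
    """Return (first_call_seconds, median_warm_seconds) for one sequence.

    `predict` is a hypothetical stand-in for a folding call; if it returns
    before the prediction has finished, both timings will be too low.
    """
    if warm_runs < 1:
        raise ValueError("need at least one warm run to report a warm time")

    start = time.perf_counter()
    predict(sequence)  # first call at this padded shape: includes JIT compile
    first_s = time.perf_counter() - start

    warm_samples = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        predict(sequence)  # same padded shape again: no recompilation expected
        warm_samples.append(time.perf_counter() - start)
    return first_s, statistics.median(warm_samples)
```

Running it once per length bucket yields one value each for the First and Warm columns. Taking the median of several warm runs keeps a single slow call from skewing the warm figure.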

Because the model recompiles per shape, inputs are padded to fixed length buckets. A 700-residue sequence runs at the 768 bucket; an 800-residue sequence runs at the 1024 bucket, which ran out of memory even on an 80GB H100.
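The bucketing rule can be sketched as follows. The bucket list (256, 512, 768, 1024) is inferred from the table and may not match the real configuration. `pick_bucket` and `compiles_needed` are hypothetical helpers. The sketch also compares buckets directly against residue count, so if the model adds special tokens to the input, the real cutoffs would sit slightly lower.

```python
from typing import Iterable

# Assumed from the table above; the real bucket list may differ.
BUCKETS = (256, 512, 768, 1024)


def pick_bucket(length: int, buckets: tuple[int, ...] = BUCKETS) -> int:
    """Return the smallest bucket that can hold `length` residues."""
    for bucket in sorted(buckets):
        if length <= bucket:
            return bucket
    raise ValueError(f"{length} residues exceeds the largest bucket ({max(buckets)})")


def compiles_needed(lengths: Iterable[int]) -> int:
    """Count JIT compiles: one per distinct padded shape, not per distinct length."""
    return len({pick_bucket(n) for n in lengths})
```

For lengths [200, 230, 450, 700], `compiles_needed` returns 3 rather than 4, because 200 and 230 share the 256 bucket. This saving is the reason to pad in the first place.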

This dataset is intentionally small and is being expanded with more sequence lengths and GPU tiers. We only publish points we have actually measured.

FAQ

How much VRAM does ESMFold2 need?

In our run, every length from 200 to 700 residues peaked at about 52GB of VRAM on an H100. That is more than a 40GB A100 or a consumer card offers. In our configuration, ESMFold2 therefore needs an 80GB-class datacenter GPU such as an A100 80GB or H100 80GB.
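One way to check whether the GPUs on a machine clear that peak is to parse `nvidia-smi` output. This sketch assumes the page's "GB" figures are GiB and that `memory.total` is reported in MiB, with `nounits` stripping the unit suffix. `parse_gpu_memory`, `query_gpu_memory`, and `gpus_that_fit` are hypothetical helpers. Clearing 52.1 GiB on paper does not guarantee a run will succeed, because we only measured an H100 80GB.

```python
import subprocess

# Peak we measured on an H100; assumed to be GiB since the table just says "GB".
MEASURED_PEAK_GIB = 52.1


def parse_gpu_memory(csv_text: str) -> list[tuple[str, float]]:
    """Parse `name, memory.total` lines (MiB, no header, no units) into (name, GiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, total_mib = line.rsplit(",", 1)  # split on the last comma only
        gpus.append((name.strip(), float(total_mib) / 1024))
    return gpus


def query_gpu_memory() -> list[tuple[str, float]]:
    """Ask nvidia-smi for each GPU's name and total memory (requires NVIDIA drivers)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_memory(out)


def gpus_that_fit(gpus: list[tuple[str, float]], peak_gib: float = MEASURED_PEAK_GIB) -> list[str]:
    """Names of GPUs whose total memory is at least the measured peak."""
    return [name for name, total_gib in gpus if total_gib >= peak_gib]
```

On a machine with NVIDIA drivers installed, `gpus_that_fit(query_gpu_memory())` lists the candidate devices.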

What is the maximum sequence length for ESMFold2 on one GPU?

A 700-residue sequence (768 bucket) completed normally, while an 800-residue sequence (1024 bucket) ran out of memory on an 80GB H100. In our setup, the practical single-GPU ceiling is therefore the 768 bucket.

How fast is ESMFold2 inference?

After warm-up, inference time scales with length: about 0.3s at 200 residues, 1.3s at 450, and 2.7s at 700. The first call pays a ~54s cold start, and the first call in each new length bucket adds a ~26–32s JIT compile.

Is this an accuracy benchmark?

No. This page only measures runtime, latency, and memory. It does not compare prediction accuracy against ESMFold, AlphaFold3, or any other model.
