Every claim on this site is backed by a test. ARK's November 2025 benchmark suite, density studies, fault-tolerance measurements, and the architecture paper all live here, and each result is independently verifiable on customer hardware during a proof of concept (POC).
| Capability | ARK | vLLM | TensorRT-LLM | Ollama |
|---|---|---|---|---|
| Heterogeneous GPU Support | ✓ Any mix | ✗ Homogeneous | ✗ Homogeneous | ✗ Single GPU |
| Elastic Hot-Scaling | ✓ Runtime | ✗ Restart | ✗ Restart | ✗ Not supported |
| Fault Tolerance | ✓ 99% survival | ✗ Group crash | ✗ Group crash | ✗ No HA |
| Multi-Model Tenancy | ✓ Shard-level | ✗ Per-model | ✗ Static engines | ✗ One per GPU |
| Network Requirement | ✓ ~5 Mbit/s | ✗ NVLink/IB | ✗ NVLink/IB | N/A |
| Context Length | ✓ +30–400% | Baseline | Baseline | Baseline |
| Session Isolation | ✓ KV + attention | ✗ Shared batching | ✗ Shared batching | Per-process |
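To make the heterogeneous-GPU row above concrete, here is a minimal sketch of the idea behind splitting one model across a mixed GPU fleet: shards sized in proportion to each card's free VRAM rather than requiring identical devices. Everything here is illustrative; the function, GPU names, and sizes are assumptions for demonstration, not ARK's actual API.

```python
# Illustrative sketch only (not ARK's API): proportional shard placement
# across a heterogeneous GPU mix, sized by each card's free VRAM.

def place_shards(model_gb, gpus):
    """Split a model of `model_gb` GB across GPUs with unequal free VRAM.

    gpus: dict mapping GPU name -> free VRAM in GB (illustrative values).
    Returns a dict mapping GPU name -> shard size in GB, proportional
    to that GPU's share of total free VRAM.
    """
    total = sum(gpus.values())
    if model_gb > total:
        raise ValueError("model does not fit on this GPU mix")
    return {name: round(model_gb * vram / total, 2)
            for name, vram in gpus.items()}

# Hypothetical example: a ~40 GB (quantized) model on a mixed fleet.
mix = {"rtx4090": 24, "rtx3090": 24, "a4000": 16}
shards = place_shards(40.0, mix)
print(shards)  # shard sizes follow the 24:24:16 VRAM ratio
```

A homogeneous-only runtime would reject the `a4000` outright or pad every shard to the smallest card; proportional placement is what lets a mixed fleet contribute all of its memory.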