Start with the ARK-curated catalogue — frontier LLMs, vision, code, embeddings, image and speech, ready to call through an OpenAI v1-compatible API. Need something else from Hugging Face? Search the library of 500,000+ open-source models and request deployment in one click.
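Because the catalogue speaks the OpenAI v1 contract, any standard client works once it is pointed at the right base URL. A minimal sketch using only the standard library — the base URL, API key, and model ID below are placeholders, not real ARK values:

```python
import json
from urllib import request

# Hypothetical values -- substitute your real ARK endpoint, key, and model ID.
ARK_BASE_URL = "https://cloud.example-ark.eu/v1"
ARK_API_KEY = "sk-..."

def build_chat_request(model: str, messages: list, **params) -> request.Request:
    """Assemble an OpenAI v1-compatible /chat/completions request."""
    body = {"model": model, "messages": messages, **params}
    return request.Request(
        f"{ARK_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {ARK_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "example-8b-instruct",  # any curated model ID slots in here
    [{"role": "user", "content": "Summarise this ticket in one line."}],
    temperature=0.2,
)
# request.urlopen(req) would return the usual chat-completions JSON response.
```

The same request shape covers every text model in the catalogue; switching models is a one-string change.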
Same weights as upstream. Production-tuned for ARK's runtime — with stateful inference, session-level KV isolation, and EU data residency on Cloud.
Compact, instruction-tuned LLM. Strong default for chat, summarisation and tool-calling at low cost-per-token.
Mid-sized reasoning model, strong on multilingual benchmarks. A good upgrade from 8B-class when quality matters more than throughput.
Mixture-of-experts code model. Best-fit for IDE-integrated assistants and iterative debug loops — pairs naturally with stateful sessions.
Polish-language LLM trained on the SpeakLeash corpus. The default for CEE deployments where Polish quality matters more than English throughput.
Vision-language model for OCR, document understanding, chart reasoning and visual grounding. Returns structured output through the standard chat API.
OpenAI's open-weights release. Drop-in 20B reasoning model for teams who want OpenAI-style behaviour without an OpenAI account or its data terms.
Multilingual embedding model with dense, sparse, and multi-vector retrieval modes. The default for RAG, semantic search and hybrid retrieval.
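A typical RAG step with such a model: embed the query and documents through the standard embeddings endpoint, then rank by cosine similarity. A dense-mode sketch with hypothetical model and endpoint names (sparse and multi-vector modes return additional fields) and toy 2-D vectors standing in for real embeddings:

```python
import json
import math

def build_embeddings_request(model: str, inputs: list) -> str:
    """OpenAI v1-compatible /embeddings payload: one vector returns per input."""
    return json.dumps({"model": model, "input": inputs})

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, doc_vecs):
    """Document indices sorted by dense cosine similarity to the query."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)

# POST this to {base_url}/embeddings; each response item carries an "embedding".
payload = build_embeddings_request(
    "example-embed-m3",  # placeholder model ID
    ["invoice due date", "reset my password"],
)

# Toy 2-D vectors in place of real embeddings, purely for illustration.
order = rank([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]])
# order == [1, 0]: the second document sits closer to the query.
```

Hybrid retrieval adds a sparse (lexical) score to the dense one; the ranking step stays the same.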
Text-to-image generation at production quality. Configurable steps, samplers and resolution. Returns images through the standard images endpoint.
Multilingual speech-to-text with strong Polish, German, French and English performance. Best-fit for live meeting and call transcription pipelines.
No curated models match this filter. Try a different modality — or search Hugging Face below.
Pricing shown is ARK Cloud, per million input/output tokens unless stated otherwise. ARK Tailored and ARK Core run any of these models on your hardware under a per-GPU license. See full pricing →
ARK uses the standard Hugging Face model format — any open-weights model is a deployment request away. Search the live HF index, request the one you need, and our team validates it on the ARK runtime.
No models found. Try a broader query — or browse all on Hugging Face →
The Hugging Face index is unreachable from this browser. You can browse it directly and request deployment with the model ID.
Don't want to search?
Click Try on Cloud on a curated model, or Request deployment on any Hugging Face model. We capture the model ID, your tier, and the GPU profile you're targeting.
Compatibility check, quantisation strategy, sharding plan, benchmark on representative GPUs. You get a one-page report — expected throughput, VRAM footprint, context headroom.
ARK Cloud: we promote it into the curated catalogue. ARK Tailored / Core: we ship the model entry into your Model Storage; your operator promotes it on the next rollout. No restart needed.
Same weights, same OpenAI-compatible contract. The decision is just where the GPUs sit.
Tell us where it should run and what the workload looks like. We'll come back with a one-page validation report and a firm date.