The ARK API is OpenAI-compatible. You can use the /chat/completions, /embeddings and /audio/transcriptions endpoints in the same way you would use OpenAI's endpoints. You can even use OpenAI's client libraries by customising the base_url parameter. The limitations and extensions that distinguish the ARK API from OpenAI's offering are listed below.
Limitations
Model name aliasing
ARK executes inference using open-weight models such as Meta's Llama rather than proprietary models such as OpenAI's GPT-4o. Because the openai library (and possibly others like it) validates model names against a predefined enum, the ARK API configuration supports assigning aliases to model names.
Examples:
- gpt-3.5-turbo → meta-llama/Llama-3.1-8B-Instruct
- gpt-4o → meta-llama/Llama-3.1-70B-Instruct
- text-embedding-ada-002 → BAAI/bge-m3
- whisper-1 → whisper-1
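Conceptually, aliasing is a simple mapping applied before inference. The sketch below mirrors the examples above; the real mapping lives in the ARK API deployment configuration.

```python
# Illustrative alias table; the real mapping is deployment configuration.
MODEL_ALIASES = {
    "gpt-3.5-turbo": "meta-llama/Llama-3.1-8B-Instruct",
    "gpt-4o": "meta-llama/Llama-3.1-70B-Instruct",
    "text-embedding-ada-002": "BAAI/bge-m3",
    "whisper-1": "whisper-1",
}

def resolve_model(name: str) -> str:
    """Map an OpenAI model name to the open-weight model actually served."""
    return MODEL_ALIASES.get(name, name)
```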
Unsupported or not fully supported OpenAI parameters
/chat/completions
- frequency_penalty — penalising repeated tokens is not supported.
- function_call — explicit function calls are unavailable.
- logit_bias — biasing token probabilities is unimplemented.
- logprobs — token log probabilities are unavailable.
- presence_penalty — adjusting the likelihood of introducing new tokens is unavailable.
- response_format — only text output is supported; JSON and other formats are unavailable.
- seed — random seed control for reproducibility is unsupported.
- stop — relies on eos_token_id rather than arbitrary string-based stop sequences.
- temperature — setting temperature=0 yields only approximate, not true, determinism; it is internally set to 0.0001 to prevent numerical issues.
- tools & tool_choice — function calling and tool integration are unimplemented at this level (see Tool Calling for the supported pattern).
- top_p — nucleus sampling is unimplemented.
- user — per-user request tracking is unsupported.
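When porting existing OpenAI code, it can help to drop the parameters listed above before sending a request. The helper below is a hypothetical sketch, not part of the ARK API; note that temperature is kept, since it is partially supported.

```python
# Parameters listed above that ARK's /chat/completions does not (fully) support.
UNSUPPORTED_CHAT_PARAMS = {
    "frequency_penalty", "function_call", "logit_bias", "logprobs",
    "presence_penalty", "response_format", "seed", "stop",
    "tools", "tool_choice", "top_p", "user",
}

def sanitize_chat_payload(payload: dict) -> dict:
    """Drop request parameters ARK ignores (illustrative helper, not part of the API)."""
    return {k: v for k, v in payload.items() if k not in UNSUPPORTED_CHAT_PARAMS}
```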
/embeddings
- dimensions — must stay within the model's predefined limits; arbitrary dimension settings are unsupported.
- encoding_format — only float encoding is supported; base64 encoding is unavailable.
- user — user parameter for request tracking is unsupported.
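An embeddings request is therefore a standard OpenAI-shaped request, as long as it sticks to float encoding and the model's native dimensions. A sketch using only the standard library (URL and key are placeholders):

```python
import json
import urllib.request

# Placeholder values; obtain the real URL and key from the Deployment Team.
payload = {
    "model": "text-embedding-ada-002",  # alias, e.g. for BAAI/bge-m3
    "input": ["first text", "second text"],
    "encoding_format": "float",  # the only supported encoding
}
request = urllib.request.Request(
    "https://ark.example.com/v1/embeddings",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_ARK_API_KEY",
        "Content-Type": "application/json",
    },
)
# Uncomment against a real deployment:
# with urllib.request.urlopen(request) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```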
/audio/transcriptions
- prompt — custom prompting is currently unsupported.
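A transcription request is a standard multipart upload; the sketch below uses the requests library and prepares the request without sending it. The endpoint, key, and file bytes are placeholders.

```python
import requests

# Placeholder values; obtain the real URL and key from the Deployment Team.
req = requests.Request(
    "POST",
    "https://ark.example.com/v1/audio/transcriptions",
    headers={"Authorization": "Bearer YOUR_ARK_API_KEY"},
    data={"model": "whisper-1"},  # note: no "prompt" field (unsupported)
    files={"file": ("sample.wav", b"\x00" * 16, "audio/wav")},  # dummy bytes
).prepare()
# requests.Session().send(req)  # uncomment with a real deployment and a real file
```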
Extensions
Custom parameters
/chat/completions
- ark_simplified — when streaming, set this to true to disable wrapping every single token in a full JSON object. SSE event payloads will then contain only the token itself. The token-usage JSON and [DONE] marker still arrive at the conclusion of inference.
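Consuming a stream with ark_simplified=true then amounts to concatenating the raw data: payloads. This is a client-side sketch that assumes standard "data: <payload>" SSE framing; the exact shape of the trailing usage JSON is not shown.

```python
def join_simplified_stream(lines):
    """Concatenate token payloads from an ark_simplified SSE stream (sketch).

    Assumes standard "data: <payload>" framing; skips the trailing
    token-usage JSON object and stops at the [DONE] sentinel.
    """
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        if payload.startswith("{"):  # trailing token-usage JSON
            continue
        tokens.append(payload)
    return "".join(tokens)
```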
/embeddings
This endpoint currently has no ARK extensions.
Stateful processing
During inference, a rich internal state is built up in GPU memory, representing the current prompt, the message history, and the reasoning done by the model. OpenAI optimises throughput by processing each request on randomly selected GPUs, but in the process most of that state is lost, because only the final assistant reply is kept.
ARK allows users to have a session during which all requests are processed on the same set of GPUs and the full internal state is maintained between requests. Depending on the application, this strategy can enhance both response quality and performance.
To use sessions, enable cookie support in your client. The API will respond with:
set-cookie: ark_session_id=${SESSION_UUID}; Max-Age=86400; Path=/; SameSite=lax
Then sending:
cookie: ark_session_id=${SESSION_UUID}
with subsequent requests reuses the session. Note that inactive sessions are destroyed after a configured timeout, so that GPUs are not blocked indefinitely.
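With the requests library, a requests.Session handles this automatically: it stores the ark_session_id cookie from Set-Cookie and resends it on every subsequent call. The URL and key below are placeholders.

```python
import requests

# Placeholder values; obtain the real URL and key from the Deployment Team.
session = requests.Session()  # persists cookies across requests automatically
session.headers["Authorization"] = "Bearer YOUR_ARK_API_KEY"

# Both calls below would carry the same ark_session_id cookie, so they are
# routed to the same GPUs and the internal state is retained between them.
# Uncomment against a real deployment:
# session.post("https://ark.example.com/v1/chat/completions", json={...})
# session.post("https://ark.example.com/v1/chat/completions", json={...})
```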
Prerequisites
- Obtain the API URL and API key from the Deployment Team.
- Install Python 3 (pre-installed on most Linux distributions).
- Create a working directory, a virtual environment, and install the dependencies used across the examples:
mkdir ark
cd ark
python -m venv .venv
source .venv/bin/activate
pip install openai # all examples
pip install numpy # some examples
pip install requests # some examples