Introduction

The ARK API is OpenAI-compatible. Point the OpenAI client libraries at our base URL and you are ready to go, with a few documented limitations and a handful of ARK-only extensions for stateful sessions and streaming ergonomics.

You can call the /chat/completions, /embeddings and /audio/transcriptions endpoints exactly as you would call OpenAI's. You can also use the OpenAI client libraries by setting their base_url parameter to the ARK API URL.

The ARK API differs from OpenAI's offering in a number of limitations and extensions, which are listed below.

Limitations

Model names aliasing

ARK runs inference on open-weight models such as Meta's Llama rather than proprietary models such as GPT-4o. Because the openai library (and possibly others like it) validates model names against a predefined enum, the ARK API configuration lets you assign aliases to model names.

Examples:

Unsupported or not fully supported OpenAI parameters

/chat/completions

/embeddings

/audio/transcriptions

Extensions

Custom parameters

/chat/completions

/embeddings

This endpoint currently has no ARK extensions.

Stateful processing

During inference, a rich internal state is built in GPU memory, representing the current prompt, the message history, and the reasoning done by the model. OpenAI optimises by processing each request on randomly selected GPUs, but in the process most of that state is lost because only the final assistant reply is kept.

ARK allows users to have a session during which all requests are processed on the same set of GPUs and the full internal state is maintained between requests. Depending on the application, this strategy can enhance both response quality and performance.

Note: this mechanism can be enabled or disabled globally on your setup. Ask our Deployment Team whether the feature is available to you.

To use sessions, enable cookie support in your client. The API will respond with a header such as:

set-cookie: ark_session_id=${SESSION_UUID}; Max-Age=86400; Path=/; SameSite=lax

Then sending:

cookie: ark_session_id=${SESSION_UUID}

with subsequent requests reuses the session. Note that inactive sessions are destroyed after a configured timeout, so that they do not block GPUs indefinitely.
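The cookie round trip above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the reference client: the helper names are hypothetical, the request body contains only the basic model and messages fields, and the Bearer authorization header is assumed because that is what the OpenAI client libraries send.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

SESSION_COOKIE = "ark_session_id"  # cookie name taken from the headers above


def make_session_opener():
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_chat_request(base_url, api_key, model, messages):
    """Build a POST request for the /chat/completions endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: OpenAI-style auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(opener, request, timeout=60):
    """Send the request through the cookie-aware opener and decode the JSON reply."""
    with opener.open(request, timeout=timeout) as response:
        return json.load(response)


def current_session_id(jar):
    """Return the session id currently stored in the jar, or None."""
    for cookie in jar:
        if cookie.name == SESSION_COOKIE:
            return cookie.value
    return None
```

Reusing the same opener for every request in a conversation keeps the session cookie attached. Comparing current_session_id(jar) before and after a call is one way to notice that the server issued a new session, for example after an inactivity timeout.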

Prerequisites

  1. Obtain the API URL and API key from the Deployment Team.
  2. Install Python 3 (pre-installed on most Linux distributions).
  3. Create a working directory, a virtual environment, and install the dependencies used across the examples:
mkdir ark
cd ark
python3 -m venv .venv
source .venv/bin/activate

pip install openai     # all examples
pip install numpy      # some examples
pip install requests   # some examples