Open WebUI
- Homepage: openwebui.com
- Install docs: docs.openwebui.com/getting-started/quick-start
- Gateway integration docs: OpenAI connection guide
- Protocol: OpenAI-compatible (
/v1/chat/completions)
Install
The project officially provides multiple methods — Docker, Python (pip / uv), Helm, and more; the complete list is in the official install docs. The two most common are listed below.
System requirements: the Python install method requires Python 3.11; the Docker method has no extra requirements. If you need GPU local inference, see the official GPU section separately — connecting to TokenBay does not require a GPU.
Method 1: Docker (officially recommended)
docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:mainAfter it starts, visit http://localhost:3000; the first time you enter, you’ll be asked to create an admin account.
Method 2: Python (pip)
pip install open-webui
open-webui serveListens on http://localhost:8080 by default.
Version check
# Docker install: check the image and running status
docker ps --filter name=open-webui
docker exec open-webui pip show open-webui | grep Version
# pip install
pip show open-webui | grep VersionYou can also log in and check the current version number on the ⚙️ Admin Settings → About page.
Connect TokenBay
How it works
Open WebUI ships with a built-in “OpenAI connection” — all you need to do is point the connection’s URL at the TokenBay gateway and fill in your TokenBay API key. The credential is sent with each request in the form Authorization: Bearer <sk-...>, and Open WebUI automatically appends paths such as /models and /chat/completions to that Base URL.
There are two configuration entry points:
- Admin panel UI (recommended): takes effect at runtime, no container restart needed.
- Environment variables: pre-set at boot, suited to automated deployments.
Base URL note: the TokenBay gateway Base URL is
https://api.tokenbay.com(no path, no trailing slash). However, Open WebUI requires you to enter the form that includes/v1, i.e.https://api.tokenbay.com/v1, because it automatically appends/modelsand/chat/completionson top of it. If you omit/v1, fetching the model list will fail.
1. Get an API key
Go to the TokenBay console → API Keys → Create Key (direct link) and copy the key starting with sk-.
The plaintext is shown only once: after a key is created, it is shown in full only this one time — save it carefully right away; if lost, you can only create a new one.
2. Method A: via the admin panel (recommended)
After logging in as an admin, go to ⚙️ Admin Settings → Connections → OpenAI → Manage (wrench icon) → ➕ Add New Connection and fill in per the table below:
| Field | Value |
|---|---|
| Connection Type | External |
| URL | https://api.tokenbay.com/v1 |
| API Key | Your TokenBay sk-... |
| Model IDs (Filter) | leave empty (automatically pulls all authorized models from /v1/models) |
Click Save ✅. Back on the chat home page’s model dropdown, you should see the model list passed through from TokenBay.
Model IDs (Filter): empty = auto-discover all models; filled in = a whitelist that shows only the model IDs you list, useful for hiding unneeded or more expensive models. Prefix ID: when you configure multiple upstreams at once and there are models with the same name, fill in a prefix (such as
tokenbay/) to distinguish them.
3. Method B: via environment variables
Use environment variables to pre-set the connection at boot. The variable mapping:
| Variable | Value |
|---|---|
OPENAI_API_BASE_URLS | https://api.tokenbay.com/v1 |
OPENAI_API_KEYS | sk-XXXXXXX |
Append to docker run:
docker run -d \
-p 3000:8080 \
-e OPENAI_API_BASE_URLS="https://api.tokenbay.com/v1" \
-e OPENAI_API_KEYS="sk-XXXXXXX" \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:mainPlural form:
OPENAI_API_BASE_URLS/OPENAI_API_KEYSsupport multiple endpoints, separated by;, with the two arrays matched one-to-one by index. When connecting only one endpoint, you can also use the singularOPENAI_API_BASE_URL/OPENAI_API_KEY. If you misspell a variable name, Open WebUI does not raise an error — it simply won’t pre-fill the value in the UI.
Persistence priority: connections saved via the Admin Settings UI are written to the database and take precedence over environment variables. If you want environment variables to always take effect, set
ENABLE_PERSISTENT_CONFIG=false; if you want to use environment variables to reset the database config, setRESET_CONFIG_ON_START=trueon the next boot.
4. Recommended models
Model IDs are passed through to TokenBay verbatim by Open WebUI, so use TokenBay’s model IDs directly. For the full list, see the model list or the console model marketplace.
| Use case | Model ID |
|---|---|
| General chat | gpt-5.5 |
| Fast / low cost | gpt-5.4-mini |
| Coding | claude-sonnet-4.6 |
| Complex reasoning / long tasks | claude-opus-4.8 |
| Multimodal (text + image) | gemini-3.1-pro-preview |
Model name format: TokenBay version numbers only accept the dotted form, e.g.
claude-sonnet-4.6,gpt-5.5, and do not accept the hyphenated form (e.g.claude-sonnet-4-6). Model not authorized: if a model doesn’t appear in the dropdown, it may not yet be authorized under Console → Group settings — enable it there first.
5. Advanced configuration
For long tasks (such as long-context reasoning or complex coding), we recommend increasing the timeout to avoid requests being cut off prematurely. These can be merged into docker run alongside the connection variables above:
docker run -d \
-p 3000:8080 \
-e OPENAI_API_BASE_URLS="https://api.tokenbay.com/v1" \
-e OPENAI_API_KEYS="sk-XXXXXXX" \
-e AIOHTTP_CLIENT_TIMEOUT=600 \
-e AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=30 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main| Variable | Default | Description |
|---|---|---|
AIOHTTP_CLIENT_TIMEOUT | 300 | Overall timeout (seconds) for requests to upstreams (including TokenBay). Increase for long tasks, e.g. 600. |
AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST | 10 | Timeout (seconds) for fetching the model list. Raise to 30 on slower networks. |
