Continue

Install

Continue is an IDE extension. It officially supports VS Code (including derivatives such as Cursor, Windsurf, and VSCodium) and the JetBrains IDE family. For other installation methods, see the official install docs.

VS Code

Press Ctrl/Cmd + Shift + X to open the Extensions panel, search for Continue, and install it; you can also install it from the command line by its Marketplace ID:

code --install-extension Continue.continue

JetBrains (IDEA / PyCharm / GoLand, etc.)

Go to Settings → Plugins → Marketplace, search for Continue, install it, and restart the IDE; you can also install it from the JetBrains Marketplace page.

After installing, a Continue icon appears in the activity bar. You can check the installed Continue version in the Extensions panel; if you run into problems, it’s a good idea to update to the latest version first.

Connect TokenBay

How it works

Continue does not use environment variables to configure the gateway. Instead, you register a models entry for each model in the config file ~/.continue/config.yaml, pointing to TokenBay with apiBase and passing the credential with apiKey. There are two kinds of providers for connecting to TokenBay:

  • anthropic (recommended): uses TokenBay’s native Anthropic endpoint, with the most complete feature set (supports prompt caching and similar features). Set apiBase to https://api.tokenbay.com/v1, and Continue will append the messages endpoint after it.
  • openai (alternative): for non-Claude models such as GPT and DeepSeek, using the standard /chat/completions. Set apiBase to https://api.tokenbay.com/v1 as well.

apiBase must include /v1: the official examples all set apiBase to a versioned endpoint root (e.g. .../v1), and Continue then appends paths such as messages or chat/completions to it. Use https://api.tokenbay.com/v1 throughout; if you write the bare domain https://api.tokenbay.com instead, the resulting URL will be missing /v1.
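
To make the rule concrete, here is a minimal fragment (not a complete config.yaml) showing the apiBase value for both provider types. The URLs in the comments are inferred from the append behaviour described above rather than taken from Continue's source, and sk-XXXXXXX is a placeholder key.

```yaml
models:
  # anthropic provider: per the description above, requests should end up at
  #   https://api.tokenbay.com/v1/messages
  - name: Claude Sonnet (TokenBay)
    provider: anthropic
    model: claude-sonnet-4.6
    apiBase: https://api.tokenbay.com/v1   # versioned root, ends in /v1
    apiKey: sk-XXXXXXX                     # placeholder

  # openai provider: requests should end up at
  #   https://api.tokenbay.com/v1/chat/completions
  - name: GPT-5.5 (TokenBay)
    provider: openai
    model: gpt-5.5
    apiBase: https://api.tokenbay.com/v1
    apiKey: sk-XXXXXXX                     # placeholder

  # Avoid: a bare domain, because the final URL would lack /v1
  #   apiBase: https://api.tokenbay.com
```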

1. Get an API key

Sign in to the TokenBay console → API Keys → Create Key. Copy the full string starting with sk-. The plaintext key is shown only once and cannot be viewed again after you leave the page.

2. Edit ~/.continue/config.yaml

Config file locations (in order of priority):

Scope | Path | Notes
User-level (global) | ~/.continue/config.yaml | Applies to all projects; most common
Project-level | .continue/config.yaml in the workspace root | Applies only to that project and is merged with the global config

If the file doesn’t exist, click the gear (settings) at the top-right of the Continue sidebar in VS Code to generate it. Fill in the following (replace sk-XXXXXXX with your key):

name: TokenBay
version: 1.0.0
schema: v1
models:
  - name: Claude Sonnet (TokenBay)
    provider: anthropic
    model: claude-sonnet-4.6
    apiBase: https://api.tokenbay.com/v1
    apiKey: sk-XXXXXXX
    roles:
      - chat
      - edit
      - apply
 
  - name: GPT-5.5 (TokenBay)
    provider: openai
    model: gpt-5.5
    apiBase: https://api.tokenbay.com/v1
    apiKey: sk-XXXXXXX
    roles:
      - chat
      - edit

Field reference:

Field | Notes
provider | anthropic uses Anthropic Messages; openai uses Chat Completions
model | The model ID on TokenBay, passed straight through to the upstream, with no prefix
apiBase | Always set to https://api.tokenbay.com/v1 (with /v1)
apiKey | Your TokenBay API key (sk-...)
roles | The roles this model serves within Continue (chat / edit / apply / autocomplete / embed, etc.)

After saving, Continue hot-reloads the config; there is no need to restart the IDE.

If you’d rather not store the key in plaintext in the config, use Continue’s secret reference syntax, apiKey: ${{ secrets.TOKENBAY_API_KEY }}, and define TOKENBAY_API_KEY in Continue’s settings.
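
A minimal sketch of the secret reference in place, assuming a secret named TOKENBAY_API_KEY has already been defined as described above. Only the apiKey line differs from the plaintext version of the config.

```yaml
name: TokenBay
version: 1.0.0
schema: v1
models:
  - name: Claude Sonnet (TokenBay)
    provider: anthropic
    model: claude-sonnet-4.6
    apiBase: https://api.tokenbay.com/v1
    # Resolved by Continue at load time; the key itself never appears in this file.
    apiKey: ${{ secrets.TOKENBAY_API_KEY }}
    roles:
      - chat
      - edit
      - apply
```

This keeps the config file safe to share or commit, since it carries only the secret's name.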

3. Choose a model

Use case | Model ID | provider
Primary coding (chat / edit / apply / agent) | claude-sonnet-4.6 | anthropic
Complex refactoring / long context | claude-opus-4.8 | anthropic
Lightweight / fast response | claude-haiku-4.5 | anthropic
GPT general-purpose flagship | gpt-5.5 | openai
GPT coding alternative | gpt-5.3-codex | openai
Inline autocomplete | gpt-5.4-mini | openai

Model IDs are passed straight through to the upstream, with no prefix. See the Models list for the full set of available models.

Model name format: In TokenBay model names, version numbers are only accepted in dotted form (e.g. claude-sonnet-4.6, gpt-5.5); do not write them with hyphens (claude-sonnet-4-6, gpt-5-5).

The table above is only an example. Check the console Models page (or the Models list) for the exact model IDs, and before connecting, confirm that your API key’s group is authorized for the model you pick.

About autocomplete and embed: Continue’s inline completion works best with dedicated FIM (fill-in-the-middle) models such as Codestral or Qwen-Coder. The general chat models in the table above also work, but with mediocre latency and quality, so weigh the trade-off. For embed (vector indexing), use a model ID from the console that explicitly supports embeddings; skip the embed role if none is available.
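
As an illustration, a models entry dedicated to inline completion might look like the sketch below. It assumes the gpt-5.4-mini example from the table is available to your key; swap in a FIM model ID from the console if you have one. No embed entry is shown because the source names no embedding model ID.

```yaml
models:
  - name: Autocomplete (TokenBay)        # display name is arbitrary
    provider: openai
    model: gpt-5.4-mini                  # example ID from the table; verify in the console
    apiBase: https://api.tokenbay.com/v1
    apiKey: sk-XXXXXXX                   # placeholder
    roles:
      - autocomplete                     # used only for inline completion
```

Giving completion its own entry keeps the chat and edit models unchanged while you experiment with faster or cheaper completion models.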

4. Advanced configuration (long tasks / timeouts / completion throttling)

Merge timeout, completion throttling, and similar options into the config.yaml above:

name: TokenBay
version: 1.0.0
schema: v1
models:
  - name: Claude Sonnet (TokenBay)
    provider: anthropic
    model: claude-sonnet-4.6
    apiBase: https://api.tokenbay.com/v1
    apiKey: sk-XXXXXXX
    roles:
      - chat
      - edit
      - apply
    defaultCompletionOptions:
      maxTokens: 8192
      promptCaching: true   # anthropic only; enabling it lowers cost on cache hits
    requestOptions:
      timeout: 600           # per-request timeout; increase for long tasks / long context

  • requestOptions.timeout: per-request timeout. The official docs (config.yaml reference) don’t specify the unit or default value. If long-context or long-running tasks get interrupted, increase it; defer to the official docs for exact values.
  • Proxy / firewall: The VS Code build of Continue reuses VS Code’s network and proxy settings (http.proxy). On a corporate network, make sure the proxy allows api.tokenbay.com.
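  • Completion throttling: autocomplete triggers on every keystroke by default; set debounceDelay: 350 (milliseconds) to coalesce requests and save quota. In config.yaml this option goes under autocompleteOptions on the model that has the autocomplete role; the top-level tabAutocompleteOptions key belongs to the older config.json format.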
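
A sketch of completion throttling in config.yaml form. It assumes the autocomplete model is the gpt-5.4-mini example from the table, and that debounce is set per model under autocompleteOptions, as Continue's config.yaml reference appears to describe; verify the key name against the official docs for your Continue version.

```yaml
models:
  - name: Autocomplete (TokenBay)
    provider: openai
    model: gpt-5.4-mini                  # example ID; verify in the console
    apiBase: https://api.tokenbay.com/v1
    apiKey: sk-XXXXXXX                   # placeholder
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 350                 # milliseconds to wait after the last keystroke
```

A larger delay sends fewer requests at the cost of slower suggestions, so adjust it to your typing rhythm and quota.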