Documentation Index

Fetch the complete documentation index at: https://docs.second.so/llms.txt

Use this file to discover all available pages before exploring further.

Runtime selection

Second supports three builder runtimes:
| Runtime | Runtime ID | Model format | Parameter controls |
| --- | --- | --- | --- |
| Claude Code | claude-code | Claude model IDs such as claude-sonnet-4-6 | Effort and thinking |
| Codex CLI | codex-cli | OpenAI model IDs such as gpt-5.4 | Reasoning effort and Codex sandbox |
| OpenCode | opencode | OpenCode provider/model IDs such as openai/gpt-5.4 | No extra controls yet |
Apps persist runtime settings as:
{
  runtimeId: "claude-code" | "codex-cli" | "opencode";
  runtimeModel: string;
  runtimeParams: Record<string, string>;
}
The model picker is driven by apps/web/src/lib/agent/runtime-registry.ts. It groups models by runtime and renders only the parameter controls supported by the selected runtime. The composer and chat transport send runtimeId, runtimeModel, and runtimeParams on every app creation, settings update, and chat POST. The local onboarding runtime choice is also saved as a browser preference so the app composer opens with the selected runtime instead of falling back to the project default.
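The registry-driven shape can be sketched as follows. This is an illustrative model only: the entry fields and the resolveSettings helper are hypothetical names, not the actual exports of apps/web/src/lib/agent/runtime-registry.ts.

```typescript
// Sketch of a runtime registry entry and settings normalization.
// Field names and resolveSettings are illustrative, not the real module API.
type RuntimeId = "claude-code" | "codex-cli" | "opencode";

interface RuntimeEntry {
  id: RuntimeId;
  defaultModel: string;
  models: string[];                      // runtime-native model IDs
  paramDefaults: Record<string, string>; // controls this runtime supports
}

const registry: Record<RuntimeId, RuntimeEntry> = {
  "claude-code": {
    id: "claude-code",
    defaultModel: "claude-sonnet-4-6",
    models: ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5"],
    paramDefaults: { effort: "high", thinking: "enabled" },
  },
  "codex-cli": {
    id: "codex-cli",
    defaultModel: "gpt-5.4",
    models: ["gpt-5.4"],
    paramDefaults: { reasoningEffort: "medium", sandbox: "workspace-write" },
  },
  opencode: {
    id: "opencode",
    defaultModel: "openai/gpt-5.4",
    models: ["openai/gpt-5.4"],
    paramDefaults: {},
  },
};

// Produce the persisted shape: unknown models fall back to the runtime
// default, and only parameters the runtime supports are kept.
function resolveSettings(
  runtimeId: RuntimeId,
  model?: string,
  params: Record<string, string> = {},
) {
  const entry = registry[runtimeId];
  const runtimeModel =
    model && entry.models.includes(model) ? model : entry.defaultModel;
  const runtimeParams: Record<string, string> = { ...entry.paramDefaults };
  for (const [k, v] of Object.entries(params)) {
    if (k in entry.paramDefaults) runtimeParams[k] = v;
  }
  return { runtimeId, runtimeModel, runtimeParams };
}
```

The returned object matches the persisted { runtimeId, runtimeModel, runtimeParams } shape shown above.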

How command runtimes work under the hood

Claude uses the Claude Agent SDK. Codex is launched through the Codex CLI app-server protocol over stdio, which is the same local Codex runtime surface used by the Codex SDK but without adding an extra SDK dependency in the worker. OpenCode is launched in non-interactive JSON mode.

The worker normalizes all runtime output into the same Claude-shaped worker SSE events, so the existing chat bridge and AI element cards continue to render streamed text, plans, terminal commands, file edits, app data tools, integration setup, and done_building.

OpenCode support requires an OpenCode CLI version whose opencode run --help includes --format json. Older OpenCode binaries are reported during onboarding as installed but not usable for the OpenCode runtime, and the worker returns a clear runtime error instead of starting a non-streamable plain-text run.
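The OpenCode version check described above reduces to inspecting the CLI's help text. A minimal sketch, under the assumption that the check is a plain substring test on the captured help output (the worker's actual detection code may differ):

```typescript
// Decide whether an installed OpenCode binary can stream JSON events.
// Second's real check runs `opencode run --help` on the worker; here we
// only model the string test on the captured help output.
function supportsJsonFormat(helpText: string): boolean {
  return helpText.includes("--format json");
}

// Usage sketch (Node.js):
//   import { execFileSync } from "node:child_process";
//   const help = execFileSync("opencode", ["run", "--help"], { encoding: "utf8" });
//   if (!supportsJsonFormat(help)) {
//     // report "installed but not usable" instead of starting a plain-text run
//   }
```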

Claude Agent SDK

Understanding model selection requires understanding what query() does at the process level.

Every call spawns a new process

The Claude Agent SDK does not keep a long-running connection to the Anthropic API. Each query() call spawns a brand new CLI process:
query({ prompt: "hello", options: { model: "claude-sonnet-4-6" } })
  → child_process.spawn("node", ["cli.js", "--model", "claude-sonnet-4-6", ...])
The CLI binary handles the entire agent loop internally:
  1. Sends POST https://api.anthropic.com/v1/messages with the specified model
  2. Claude responds with text and/or tool calls
  3. CLI executes tools locally (Read, Edit, Bash, etc.)
  4. CLI appends tool results and sends another API call
  5. Repeat until Claude responds with no tool calls
  6. Process exits
There is no “direct API mode.” The SDK is a wrapper around the claude CLI binary.
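The six numbered steps amount to a simple loop. The sketch below uses stubbed callModel and runTool parameters in place of the real Anthropic Messages API and local tool execution; it only shows the control flow the CLI runs internally, not the CLI's actual code.

```typescript
// Schematic agent loop, mirroring what the Claude CLI does internally.
// callModel and runTool are stand-ins; the real CLI talks to
// POST https://api.anthropic.com/v1/messages and executes Read/Edit/Bash.
type Turn = { text?: string; toolCalls: { name: string; input: string }[] };

function agentLoop(
  prompt: string,
  callModel: (history: string[]) => Turn,
  runTool: (name: string, input: string) => string,
): string {
  const history: string[] = [prompt]; // full conversation, re-sent every call
  for (;;) {
    const turn = callModel(history);  // one stateless API call
    if (turn.toolCalls.length === 0) {
      return turn.text ?? "";         // no tool calls: loop ends, process exits
    }
    for (const call of turn.toolCalls) {
      history.push(runTool(call.name, call.input)); // append results, call again
    }
  }
}
```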

Sessions are files on disk

The CLI writes every API request and response to a JSONL file:
~/.claude/projects/<encoded-cwd>/<session-id>.jsonl
Each line is a complete message with the raw API response, including the model field and full usage object. This file is the CLI’s own record of what happened — not written by our code.

The API is stateless

Anthropic’s Messages API has no server-side sessions. Every API call includes the entire conversation history as the messages array. Resuming a session means re-sending all previous messages as input tokens. Prompt caching mitigates this: system prompts, tool definitions, and early messages get cached at 0.1x the input price. In practice, most resumed conversations hit the cache heavily.
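To make the 0.1x figure concrete: re-sending 100,000 history tokens on Sonnet 4.6 costs $0.30 at the full $3/MTok input rate, but only $0.03 when all of those tokens are cache reads at $0.30/MTok. A small sketch of the arithmetic, using the Sonnet rates from the pricing table below:

```typescript
// Cost of re-sending conversation history on resume, with and without
// prompt caching. Rates are per million tokens (Sonnet 4.6: $3 input,
// $0.30 cache read, i.e. 0.1x the input price).
function resendCostUsd(historyTokens: number, cachedFraction: number): number {
  const INPUT_PER_MTOK = 3.0;
  const CACHE_READ_PER_MTOK = 0.3;
  const cached = historyTokens * cachedFraction;
  const uncached = historyTokens - cached;
  return (uncached * INPUT_PER_MTOK + cached * CACHE_READ_PER_MTOK) / 1_000_000;
}
```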

Model selection and switching

Available models

The runtime registry includes Claude Code, Codex CLI, and OpenCode model entries. It stores runtime-native IDs, display names, descriptions, defaults, and parameter constraints. Claude pricing metadata is available for cost display:
| Display name | Model ID | Description | Input / MTok | Output / MTok | Cache read / MTok |
| --- | --- | --- | --- | --- | --- |
| Opus 4.6 | claude-opus-4-6 | Most capable for ambitious work | $5 | $25 | $0.50 |
| Sonnet 4.6 | claude-sonnet-4-6 | Most efficient for everyday tasks | $3 | $15 | $0.30 |
| Haiku 4.5 | claude-haiku-4-5 | Fastest for quick answers | $1 | $5 | $0.10 |
The default runtime is Claude Code with Sonnet 4.6. Runtime defaults and model display names are defined in lib/agent/runtime-registry.ts.

Model-specific capabilities

Some features are only available on certain models:
| Feature | Available on | Fallback for other models |
| --- | --- | --- |
| Effort: max | Opus 4.6 only | high |
| Thinking: adaptive | Opus 4.6 only | enabled |
The UI enforces these constraints — Opus-only options are disabled in the dropdown when a non-Opus model is selected. If the user switches from Opus to another model, any Opus-only selections are automatically downgraded.
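A sketch of the downgrade rule, following the fallback column in the table above. The function name and the model-ID prefix check are illustrative; the real logic lives in the registry and UI code.

```typescript
// Downgrade Opus-only parameter values when the selected Claude model
// is not Opus: effort "max" -> "high", thinking "adaptive" -> "enabled".
function downgradeForModel(
  model: string,
  params: Record<string, string>,
): Record<string, string> {
  const isOpus = model.startsWith("claude-opus-");
  if (isOpus) return params; // Opus keeps all selections
  const next = { ...params };
  if (next.effort === "max") next.effort = "high";
  if (next.thinking === "adaptive") next.thinking = "enabled";
  return next;
}
```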

How switching works

The user selects a runtime model and runtime-specific parameters from the composer. Each message carries the normalized runtime settings through the full stack:
Composer dropdowns → React refs (runtimeId, runtimeModel, runtimeParams)

Custom fetch on DefaultChatTransport
  (reads refs, injects runtime settings into POST body)

POST /api/.../chat  →  body.runtimeId, body.runtimeModel, body.runtimeParams

worker-bridge  →  POST /sessions/:appId/messages  →  runtime settings

session.sendMessage(prompt, runtimeSettings)

runtime adapter dispatches to Claude, Codex CLI, or OpenCode
Switching from Sonnet to Opus mid-conversation means the next message spawns a new Claude CLI process with --model claude-opus-4-6 --resume <sessionId>. The CLI reads the session JSONL (which includes all previous Sonnet messages), sends the full history to the API with the new model, and continues the conversation. Effort and thinking settings take effect on the same call.

Second stores provider-native session state per runtime on the run document. When the user keeps using a runtime whose native session state is current, the next message sends only the latest user prompt plus that runtime’s session state. When the user switches to another runtime, Second uses the persisted provider-agnostic UIMessage[] transcript as the handoff layer: the chat route builds a bounded neutral transcript for the messages that the target runtime has not already seen, then appends the latest user message. The target runtime receives that handoff as plain prompt context, plus its own provider session state when one exists.

Second does not write vendor-private session files to “convert” a Claude session into a Codex or OpenCode session. The durable source of truth is the stored UIMessage[] plus the workspace files on disk; provider session state is an optimization for native resume, not the tenant boundary or the only conversation record. There is no re-run and no conversation restart: same-runtime switches use native resume when possible, and cross-runtime switches use the neutral transcript handoff and continue from the same Second run.
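The cross-runtime handoff can be pictured as a slice over the stored transcript. This is a simplified model: real UIMessage parts are richer than role plus text, and the seenCount parameter is a hypothetical stand-in for Second's per-runtime session bookkeeping.

```typescript
// Build the neutral transcript handed to a runtime that has not seen
// the whole conversation. Simplified sketch: `seenCount` stands in for
// per-runtime session state, and the bound keeps the handoff small.
type UIMessage = { role: "user" | "assistant"; text: string };

function buildHandoff(
  messages: UIMessage[],
  seenCount: number,
  maxMessages = 50,
): string {
  // Only messages the target runtime has not already seen, bounded.
  const unseen = messages.slice(seenCount).slice(-maxMessages);
  return unseen
    .map((m) => `${m.role === "user" ? "User" : "Assistant"}: ${m.text}`)
    .join("\n");
}
```

The result is sent as plain prompt context; the target runtime's own provider session state, when present, rides along separately.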

Why custom fetch (not transport body)

The Vercel AI SDK’s useChat hook captures the DefaultChatTransport instance on first render and never swaps it. If you create a new transport when the model changes, useChat ignores it. The solution: create one stable transport (memoized on chatApi only) with a custom fetch function that reads current values from React refs on every request:
const runtimeSettingsRef = useRef(runtimeSettings);

const transport = useMemo(() => new DefaultChatTransport({
  api: chatApi,
  fetch: async (input, init) => {
    if (init?.method === "POST" && typeof init.body === "string") {
      const body = JSON.parse(init.body);
      const latest = runtimeSettingsRef.current;  // always reads latest
      body.runtimeId = latest.runtimeId;
      body.runtimeModel = latest.model;
      body.runtimeParams = latest.params;
      return globalThis.fetch(input, { ...init, body: JSON.stringify(body) });
    }
    return globalThis.fetch(input, init);
  },
}), [chatApi]);  // no dependency on selected values

Composer layout

┌──────────────────────────────────────────────────┐
│ [textarea]                                       │
│                                                  │
│ [+] [Sonnet 4.6 ▼] [runtime params...]  [⬆ / ⏸] │
└──────────────────────────────────────────────────┘
  • + button — Attach files (placeholder, not wired yet).
  • Model dropdown (components/model-selector.tsx) — shared between the workspace composer and the chat composer. Shows models grouped by runtime with descriptions and a checkmark on the selected one. Includes an “Add runtime” dialog with setup notes for Claude Code, Codex CLI, and OpenCode.
  • Runtime parameter dropdowns (components/runtime-parameter-selectors.tsx) — rendered from runtime-registry.ts. Claude shows effort and thinking. Codex CLI shows reasoning effort and sandbox mode. OpenCode currently has no additional controls.
  • Submit button — Circle with ArrowUp icon. Switches to Pause while streaming. Clicking during a stream calls stop() to abort.
When the user switches runtime or model, settings are normalized against the selected runtime’s defaults and supported options. For example, switching from Opus to a non-Opus Claude model downgrades Opus-only selections to supported Claude values.

Local provider setup

During onboarding in local mode (SECOND_AUTH_MODE=none), a provider setup screen at /onboarding/provider auto-detects what’s available:
  1. Claude CLI on PATH — checked via which claude on the worker, or SECOND_CLAUDE_PATH when an operator pins a custom executable path
  2. Codex CLI on PATH — checked via which codex on the worker, or SECOND_CODEX_PATH when configured
  3. OpenCode CLI on PATH with JSON events — checked via which opencode and opencode run --help on the worker, or SECOND_OPENCODE_PATH when configured
  4. Runtime auth env hints: ANTHROPIC_API_KEY, CODEX_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, and GEMINI_API_KEY are reported only as booleans, never values
If the Claude CLI is installed and the user has logged in (claude login), everything works automatically — no API key needed. The SDK spawns the user’s local claude binary, which uses their existing auth. If ANTHROPIC_API_KEY is set, it takes priority — the CLI switches to API billing regardless of whether the user is also logged in via subscription.

Codex CLI can use its own login state or CODEX_API_KEY/OPENAI_API_KEY, depending on the installed CLI configuration. Detection runs codex login status and checks stdout and stderr because Codex may print login status on stderr even when the command succeeds. It reports only a boolean auth result; it never returns token values or reads auth file contents. OpenCode uses the provider credentials required by the selected provider/model ID.

This screen only exists in local mode. In enterprise deployments (SECOND_AUTH_MODE=external), the API key is configured before deployment and the screen is skipped entirely.
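The "booleans, never values" rule for env hints is easy to sketch. The variable list comes from the detection step above; the function name is illustrative.

```typescript
// Report which runtime auth environment variables are set, as booleans
// only. The values themselves never leave this function.
const AUTH_ENV_VARS = [
  "ANTHROPIC_API_KEY",
  "CODEX_API_KEY",
  "OPENAI_API_KEY",
  "GOOGLE_API_KEY",
  "GEMINI_API_KEY",
] as const;

function detectAuthEnvHints(
  env: Record<string, string | undefined>,
): Record<string, boolean> {
  const hints: Record<string, boolean> = {};
  for (const name of AUTH_ENV_VARS) {
    // Presence check only; never expose env[name] itself.
    hints[name] = typeof env[name] === "string" && env[name]!.length > 0;
  }
  return hints;
}

// On the worker this would be called as detectAuthEnvHints(process.env).
```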

Files involved

| File | Role |
| --- | --- |
| apps/worker/src/index.ts | GET /detect-provider — detects claude, codex, opencode, and auth-mode booleans |
| apps/web/src/app/api/setup/detect-provider/route.ts | Proxies to worker |
| apps/web/src/app/onboarding/provider/page.tsx | Server component — guards, renders setup |
| apps/web/src/components/provider-setup.tsx | Client component — calls detect, shows results |

Billing modes

Second separates runtime authentication from token/cost visibility. Runtimes can emit token counts and API-equivalent dollar estimates even when the local CLI usage is covered by a subscription plan.
| Runtime | Local subscription mode | API billing mode |
| --- | --- | --- |
| Claude Code | SECOND_AUTH_MODE=none, no ANTHROPIC_API_KEY, Claude CLI logged in via Claude.ai | ANTHROPIC_API_KEY configured |
| Codex CLI | SECOND_AUTH_MODE=none, no CODEX_API_KEY/OPENAI_API_KEY, Codex CLI logged in with ChatGPT | CODEX_API_KEY or OPENAI_API_KEY configured |
| OpenCode | Not treated as subscription-backed by Second | Provider key required by the selected provider/model |
The app usage panel still shows token counts in all modes. In local subscription mode, it treats provider dollar values as API-equivalent estimates: the estimate is struck through and the displayed run cost excludes that subscription-backed model usage. For example, local Claude Code shows “Running on your Claude subscription”; local Codex CLI with ChatGPT login shows “Running through your Codex CLI ChatGPT login.” Detection happens in page.tsx from server environment flags, then AppWorkspace applies the billing display per model row. This matters for mixed-runtime runs: a Claude subscription row and a Codex ChatGPT-login row can both be struck through, while an API-key-backed OpenCode row still displays as billable.

Usage tracking

Where the data comes from

Claude emits a result message at the end of every SDK query() call:
{
  "type": "result",
  "total_cost_usd": 0.0342,
  "num_turns": 3,
  "duration_ms": 12400,
  "duration_api_ms": 8200,
  "modelUsage": {
    "claude-opus-4-6": {
      "inputTokens": 8420,
      "outputTokens": 1203,
      "cacheReadInputTokens": 6100,
      "cacheCreationInputTokens": 500,
      "costUSD": 0.0342
    }
  }
}
The modelUsage field is computed by the runtime adapter from provider runtime events. Claude includes cost and token data from the Claude CLI result. Codex app-server exposes token usage but not a dollar value, so Second estimates OpenAI cost from the selected model’s current input, cached-input, and output token rates. OpenCode emits the same result shape when its JSON stream exposes usage data; when a runtime does not expose cost and Second has no pricing metadata for the selected model, Second records token counts when available and zero cost.
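The Codex estimation step reduces to rate-times-tokens arithmetic. A sketch under the assumptions stated in the comments; the rate numbers used in the test are placeholders, not OpenAI's actual prices.

```typescript
// Estimate API-equivalent cost when the runtime reports tokens but no
// dollar value. Rates are per million tokens. When no pricing metadata
// exists for the selected model, record tokens and zero cost.
interface TokenUsage {
  inputTokens: number;
  cachedInputTokens: number;
  outputTokens: number;
}

interface Rates {
  inputPerMTok: number;
  cachedInputPerMTok: number;
  outputPerMTok: number;
}

function estimateCostUsd(usage: TokenUsage, rates: Rates | undefined): number {
  if (!rates) return 0; // no pricing metadata: token counts only, zero cost
  return (
    (usage.inputTokens * rates.inputPerMTok +
      usage.cachedInputTokens * rates.cachedInputPerMTok +
      usage.outputTokens * rates.outputPerMTok) /
    1_000_000
  );
}
```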

How it’s captured

Runtime result message (emitted or normalized by the worker)
  → worker SSE stream
    → worker-bridge captures msg.type === "result"
      → extracts totalCostUsd + modelUsage
        → API route calls accumulateRunUsage()
          → MongoDB $inc on the run document
Usage is accumulated atomically with $inc. Each runtime turn adds to the run’s totals. Multiple messages in a run accumulate correctly.
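The effect of the $inc accumulation can be modelled as a pure merge over a simplified subset of the RunUsage shape shown under Schema (the cache-token totals are omitted here for brevity; this is a model of the behavior, not the actual accumulateRunUsage() code):

```typescript
// Pure model of the MongoDB $inc accumulation: each runtime turn's
// usage is added into the run's totals and per-model breakdown.
type ModelUsage = {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
};

type RunUsage = {
  totalCostUsd: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  byModel: Record<string, ModelUsage>;
};

function accumulate(run: RunUsage, model: string, turn: ModelUsage): RunUsage {
  const prev = run.byModel[model] ?? { inputTokens: 0, outputTokens: 0, costUsd: 0 };
  return {
    totalCostUsd: run.totalCostUsd + turn.costUsd,
    totalInputTokens: run.totalInputTokens + turn.inputTokens,
    totalOutputTokens: run.totalOutputTokens + turn.outputTokens,
    byModel: {
      ...run.byModel,
      [model]: {
        inputTokens: prev.inputTokens + turn.inputTokens,
        outputTokens: prev.outputTokens + turn.outputTokens,
        costUsd: prev.costUsd + turn.costUsd,
      },
    },
  };
}
```

Because every turn is an additive increment, multiple messages in a run accumulate correctly regardless of interleaving, which is exactly what $inc guarantees atomically on the run document.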

Schema

The usage field on AgentRunDocument:
type RunUsage = {
  totalCostUsd: number;              // sum across all runtime turns in this run
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCacheReadTokens: number;
  totalCacheCreationTokens: number;
  byModel: Record<string, {          // per-model breakdown
    inputTokens: number;
    outputTokens: number;
    cacheReadInputTokens: number;
    cacheCreationInputTokens: number;
    costUsd: number;
  }>;
};

Querying for billing

Per-app cost across all runs:
db.agent_runs.aggregate([
  { $match: { workspaceId: "<workspaceId>" } },
  { $group: {
    _id: "$appId",
    totalCost: { $sum: "$usage.totalCostUsd" },
    totalInput: { $sum: "$usage.totalInputTokens" },
    totalOutput: { $sum: "$usage.totalOutputTokens" },
  }}
])
Per-workspace cost (all apps):
db.agent_runs.aggregate([
  { $match: { workspaceId: "<workspaceId>" } },
  { $group: {
    _id: null,
    totalCost: { $sum: "$usage.totalCostUsd" },
  }}
])

UI: info panel

A small icon in the top-right corner of the app page opens a dropdown showing:
  • Total run cost
  • Input / output / cache-read token counts
  • Per-model breakdown (model name, token count, cost)
The data refreshes automatically when a stream finishes — the frontend detects the streaming → ready transition and fetches GET /chat which includes usage in the response.
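The refresh trigger is an edge detector on the chat status. A minimal sketch; the status names follow the useChat convention, and the effect shown in the comment is illustrative rather than the actual frontend code.

```typescript
// Fire a usage refresh exactly once per completed stream: when status
// transitions from "streaming" to "ready".
type ChatStatus = "ready" | "submitted" | "streaming" | "error";

function shouldRefreshUsage(prev: ChatStatus, next: ChatStatus): boolean {
  return prev === "streaming" && next === "ready";
}

// In a React effect this would look roughly like:
//   useEffect(() => {
//     if (shouldRefreshUsage(prevStatus.current, status)) refetchChat(); // GET /chat
//     prevStatus.current = status;
//   }, [status]);
```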

Verification and debugging

Five levels of proof that the correct runtime/model was used, from lowest (closest to the metal) to highest:

1. Runtime-native logs

Claude writes every raw API response to a session JSONL file. The model field in each response is what the Anthropic API returned.
# Find the session file for a specific app
ls ~/.claude/projects/-private-tmp-second-workspaces-<appId>/

# Parse it and show which models were used
python3 -c "
import json, sys, collections
models = collections.Counter()
for line in open(sys.argv[1]):
    d = json.loads(line)
    m = d.get('message', {}).get('model', '')
    if m:
        u = d.get('message', {}).get('usage', {})
        models[m] += u.get('output_tokens', 0)
for m, tokens in models.items():
    print(f'{m}: {tokens} output tokens')
" ~/.claude/projects/-private-tmp-second-workspaces-<appId>/*.jsonl
Each line in the JSONL contains the full API response:
{
  "type": "assistant",
  "message": {
    "id": "msg_01HnLBE9DJDSMKxTRNMZWqvj",
    "model": "claude-sonnet-4-6",
    "usage": {
      "input_tokens": 3,
      "output_tokens": 66,
      "cache_read_input_tokens": 7294,
      "cache_creation_input_tokens": 2725
    }
  }
}
The msg_01... ID is assigned by Anthropic’s API. Different IDs = different API calls. Different model values = different models served the request. Codex CLI and OpenCode keep their own runtime/session records depending on the installed CLI configuration. Use those native logs together with Second’s stored sessionState when debugging resume behavior.

2. Provider console

For API-key backed runtimes, use the provider’s console or usage logs. For example, Anthropic logs Claude API calls with model, token counts, and cost.

3. MongoDB

mongosh second --eval \
  'db.agent_runs.find({}, {"usage.byModel":1}).sort({updatedAt:-1}).limit(1).pretty()'
Shows the accumulated modelUsage from all result messages in the run.

4. Worker terminal

The worker logs each request:
[worker] appId=69c6f381... model=claude-opus-4-6

5. Browser Network tab

Open devtools → Network → filter by chat. Inspect the POST request payload. It contains runtimeId, runtimeModel, and runtimeParams injected by the custom fetch.