Skip to content

V0.5+: revisit effort picker (Opus 4.8 likely makes effort more impactful) #15

Description

@yyq1025

Why

Opus 4.8 / Opus 4.8 1M just landed (89f656d). Effort level is expected to have a larger behavioral impact on Opus 4.8 than on the 4.x models we shipped against, so the per-session effort picker we cut is worth reconsidering.

Background — what we had and why it was cut

A full per-session effort picker existed and was deliberately removed:

  • Built: 388fd9c (getModels RPC + model picker), f8f87a2 (model+effort apply), 5af60da (per-model supportedEffortLevels in MODEL_METADATA)
  • Cut: 03ba3b4 — "cut effort picker (trust SDK adaptive thinking)" (~512 LOC removed). Chip now shows model only.
  • Follow-up: 3f5a950 — daemon still hardcodes effort: "xhigh" in sidecode metadata purely for Desktop local_*.json schema compat (never read at runtime).

Today the daemon never passes --effort; the user's account-level Settings.effortLevel (set via Desktop /effort) is honored implicitly.

Original reasons for cutting (still partly valid — weigh before re-adding)

  1. SDK does zero call-time validation of effortLevelapplyFlagSettings/setModel silently accept nonsense (max / 'test' / bad combos) and return undefined. Effort is unobservable at apply time.
  2. Models can't self-report their effort, so we can't confirm a switch took.
  3. Anthropic Desktop itself doesn't cross-validate model+effort (ships e.g. "Sonnet + xhigh" combos), so there's no canonical mapping to mirror.
  4. Only ground truth is result.modelUsage keys after a turn completes.

What re-adding involves

  • Adapt/revert 03ba3b4. The atomic mid-session switcher is applyFlagSettings({ model, effortLevel }) (see packages/daemon/src/router.ts).
  • Re-surface effort in the model chip — packages/app/src/components/transcript/input-bar.tsx.
  • Protocol: modelEntry / getModels / setSessionSelection schemas in packages/protocol/src/index.ts; consider restoring supportedEffortLevels in packages/daemon/src/models-metadata.ts.

Open question to settle first

Before shipping UI, empirically confirm effort actually changes Opus 4.8 output — reuse the probe-script pattern from 2adbc8e against claude-opus-4-8[1m]. If it's still unobservable, the cut reasoning holds and we keep trusting the account default.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions