Commit cc478e5
🤖 feat: Add PostHog experiments integration (#1179)
Add backend-first PostHog feature flag evaluation for remote-controlled
experiments, starting with Post-Compaction Context.
## Changes
### Backend (ExperimentsService)
- Evaluate PostHog feature flags via `posthog-node`
- Disk cache (`~/.mux/feature_flags.json`) with TTL-based refresh
- Fail-closed behavior (unknown = disabled)
- Skip PostHog flag calls when telemetry is disabled
### Telemetry enrichment (TelemetryService)
- `setFeatureFlagVariant()` adds `$feature/<flagKey>` to all events
- Enables variant breakdown in PostHog analytics
### oRPC layer
- `experiments.getAll`: Get all experiment values
- `experiments.reload`: Force refresh from PostHog
### Frontend (ExperimentsContext)
- Fetch remote experiments on mount
- Priority: remote PostHog > local toggle > default
- Read-only UI when experiment is remote-controlled
### Backend authoritative gating (WorkspaceService)
- `sendMessage()` resolves the experiment from PostHog when remote flags are enabled
- `list()` decides `includePostCompaction` from experiment state when the caller omits it
### Type consolidation
- `ExperimentValueSchema` (Zod) is single source of truth
- `ExperimentValue` type derived via `z.infer` in types.ts
## Bug fixes (unrelated)
- Fixed backgroundProcessManager exit race condition
- Fixed telemetry client Node.js compatibility
- Relaxed timing test threshold in authMiddleware
---
<details>
<summary>📋 Implementation Plan</summary>
# PostHog early access, feature flags, and experiments (Mux)
## Goals
1. Run **remote-controlled** experiments/feature flags from PostHog
(starting with **`Post-Compaction Context`**).
2. Keep PostHog interactions **backend-first** to avoid ad-blocker
issues (Mux already forwards telemetry to Node via oRPC).
3. Ensure telemetry/error events can be **analyzed by variant** in
PostHog (required for experiments + server-side event capture).
4. Preserve Mux privacy guarantees: no project names, file paths, user
prompts, etc. sent to PostHog.
## Recommendation (architecture)
### ✅ Approach A (recommended): Backend-owned flag/experiment evaluation + oRPC exposure
**Net new product LoC (est.): ~250–450**
- Use the existing main-process `posthog-node` client to:
  - evaluate flags/experiment variants
  - emit `$feature_flag_called` exposure events
  - attach `$feature/<flagKey>` properties to telemetry events so PostHog can break down metrics by variant
- Expose a small oRPC surface to the renderer for:
  - displaying the current experiment variant in Settings → Experiments
  - gating UI behavior where needed
- Keep **local experiment toggles only as a dev/override fallback** (optional), not the default source of truth.
Why this fits Mux:
- Mux already routes telemetry through the backend “to avoid ad-blocker
issues.”
- The `Post-Compaction Context` experiment gates backend behavior
(attachment injection), so backend must know the assignment anyway.
### Alternative B: Renderer uses `posthog-js` for flags/experiments (keep backend telemetry)
**Net new product LoC (est.): ~350–650**
Pros:
- Easiest way to adopt **Early Access Feature Management** (PostHog
notes it’s JS-web-SDK-only).
Cons:
- Reintroduces the ad-blocker/network fragility we intentionally avoided
for telemetry.
- Requires careful identity bridging (distinctId must match backend’s ID
or you lose experiment attribution).
## Proposed flow (Approach A)
```mermaid
flowchart TD
A[Renderer] -->|oRPC: telemetry.track| B[Main process TelemetryService]
A -->|oRPC: experiments.getAll| C[Main process ExperimentsService]
C -->|getFeatureFlag: post-compaction-context| D[PostHog Decide / Flags]
C -->|cache variant| C
B -->|capture events| E[PostHog events]
B -->|include $feature/post-compaction-context| E
```
## Implementation plan
### 1) Backend: add a PostHog-backed experiments/flags service
- Create `src/node/services/experimentsService.ts` (name TBD) that depends on either:
  - `TelemetryService` (for `distinctId` + access to a `PostHog` client), or
  - a shared `PostHogClientService`, if you want to refactor TelemetryService into a reusable PostHog wrapper.
Core responsibilities:
- `getDistinctId()` (or expose it from TelemetryService): a **single stable identity** used for both:
  - flag evaluation
  - telemetry capture
- `getExperimentVariant(experimentId: ExperimentId): Promise<string | boolean | null>`
  - Map `ExperimentId` → PostHog **feature flag key**. (Conveniently, the current `EXPERIMENT_IDS.*` values already look like flag keys.)
  - Call `posthog.getFeatureFlag(flagKey, distinctId)`.
    - This automatically emits `$feature_flag_called` (exposure) events when appropriate.
  - Cache the result in memory with a TTL (e.g., 5–15 min) to avoid re-fetching on every UI render.
- `isExperimentEnabled(experimentId: ExperimentId): boolean`
  - Converts the raw PostHog variant into a boolean gate.
  - Suggested mapping for `post-compaction-context`:
    - `"test"` / `true` → enabled
    - `"control"` / `false` / `null` → disabled
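A minimal sketch of the two core methods, assuming a hypothetical `ExperimentsServiceSketch` class that receives the flag client and `distinctId` through its constructor. The `FeatureFlagClient` interface only mirrors the `posthog-node` call `getFeatureFlag(key, distinctId)` so the example runs without a network; the 10-minute TTL and the injectable clock are illustrative assumptions.

```typescript
// The one posthog-node method this sketch relies on:
// PostHog#getFeatureFlag(key, distinctId) resolves to string | boolean | undefined.
interface FeatureFlagClient {
  getFeatureFlag(key: string, distinctId: string): Promise<string | boolean | undefined>;
}

type ExperimentVariant = string | boolean | null;

interface CacheEntry {
  value: ExperimentVariant;
  fetchedAt: number;
}

// Hypothetical class name; only the caching and gating logic is sketched.
class ExperimentsServiceSketch {
  private readonly cache = new Map<string, CacheEntry>();
  private readonly client: FeatureFlagClient;
  private readonly distinctId: string;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    client: FeatureFlagClient,
    distinctId: string,
    ttlMs = 10 * 60 * 1000, // assumed 10 minutes, inside the suggested 5 to 15 minute range
    now: () => number = Date.now,
  ) {
    this.client = client;
    this.distinctId = distinctId;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Raw variant; only calls PostHog when the cached entry is missing or older than the TTL. */
  async getExperimentVariant(flagKey: string): Promise<ExperimentVariant> {
    const cached = this.cache.get(flagKey);
    if (cached && this.now() - cached.fetchedAt < this.ttlMs) {
      return cached.value;
    }
    try {
      // With the real posthog-node client, this call is also where the exposure event is sent.
      const raw = await this.client.getFeatureFlag(flagKey, this.distinctId);
      const value: ExperimentVariant = raw ?? null; // undefined means "flag unknown"
      this.cache.set(flagKey, { value, fetchedAt: this.now() });
      return value;
    } catch {
      // Network failure: keep serving the last known value, else report unknown.
      return cached?.value ?? null;
    }
  }

  /** Synchronous boolean gate over the cached variant: "test"/true is on, everything else is off. */
  isExperimentEnabled(flagKey: string): boolean {
    const value = this.cache.get(flagKey)?.value ?? null;
    return value === true || value === "test";
  }
}
```

Keeping `isExperimentEnabled()` synchronous over the cache lets hot paths gate behavior without awaiting the network.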
Offline + startup behavior:
- Persist last-known assignments to disk in `~/.mux/feature_flags.json` (or inside muxHome near `telemetry_id`).
- On startup:
  - load cached values immediately (fast)
  - refresh asynchronously in the background
- Fail closed: if PostHog is unreachable, default the experiment to **control** (disabled) unless a cached value exists.
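One way the offline/startup rules could look using only Node built-ins. The file format (`version`, `savedAt`, `flags`), the function names, and the write-to-temp-then-rename step are assumptions, not the shipped implementation; the point is that a missing or corrupt cache degrades to "no assignment" (control) instead of an error.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type ExperimentVariant = string | boolean | null;

// Assumed on-disk shape for ~/.mux/feature_flags.json.
interface FeatureFlagCacheFile {
  version: 1;
  savedAt: number;
  flags: Record<string, ExperimentVariant>;
}

function defaultCachePath(): string {
  return path.join(os.homedir(), ".mux", "feature_flags.json");
}

/** Load last-known assignments; a missing or corrupt file yields an empty map (fail closed). */
function loadFlagCache(filePath = defaultCachePath()): Record<string, ExperimentVariant> {
  try {
    const parsed = JSON.parse(fs.readFileSync(filePath, "utf8")) as Partial<FeatureFlagCacheFile>;
    if (parsed.version !== 1 || typeof parsed.flags !== "object" || parsed.flags === null) {
      return {};
    }
    return parsed.flags;
  } catch {
    return {};
  }
}

/** Persist assignments: write a temp file, then rename it over the previous cache. */
function saveFlagCache(flags: Record<string, ExperimentVariant>, filePath = defaultCachePath()): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const payload: FeatureFlagCacheFile = { version: 1, savedAt: Date.now(), flags };
  const tmp = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(payload, null, 2));
  fs.renameSync(tmp, filePath);
}

/** Startup: serve cached values immediately and refresh in the background without blocking. */
function startExperiments(
  refresh: () => Promise<Record<string, ExperimentVariant>>,
  filePath = defaultCachePath(),
): { current: Record<string, ExperimentVariant>; ready: Promise<void> } {
  const state: { current: Record<string, ExperimentVariant>; ready: Promise<void> } = {
    current: loadFlagCache(filePath),
    ready: Promise.resolve(),
  };
  state.ready = refresh()
    .then((fresh) => {
      state.current = { ...state.current, ...fresh };
      saveFlagCache(state.current, filePath);
    })
    .catch(() => {
      // PostHog unreachable (or cache write failed): keep cached values; uncached flags stay unknown.
    });
  return state;
}
```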
Feature-flag enablement rules:
- If telemetry is disabled (`MUX_DISABLE_TELEMETRY=1`, CI, test, etc.), **do not** call PostHog for flags.
  - Return `null` / “unknown” from the service.
  - The renderer can fall back to local toggles (dev-only) or treat the experiment as control.
- To test PostHog-driven experiments in an unpackaged/dev Electron build, use the existing env opt-in:
  - `MUX_ENABLE_TELEMETRY_IN_DEV=1`
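A sketch of the enablement check. Only `MUX_DISABLE_TELEMETRY=1` and `MUX_ENABLE_TELEMETRY_IN_DEV=1` come from the plan; the `TelemetryEnv` shape, treating any non-empty `CI` value as "off", and `NODE_ENV === "test"` as the test signal are assumptions.

```typescript
// Hypothetical input shape; the real TelemetryService may derive these values differently.
interface TelemetryEnv {
  env: Record<string, string | undefined>;
  isPackaged: boolean; // e.g. Electron's app.isPackaged
}

/** True only when PostHog flags may be requested; otherwise the service reports null ("unknown"). */
function shouldEvaluateRemoteFlags({ env, isPackaged }: TelemetryEnv): boolean {
  if (env["MUX_DISABLE_TELEMETRY"] === "1") return false;
  if (env["CI"]) return false; // assumption: any non-empty CI value disables flags
  if (env["NODE_ENV"] === "test") return false; // assumption: how "test" is detected
  // Unpackaged/dev Electron builds stay off unless explicitly opted in.
  if (!isPackaged && env["MUX_ENABLE_TELEMETRY_IN_DEV"] !== "1") return false;
  return true;
}

/** Wraps a variant lookup so disabled environments never touch the network. */
async function variantOrUnknown(
  ctx: TelemetryEnv,
  lookup: () => Promise<string | boolean | null>,
): Promise<string | boolean | null> {
  return shouldEvaluateRemoteFlags(ctx) ? lookup() : null;
}
```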
### 2) Backend: attach experiment/flag info to telemetry events
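PostHog’s docs require this for server-side capture: events sent from a backend SDK only carry variant information if `$feature/<flagKey>` properties are attached to them, either manually or via `sendFeatureFlags`.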
Implement one of these (recommend #1):
1. **Manual property injection (preferred):**
   - Add `$feature/<flagKey>` properties to captured events.
   - Implementation idea: `TelemetryService.getBaseProperties()` merges in a stable `this.featureFlagProperties` map populated by `ExperimentsService`.
   - For the initial experiment:
     - `$feature/post-compaction-context: 'control' | 'test'` (or a boolean), depending on how the variants are configured.
2. `sendFeatureFlags: true` on `posthog.capture()`
   - Avoid this unless local evaluation is also enabled; otherwise it can add extra requests per capture.
Also add:
- A tiny “experiment snapshot” helper so that **error_occurred** (and
other critical events) always include the variant, even if flags aren’t
fully loaded yet (use cached value).
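A sketch of option 1 plus the snapshot helper, reusing the `setFeatureFlagVariant()` and `getBaseProperties()` names from this PR. `TelemetryServiceSketch`, `buildEvent()`, and `applyCachedSnapshot()` are hypothetical stand-ins; a real service would hand the merged properties to `posthog.capture()`.

```typescript
type FlagValue = string | boolean;

// Minimal stand-in for TelemetryService; only the property-merging logic is sketched.
class TelemetryServiceSketch {
  private readonly featureFlagProperties: Record<string, FlagValue> = {};

  /** Called by ExperimentsService whenever a fresh or cached variant is known. */
  setFeatureFlagVariant(flagKey: string, variant: FlagValue | null): void {
    const prop = `$feature/${flagKey}`;
    if (variant === null) {
      delete this.featureFlagProperties[prop]; // unknown: omit rather than report a fake variant
    } else {
      this.featureFlagProperties[prop] = variant;
    }
  }

  getBaseProperties(): Record<string, unknown> {
    // A real implementation would merge app/platform properties here as well.
    return { ...this.featureFlagProperties };
  }

  /** Payload for a capture call; base properties are spread last so events cannot override them. */
  buildEvent(
    event: string,
    properties: Record<string, unknown> = {},
  ): { event: string; properties: Record<string, unknown> } {
    return { event, properties: { ...properties, ...this.getBaseProperties() } };
  }
}

/** Seed flag properties from the on-disk cache at startup, before any network refresh. */
function applyCachedSnapshot(telemetry: TelemetryServiceSketch, cached: Record<string, FlagValue | null>): void {
  for (const [flagKey, variant] of Object.entries(cached)) {
    telemetry.setFeatureFlagVariant(flagKey, variant);
  }
}
```

Seeding from the cache first is what lets an early `error_occurred` event carry a variant even before PostHog has answered.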
### 3) oRPC: expose experiment state to the renderer
Add a new oRPC namespace, e.g. `experiments`:
- `experiments.getAll` → returns `Record<ExperimentId, { value: string | boolean | null; source: 'posthog' | 'cache' | 'disabled' }>`
- `experiments.reload` (optional) → forces a refresh, useful for debugging the Settings page.
Update:
- `src/common/orpc/schemas/api.ts` to include the new endpoints.
- `src/node/orpc/router.ts` to wire handlers.
- `src/node/orpc/context.ts` + `ServiceContainer` to register the new
service.
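A sketch of how the `experiments.getAll` handler could assemble its return value. The `ExperimentsSnapshot` input and the precedence (disabled, then a fresh PostHog value, then the disk cache) are assumptions consistent with the `source` values listed above; the oRPC wiring itself is omitted.

```typescript
// Experiment IDs are assumed to double as PostHog flag keys (see step 1).
const EXPERIMENT_IDS = { POST_COMPACTION_CONTEXT: "post-compaction-context" } as const;
type ExperimentId = (typeof EXPERIMENT_IDS)[keyof typeof EXPERIMENT_IDS];

type ExperimentSource = "posthog" | "cache" | "disabled";

interface ExperimentValue {
  value: string | boolean | null;
  source: ExperimentSource;
}

/** Hypothetical inputs the handler would read from ExperimentsService. */
interface ExperimentsSnapshot {
  remoteEnabled: boolean; // false when telemetry/flags are disabled
  fresh: Partial<Record<ExperimentId, string | boolean>>; // fetched from PostHog this session
  cached: Partial<Record<ExperimentId, string | boolean>>; // loaded from feature_flags.json
}

/** Shape returned by experiments.getAll; a null value with source "cache" means nothing is known yet. */
function getAllExperiments(s: ExperimentsSnapshot): Record<ExperimentId, ExperimentValue> {
  const result = {} as Record<ExperimentId, ExperimentValue>;
  for (const id of Object.values(EXPERIMENT_IDS)) {
    if (!s.remoteEnabled) {
      result[id] = { value: null, source: "disabled" };
    } else if (s.fresh[id] !== undefined) {
      result[id] = { value: s.fresh[id] ?? null, source: "posthog" };
    } else {
      result[id] = { value: s.cached[id] ?? null, source: "cache" };
    }
  }
  return result;
}
```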
### 4) Frontend: update ExperimentsContext + Settings → Experiments
Target behavior:
- In packaged builds (telemetry enabled): show experiments as
**read-only** (variant + short description), since assignment is
remote-controlled.
- In dev/test or when PostHog flags are disabled/unavailable: fall back
to the existing local toggles.
Concrete steps:
- Add a new hook (or extend an existing one):
  - `useRemoteExperiments()` → fetches `experiments.getAll` once and stores the result in context.
- Update `useExperimentValue(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT)` to resolve in this order:
  1. Remote PostHog assignment (if available)
  2. Cached remote assignment (if remote is temporarily unavailable)
  3. LocalStorage toggle (dev fallback)
  4. Default (`enabledByDefault`)
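The resolution order can be written as a pure function that `useExperimentValue()` calls after reading context and localStorage, which keeps it testable outside React. Names other than `enabledByDefault` are hypothetical; a remote value with `source: "cache"` covers step 2, and an unrecognized variant fails closed.

```typescript
type RemoteExperiment = { value: string | boolean | null; source: "posthog" | "cache" | "disabled" };

interface ExperimentDefinition {
  enabledByDefault: boolean;
}

/** null means "no remote answer"; otherwise only "test"/true enables (unknown variants fail closed). */
function variantToBoolean(value: string | boolean | null): boolean | null {
  if (value === null) return null;
  return value === true || value === "test";
}

/** 1. remote PostHog, 2. cached remote, 3. localStorage toggle, 4. enabledByDefault. */
function resolveExperimentValue(
  remote: RemoteExperiment | undefined,
  localToggle: boolean | undefined,
  def: ExperimentDefinition,
): boolean {
  if (remote && remote.source !== "disabled") {
    const fromRemote = variantToBoolean(remote.value); // covers both "posthog" and "cache"
    if (fromRemote !== null) return fromRemote;
  }
  if (localToggle !== undefined) return localToggle;
  return def.enabledByDefault;
}

/** Settings → Experiments renders read-only when this is true. */
function isRemoteControlled(remote: RemoteExperiment | undefined): boolean {
  return remote !== undefined && remote.source !== "disabled" && remote.value !== null;
}
```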
UI changes (`ExperimentsSection.tsx`):
- If a remote assignment exists:
  - render a disabled `Switch` (or replace it with a badge) and show `Variant: control/test`.
- Else:
  - keep the current toggle UI.
### 5) Wire `Post-Compaction Context` gating to PostHog
Backend gating (authoritative):
- In `AgentSession` (or `WorkspaceService.sendMessage`), compute:
  - `postCompactionContextEnabled = experimentsService.isExperimentEnabled('post-compaction-context')`
- Use this instead of (or in preference to) `options?.experiments?.postCompactionContext`.
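A sketch of the "in preference to" reading of this rule: a known PostHog assignment wins, and the renderer-supplied option is only consulted when remote flags are off. `ExperimentGate.getCachedEnabled()` is a hypothetical accessor that separates "unknown" (`null`) from a real `false`, which a plain boolean `isExperimentEnabled()` cannot express.

```typescript
interface SendMessageOptions {
  experiments?: { postCompactionContext?: boolean };
}

// Hypothetical slice of ExperimentsService used by the gate.
interface ExperimentGate {
  /** null when PostHog flags are disabled or no assignment is known yet. */
  getCachedEnabled(flagKey: string): boolean | null;
}

/** Backend is authoritative: a known PostHog assignment overrides the client-sent option. */
function resolvePostCompactionEnabled(gate: ExperimentGate, options?: SendMessageOptions): boolean {
  const remote = gate.getCachedEnabled("post-compaction-context");
  if (remote !== null) return remote;
  // Remote flags off (dev/test/telemetry disabled): honor the renderer's local toggle, else off.
  return options?.experiments?.postCompactionContext ?? false;
}
```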
Frontend gating (UI):
- Use the same resolved experiment value to:
  - decide whether to request `includePostCompaction` in `workspace.list`
  - show/hide the PostCompaction UI (Costs tab/sidebar)
**Recommended (practical) simplification:**
- Change `workspace.list({ includePostCompaction })` so that when `includePostCompaction` is **omitted**, the backend decides based on experiment state.
  - This is likely necessary because `WorkspaceProvider` loads metadata **before** `ExperimentsProvider` mounts today.
  - It removes a “front-end must know experiment first” dependency and avoids provider-tree churn.
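The omitted-parameter rule reduces to a nullish fallback on the backend. `ListOptions` and `shouldIncludePostCompaction()` are hypothetical names; the real `list()` handler would evaluate this before deciding whether to load post-compaction data.

```typescript
interface ListOptions {
  includePostCompaction?: boolean;
}

/** An explicit caller choice wins; when omitted, the backend decides from experiment state. */
function shouldIncludePostCompaction(
  options: ListOptions | undefined,
  isExperimentEnabled: (flagKey: string) => boolean,
): boolean {
  return options?.includePostCompaction ?? isExperimentEnabled("post-compaction-context");
}
```

Because `WorkspaceProvider` can simply omit the field, it no longer needs to wait for `ExperimentsProvider` to mount.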
### 6) Add minimal analytics events for the experiment (optional but high-value)
To get actionable insights beyond “did users click it,” add 1–2
low-cardinality events:
- `compaction_performed`
  - properties: `had_file_diffs: boolean`, `diff_count_b2: number`
- `post_compaction_context_injected`
  - properties: `plan_included: boolean`, `diff_count_b2: number`
All properties must remain privacy-safe (counts + booleans only).
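The plan does not define `diff_count_b2`. Assuming it means a count rounded down to a power of two (so exact counts never leave the machine), the two event payloads could be built as below. The helper and builder names are hypothetical.

```typescript
/** Assumed meaning of "_b2": round down to a power of two; 0 (or invalid input) stays 0. */
function roundDownToPowerOfTwo(n: number): number {
  if (!Number.isFinite(n) || n < 1) return 0;
  let bucket = 1;
  while (bucket * 2 <= n) bucket *= 2;
  return bucket;
}

interface CompactionPerformedEvent {
  event: "compaction_performed";
  properties: { had_file_diffs: boolean; diff_count_b2: number };
}

interface PostCompactionContextInjectedEvent {
  event: "post_compaction_context_injected";
  properties: { plan_included: boolean; diff_count_b2: number };
}

// The builders only accept a boolean and a count, so no file names or paths can be passed through.
function compactionPerformed(diffCount: number): CompactionPerformedEvent {
  return {
    event: "compaction_performed",
    properties: { had_file_diffs: diffCount > 0, diff_count_b2: roundDownToPowerOfTwo(diffCount) },
  };
}

function postCompactionContextInjected(planIncluded: boolean, diffCount: number): PostCompactionContextInjectedEvent {
  return {
    event: "post_compaction_context_injected",
    properties: { plan_included: planIncluded, diff_count_b2: roundDownToPowerOfTwo(diffCount) },
  };
}
```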
### 7) Tests
- Unit tests for `ExperimentsService`:
  - caching + TTL
  - disabled-by-env behavior
  - disk cache load/save
- Unit test for `TelemetryService`:
  - includes `$feature/post-compaction-context` when cached/available
- Update existing post-compaction tests if behavior changes from “frontend-provided flag” → “backend-derived flag.”
## PostHog provisioning (via MCP) ✅
Since you’ve configured the PostHog MCP server, we can create the flag +
experiment as part of this integration (in Exec mode) rather than doing
it manually in the PostHog UI.
1. **Select the target PostHog project**
   - `posthog_organization-details-get` (confirm current org)
   - `posthog_projects-get` (pick `projectId`)
   - `posthog_switch-project({ projectId })` (if needed)
2. **Create (or reuse) the feature flag** `post-compaction-context`
   - Check for existence:
     - `posthog_feature-flag-get-all` (search for key)
     - or `posthog_feature-flag-get-definition({ flagKey: 'post-compaction-context' })`
   - If missing:
     - Prefer letting `posthog_experiment-create` create/update the underlying flag (since experiments want explicit variants).
     - Fallback: `posthog_create-feature-flag(...)` (boolean-only), then upgrade to variants via the experiment.
3. **Create (or reuse) the experiment** “Post-Compaction Context”
   - Check existing experiments:
     - `posthog_experiment-get-all` (avoid duplicates for the same `feature_flag_key`)
   - (Optional) sanity-check the event names we’ll use for metrics:
     - `posthog_event-definitions-list` (look for `error_occurred`, `stream_completed`, `message_sent`, etc.)
   - Create as a **draft** first:
     - `posthog_experiment-create({ feature_flag_key: 'post-compaction-context', variants: [{ key: 'control', rollout_percentage: 50 }, { key: 'test', rollout_percentage: 50 }], ... })`
     - Suggested primary metric: mean `error_occurred`
     - Suggested secondary metrics: mean `stream_completed`, mean `message_sent`
     - If we implement the optional new events in step 6, add `post_compaction_context_injected` as a secondary metric (to sanity-check feature usage).
4. **Launch / stop the experiment**
   - Launch after code ships: `posthog_experiment-update({ experimentId, data: { launch: true } })`
   - Stop/conclude: `posthog_experiment-update({ experimentId, data: { conclude: 'won' | 'lost' | 'inconclusive' | 'stopped_early' | 'invalid', conclusion_comment } })`
### Manual fallback (if MCP is unavailable)
- Create a **feature flag** key: `post-compaction-context`.
- Create an **experiment** using that flag key (variants: `control` vs
`test`).
- Choose at least one metric (e.g., `error_occurred`,
`stream_completed`, or `post_compaction_context_injected` if
implemented).
<details>
<summary>Notes on Early Access Feature Management</summary>
PostHog’s docs state Early Access management is currently only available
in the JavaScript Web SDK.
If we want “users opt into betas” inside Mux Settings:
- Either adopt `posthog-js` in the renderer specifically for early
access APIs, OR
- Implement early-access enrollment via PostHog APIs (will require auth
+ careful security), OR
- Keep Mux’s current local toggle approach for “labs” features and
reserve PostHog for experiments.
Given that the immediate goal is A/B testing `Post-Compaction Context`, I’d start with backend feature flags/experiments first.
</details>
</details>
---
_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking:
`high`_
---------
Signed-off-by: Thomas Kosiewski <[email protected]>

Parent commit: a30adbc
31 files changed (+1211, -55 lines) across `src/` (`browser`, `cli`, `common`, `desktop`, `node`) and `tests/ipc`.