Commit cc478e5

🤖 feat: Add PostHog experiments integration (#1179)
Add backend-first PostHog feature flag evaluation for remote-controlled experiments, starting with Post-Compaction Context.

## Changes

### Backend (ExperimentsService)
- Evaluate PostHog feature flags via `posthog-node`
- Disk cache (`~/.mux/feature_flags.json`) with TTL-based refresh
- Fail-closed behavior (unknown = disabled)
- Disable calls when telemetry is off

### Telemetry enrichment (TelemetryService)
- `setFeatureFlagVariant()` adds `$feature/<flagKey>` to all events
- Enables variant breakdown in PostHog analytics

### oRPC layer
- `experiments.getAll`: Get all experiment values
- `experiments.reload`: Force refresh from PostHog

### Frontend (ExperimentsContext)
- Fetch remote experiments on mount
- Priority: remote PostHog > local toggle > default
- Read-only UI when experiment is remote-controlled

### Backend authoritative gating (WorkspaceService)
- `sendMessage()` resolves experiment from PostHog when enabled
- `list()` decides `includePostCompaction` based on experiment

### Type consolidation
- `ExperimentValueSchema` (Zod) is single source of truth
- `ExperimentValue` type derived via `z.infer` in types.ts

## Bug fixes (unrelated)
- Fixed backgroundProcessManager exit race condition
- Fixed telemetry client Node.js compatibility
- Relaxed timing test threshold in authMiddleware

---

<details>
<summary>📋 Implementation Plan</summary>

# PostHog early access, feature flags, and experiments (Mux)

## Goals

1. Run **remote-controlled** experiments/feature flags from PostHog (starting with **`Post-Compaction Context`**).
2. Keep PostHog interactions **backend-first** to avoid ad-blocker issues (Mux already forwards telemetry to Node via oRPC).
3. Ensure telemetry/error events can be **analyzed by variant** in PostHog (required for experiments + server-side event capture).
4. Preserve Mux privacy guarantees: no project names, file paths, user prompts, etc. sent to PostHog.

## Recommendation (architecture)

### ✅ Approach A (recommended): Backend-owned flag/experiment evaluation + oRPC exposure

**Net new product LoC (est.): ~250–450**

- Use the existing main-process `posthog-node` client to:
  - evaluate flags/experiment variants
  - emit `$feature_flag_called` exposure events
  - attach `$feature/<flagKey>` properties to telemetry events so PostHog can break down metrics by variant
- Expose a small oRPC surface to the renderer for:
  - displaying current experiment variant in Settings → Experiments
  - gating UI behavior where needed
- Keep **local experiment toggles only as a dev/override fallback** (optional), not the default source of truth.

Why this fits Mux:
- Mux already routes telemetry through the backend “to avoid ad-blocker issues.”
- The `Post-Compaction Context` experiment gates backend behavior (attachment injection), so backend must know the assignment anyway.

### Alternative B: Renderer uses `posthog-js` for flags/experiments (keep backend telemetry)

**Net new product LoC (est.): ~350–650**

Pros:
- Easiest way to adopt **Early Access Feature Management** (PostHog notes it’s JS-web-SDK-only).

Cons:
- Reintroduces the ad-blocker/network fragility we intentionally avoided for telemetry.
- Requires careful identity bridging (distinctId must match backend’s ID or you lose experiment attribution).

## Proposed flow (Approach A)

```mermaid
flowchart TD
  A[Renderer] -->|oRPC: telemetry.track| B[Main process TelemetryService]
  A -->|oRPC: experiments.getAll| C[Main process ExperimentsService]
  C -->|getFeatureFlag: post-compaction-context| D[PostHog Decide / Flags]
  C -->|cache variant| C
  B -->|capture events| E[PostHog events]
  B -->|include $feature/post-compaction-context| E
```

## Implementation plan

### 1) Backend: add a PostHog-backed experiments/flags service

- Create `src/node/services/experimentsService.ts` (name TBD) that depends on:
  - `TelemetryService` (for `distinctId` + access to a `PostHog` client), or
  - a shared `PostHogClientService` if you want to refactor TelemetryService into a reusable PostHog wrapper.

Core responsibilities:
- `getDistinctId()` (or expose from TelemetryService) – **single stable identity** used for both:
  - flag evaluation
  - telemetry capture
- `getExperimentVariant(experimentId: ExperimentId): Promise<string | boolean | null>`
  - Map `ExperimentId` → PostHog **feature flag key**. (Conveniently, current `EXPERIMENT_IDS.*` already look like flag keys.)
  - Call `posthog.getFeatureFlag(flagKey, distinctId)`.
    - This automatically emits `$feature_flag_called` (exposure) events when appropriate.
  - Cache result in-memory with a TTL (e.g., 5–15 min) to avoid re-fetching on every UI render.
- `isExperimentEnabled(experimentId: ExperimentId): boolean`
  - Converts the raw PostHog variant into a boolean gate.
  - Suggested mapping for `post-compaction-context`:
    - `"test"` / `true` → enabled
    - `"control"` / `false` / `null` → disabled

Offline + startup behavior:
- Persist last-known assignments to disk in `~/.mux/feature_flags.json` (or inside muxHome near `telemetry_id`).
- On startup:
  - load cached values immediately (fast)
  - refresh asynchronously in the background
- Fail closed: if PostHog is unreachable, default experiment to **control** (disabled) unless cached value exists.
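
The TTL and fail-closed rules above can be sketched roughly as follows. This is a minimal illustration, not the `ExperimentsService` from this commit: `CachedVariant`, `readCached`, and `isVariantEnabled` are hypothetical names, and the 10-minute TTL is just one value from the suggested 5–15 min range.

```typescript
// Hypothetical sketch: none of these names come from the commit.
type Variant = string | boolean | null;

interface CachedVariant {
  value: Variant;
  fetchedAtMs: number;
}

interface CacheRead {
  value: Variant; // null means "unknown"
  needsRefresh: boolean; // caller should refresh in the background
}

const TTL_MS = 10 * 60 * 1000; // assumed TTL within the 5-15 min range

// Return the last-known value immediately and report whether it is stale.
function readCached(
  entry: CachedVariant | undefined,
  nowMs: number,
  ttlMs: number = TTL_MS
): CacheRead {
  if (entry === undefined) {
    // Nothing cached: fail closed with an unknown value.
    return { value: null, needsRefresh: true };
  }
  const expired = nowMs - entry.fetchedAtMs >= ttlMs;
  return { value: entry.value, needsRefresh: expired };
}

// Suggested mapping: "test" / true are enabled; everything else is disabled.
function isVariantEnabled(variant: Variant): boolean {
  return variant === true || variant === "test";
}
```

Keeping the stale value while flagging `needsRefresh` matches the startup behavior described above: cached assignments load immediately and the refresh happens asynchronously.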

Feature-flag enablement rules:
- If telemetry is disabled (`MUX_DISABLE_TELEMETRY=1`, CI, test, etc.), **do not** call PostHog for flags.
  - Return `null` / “unknown” from the service.
  - Renderer can fall back to local toggles (dev-only) or treat as control.
- To test PostHog-driven experiments in an unpackaged/dev Electron build, use the existing env opt-in:
  - `MUX_ENABLE_TELEMETRY_IN_DEV=1`

### 2) Backend: attach experiment/flag info to telemetry events

PostHog’s docs explicitly require this for server-side capture.

Implement one of these (recommend #1):

1. **Manual property injection (preferred):**
   - Add `$feature/<flagKey>` properties to captured events.
   - Implementation idea: `TelemetryService.getBaseProperties()` merges in a stable `this.featureFlagProperties` map populated by `ExperimentsService`.
   - For the initial experiment:
     - `$feature/post-compaction-context: 'control' | 'test'` (or boolean) depending on how you configure variants.
2. `sendFeatureFlags: true` on `posthog.capture()`
   - Avoid unless you also enable local evaluation, otherwise it can add extra requests per capture.

Also add:
- A tiny “experiment snapshot” helper so that **error_occurred** (and other critical events) always include the variant, even if flags aren’t fully loaded yet (use cached value).

### 3) oRPC: expose experiment state to the renderer

Add a new oRPC namespace, e.g. `experiments`:
- `experiments.getAll` → returns `Record<ExperimentId, { value: string | boolean | null; source: 'posthog' | 'cache' | 'disabled' }>`
- `experiments.reload` (optional) → forces a refresh, useful for debugging the Settings page.

Update:
- `src/common/orpc/schemas/api.ts` to include the new endpoints.
- `src/node/orpc/router.ts` to wire handlers.
- `src/node/orpc/context.ts` + `ServiceContainer` to register the new service.
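
As an illustration of the manual property injection in step 2, the merge could look roughly like this. Only the method name `setFeatureFlagVariant` and the `$feature/<flagKey>` key format come from this commit; the class name `FeatureFlagProperties`, the `enrich` method, and the signatures are assumptions made for the sketch.

```typescript
// Hypothetical sketch: class name, enrich(), and signatures are assumptions.
type FlagVariant = string | boolean;

class FeatureFlagProperties {
  private readonly variants = new Map<string, FlagVariant>();

  // Record (or clear, when null) the variant for one flag key.
  setFeatureFlagVariant(flagKey: string, variant: FlagVariant | null): void {
    if (variant === null) {
      this.variants.delete(flagKey);
      return;
    }
    this.variants.set(flagKey, variant);
  }

  // Return a copy of the event properties with $feature/<flagKey> entries added.
  enrich(properties: Record<string, unknown>): Record<string, unknown> {
    const enriched: Record<string, unknown> = { ...properties };
    this.variants.forEach((variant, flagKey) => {
      enriched[`$feature/${flagKey}`] = variant;
    });
    return enriched;
  }
}
```

Because the map is populated from the cached assignment, an event captured before the first refresh completes can still carry the last-known variant.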

### 4) Frontend: update ExperimentsContext + Settings → Experiments

Target behavior:
- In packaged builds (telemetry enabled): show experiments as **read-only** (variant + short description), since assignment is remote-controlled.
- In dev/test or when PostHog flags are disabled/unavailable: fall back to the existing local toggles.

Concrete steps:
- Add a new hook (or extend existing):
  - `useRemoteExperiments()` → fetches `experiments.getAll` once and stores in context.
- Update `useExperimentValue(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT)` to resolve in this order:
  1. Remote PostHog assignment (if available)
  2. Cached remote assignment (if remote temporarily unavailable)
  3. LocalStorage toggle (dev fallback)
  4. Default (`enabledByDefault`)

UI changes (`ExperimentsSection.tsx`):
- If remote assignment exists:
  - render a disabled `Switch` (or replace with a badge) and show `Variant: control/test`.
- Else:
  - keep the current toggle UI.

### 5) Wire `Post-Compaction Context` gating to PostHog

Backend gating (authoritative):
- In `AgentSession` (or `WorkspaceService.sendMessage`), compute:
  - `postCompactionContextEnabled = experimentsService.isEnabled('post-compaction-context')`
- Use this instead of (or in preference to) `options?.experiments?.postCompactionContext`.

Frontend gating (UI):
- Use the same resolved experiment value to:
  - decide whether to request `includePostCompaction` in `workspace.list`
  - show/hide the PostCompaction UI (Costs tab/sidebar)

**Recommended (practical) simplification:**
- Change `workspace.list({ includePostCompaction })` so that when `includePostCompaction` is **omitted**, the backend decides based on experiment state.
  - This is likely necessary because `WorkspaceProvider` loads metadata **before** `ExperimentsProvider` mounts today.
  - It removes a “front-end must know experiment first” dependency and avoids provider-tree churn.
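
The four-step resolution order in step 4 can be sketched as a pure function. This follows the plan's ordering (the commit summary lists a shorter "remote PostHog > local toggle > default" priority); `resolveExperiment`, `variantToBoolean`, and the input shape are hypothetical names, not code from `ExperimentsContext`.

```typescript
// Hypothetical sketch of the resolution order; names are not from the commit.
type RemoteVariant = string | boolean | null | undefined;
type ResolvedSource = "remote" | "cache" | "local" | "default";

interface ResolutionInput {
  remote?: RemoteVariant; // fresh PostHog assignment, if any
  cachedRemote?: RemoteVariant; // last-known assignment from cache
  localToggle?: boolean; // localStorage toggle (dev fallback)
  enabledByDefault: boolean;
}

// Unknown values (null/undefined) yield undefined so the next source is tried.
function variantToBoolean(variant: RemoteVariant): boolean | undefined {
  if (variant === null || variant === undefined) return undefined;
  if (typeof variant === "boolean") return variant;
  return variant === "test";
}

function resolveExperiment(input: ResolutionInput): {
  enabled: boolean;
  source: ResolvedSource;
} {
  const remote = variantToBoolean(input.remote);
  if (remote !== undefined) return { enabled: remote, source: "remote" };

  const cached = variantToBoolean(input.cachedRemote);
  if (cached !== undefined) return { enabled: cached, source: "cache" };

  if (input.localToggle !== undefined) {
    return { enabled: input.localToggle, source: "local" };
  }
  return { enabled: input.enabledByDefault, source: "default" };
}
```

Returning the `source` alongside the boolean lets the Settings UI decide whether to render a read-only variant or the local toggle.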

### 6) Add minimal analytics events for the experiment (optional but high-value)

To get actionable insights beyond “did users click it,” add 1–2 low-cardinality events:
- `compaction_performed`
  - properties: `had_file_diffs: boolean`, `diff_count_b2: number`
- `post_compaction_context_injected`
  - properties: `plan_included: boolean`, `diff_count_b2: number`

All properties must remain privacy-safe (counts + booleans only).

### 7) Tests

- Unit tests for `ExperimentsService`:
  - caching + TTL
  - disabled-by-env behavior
  - disk cache load/save
- Unit test for `TelemetryService`:
  - includes `$feature/post-compaction-context` when cached/available
- Update existing post-compaction tests if behavior changes from “frontend-provided flag” → “backend-derived flag.”

## PostHog provisioning (via MCP) ✅

Since you’ve configured the PostHog MCP server, we can create the flag + experiment as part of this integration (in Exec mode) rather than doing it manually in the PostHog UI.

1. **Select the target PostHog project**
   - `posthog_organization-details-get` (confirm current org)
   - `posthog_projects-get` (pick `projectId`)
   - `posthog_switch-project({ projectId })` (if needed)
2. **Create (or reuse) the feature flag** `post-compaction-context`
   - Check for existence:
     - `posthog_feature-flag-get-all` (search for key)
     - or `posthog_feature-flag-get-definition({ flagKey: 'post-compaction-context' })`
   - If missing:
     - Prefer letting `posthog_experiment-create` create/update the underlying flag (since experiments want explicit variants).
     - Fallback: `posthog_create-feature-flag(...)` (boolean-only) and then upgrade to variants via the experiment.
3. **Create (or reuse) the experiment** “Post-Compaction Context”
   - Check existing experiments:
     - `posthog_experiment-get-all` (avoid duplicates for the same `feature_flag_key`)
   - (Optional) sanity-check event names we’ll use for metrics:
     - `posthog_event-definitions-list` (look for `error_occurred`, `stream_completed`, `message_sent`, etc.)
   - Create as **draft** first:
     - `posthog_experiment-create({ feature_flag_key: 'post-compaction-context', variants: [{ key: 'control', rollout_percentage: 50 }, { key: 'test', rollout_percentage: 50 }], ... })`
     - Suggested primary metric: mean `error_occurred`
     - Suggested secondary metrics: mean `stream_completed`, mean `message_sent`
     - If we implement the optional new events in step 6, add `post_compaction_context_injected` as a secondary metric (sanity-check feature usage).
4. **Launch / stop the experiment**
   - Launch after code ships: `posthog_experiment-update({ experimentId, data: { launch: true } })`
   - Stop/conclude: `posthog_experiment-update({ experimentId, data: { conclude: 'won' | 'lost' | 'inconclusive' | 'stopped_early' | 'invalid', conclusion_comment } })`

### Manual fallback (if MCP is unavailable)

- Create a **feature flag** key: `post-compaction-context`.
- Create an **experiment** using that flag key (variants: `control` vs `test`).
- Choose at least one metric (e.g., `error_occurred`, `stream_completed`, or `post_compaction_context_injected` if implemented).

<details>
<summary>Notes on Early Access Feature Management</summary>

PostHog’s docs state Early Access management is currently only available in the JavaScript Web SDK.

If we want “users opt into betas” inside Mux Settings:
- Either adopt `posthog-js` in the renderer specifically for early access APIs, OR
- Implement early-access enrollment via PostHog APIs (will require auth + careful security), OR
- Keep Mux’s current local toggle approach for “labs” features and reserve PostHog for experiments.

Given your immediate goal is AB testing `Post-Compaction Context`, I’d start with backend feature flags/experiments first.

</details>

</details>

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_

---------

Signed-off-by: Thomas Kosiewski <[email protected]>
1 parent a30adbc commit cc478e5

31 files changed, +1211 -55 lines changed

Makefile

Lines changed: 11 additions & 5 deletions
@@ -23,6 +23,12 @@
 # AVOID CONDITIONAL BRANCHES (if/else) IN BUILD TARGETS AT ALL COSTS.
 # Branches reduce reproducibility - builds should fail fast with clear errors
 # if dependencies are missing, not silently fall back to different behavior.
+#
+# Telemetry in Development:
+# Telemetry is disabled by default in dev mode (MUX_DISABLE_TELEMETRY=1).
+# To enable it (e.g., for testing PostHog experiments), set:
+# MUX_ENABLE_TELEMETRY_IN_DEV=1 make dev
+# This single env var is sufficient - no need to also set MUX_DISABLE_TELEMETRY=0.
 
 # Use PATH-resolved bash for portability across different systems.
 # - Windows: /usr/bin/bash doesn't exist in Chocolatey's make environment or GitHub Actions
@@ -131,13 +137,13 @@ dev: node_modules/.installed build-main ## Start development server (Vite + node
 	@echo "Starting dev mode (3 watchers: nodemon for main process, esbuild for api, vite for renderer)..."
 	# On Windows, use npm run because bunx doesn't correctly pass arguments to concurrently
 	# https://github.com/oven-sh/bun/issues/18275
-	@MUX_DISABLE_TELEMETRY=$(or $(MUX_DISABLE_TELEMETRY),1) NODE_OPTIONS="--max-old-space-size=4096" npm x concurrently -k --raw \
+	@MUX_DISABLE_TELEMETRY=$(if $(MUX_ENABLE_TELEMETRY_IN_DEV),,$(or $(MUX_DISABLE_TELEMETRY),1)) NODE_OPTIONS="--max-old-space-size=4096" npm x concurrently -k --raw \
 		"bun x nodemon --watch src --watch tsconfig.main.json --watch tsconfig.json --ext ts,tsx,json --ignore dist --ignore node_modules --exec node scripts/build-main-watch.js" \
 		"npx esbuild src/cli/api.ts --bundle --format=esm --platform=node --target=node20 --outfile=dist/cli/api.mjs --external:zod --external:commander --external:@trpc/server --watch" \
 		"vite"
 else
 dev: node_modules/.installed build-main build-preload ## Start development server (Vite + tsgo watcher for 10x faster type checking)
-	@MUX_DISABLE_TELEMETRY=$(or $(MUX_DISABLE_TELEMETRY),1) bun x concurrently -k \
+	@MUX_DISABLE_TELEMETRY=$(if $(MUX_ENABLE_TELEMETRY_IN_DEV),,$(or $(MUX_DISABLE_TELEMETRY),1)) bun x concurrently -k \
 		"bun x concurrently \"$(TSGO) -w -p tsconfig.main.json\" \"bun x tsc-alias -w -p tsconfig.main.json\"" \
 		"bun x esbuild src/cli/api.ts --bundle --format=esm --platform=node --target=node20 --outfile=dist/cli/api.mjs --external:zod --external:commander --external:@trpc/server --watch" \
 		"vite"
@@ -151,7 +157,7 @@ dev-server: node_modules/.installed build-main ## Start server mode with hot rel
 	@echo ""
 	@echo "For remote access: make dev-server VITE_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0"
 	@# On Windows, use npm run because bunx doesn't correctly pass arguments
-	@MUX_DISABLE_TELEMETRY=$(or $(MUX_DISABLE_TELEMETRY),1) npmx concurrently -k \
+	@MUX_DISABLE_TELEMETRY=$(if $(MUX_ENABLE_TELEMETRY_IN_DEV),,$(or $(MUX_DISABLE_TELEMETRY),1)) npmx concurrently -k \
 		"npmx nodemon --watch src --watch tsconfig.main.json --watch tsconfig.json --ext ts,tsx,json --ignore dist --ignore node_modules --exec node scripts/build-main-watch.js" \
 		"npx esbuild src/cli/api.ts --bundle --format=esm --platform=node --target=node20 --outfile=dist/cli/api.mjs --external:zod --external:commander --external:@trpc/server --watch" \
 		"npmx nodemon --watch dist/cli/index.js --watch dist/cli/server.js --delay 500ms --exec \"node dist/cli/index.js server --host $(or $(BACKEND_HOST),localhost) --port $(or $(BACKEND_PORT),3000)\"" \
@@ -163,7 +169,7 @@ dev-server: node_modules/.installed build-main ## Start server mode with hot rel
 	@echo " Frontend (with HMR): http://$(or $(VITE_HOST),localhost):$(or $(VITE_PORT),5173)"
 	@echo ""
 	@echo "For remote access: make dev-server VITE_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0"
-	@MUX_DISABLE_TELEMETRY=$(or $(MUX_DISABLE_TELEMETRY),1) bun x concurrently -k \
+	@MUX_DISABLE_TELEMETRY=$(if $(MUX_ENABLE_TELEMETRY_IN_DEV),,$(or $(MUX_DISABLE_TELEMETRY),1)) bun x concurrently -k \
 		"bun x concurrently \"$(TSGO) -w -p tsconfig.main.json\" \"bun x tsc-alias -w -p tsconfig.main.json\"" \
 		"bun x esbuild src/cli/api.ts --bundle --format=esm --platform=node --target=node20 --outfile=dist/cli/api.mjs --external:zod --external:commander --external:@trpc/server --watch" \
 		"bun x nodemon --watch dist/cli/index.js --watch dist/cli/server.js --delay 500ms --exec 'NODE_ENV=development node dist/cli/index.js server --host $(or $(BACKEND_HOST),localhost) --port $(or $(BACKEND_PORT),3000)'" \
@@ -173,7 +179,7 @@ endif
 
 
 start: node_modules/.installed build-main build-preload build-static ## Build and start Electron app
-	@NODE_ENV=development MUX_DISABLE_TELEMETRY=$(or $(MUX_DISABLE_TELEMETRY),1) bunx electron --remote-debugging-port=9222 .
+	@NODE_ENV=development MUX_DISABLE_TELEMETRY=$(if $(MUX_ENABLE_TELEMETRY_IN_DEV),,$(or $(MUX_DISABLE_TELEMETRY),1)) bunx electron --remote-debugging-port=9222 .
 
 ## Build targets (can run in parallel)
 build: node_modules/.installed src/version.ts build-renderer build-main build-preload build-icons build-static ## Build all targets
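
To make the new Make expression easier to reason about, here is a small TypeScript model of how `$(if $(MUX_ENABLE_TELEMETRY_IN_DEV),,$(or $(MUX_DISABLE_TELEMETRY),1))` expands. It assumes standard GNU Make semantics (`$(if)` picks the then-part when the condition expands to a non-empty string, `$(or)` returns its first non-empty argument); the function name `resolveDisableTelemetry` is hypothetical and this is not code from the repository.

```typescript
// Hypothetical model of the Make expression; not part of the commit.
interface MakeEnv {
  MUX_ENABLE_TELEMETRY_IN_DEV?: string;
  MUX_DISABLE_TELEMETRY?: string;
}

// Returns the value the recipe assigns to MUX_DISABLE_TELEMETRY.
function resolveDisableTelemetry(env: MakeEnv): string {
  const optIn = env.MUX_ENABLE_TELEMETRY_IN_DEV ?? "";
  if (optIn !== "") {
    // $(if COND,,ELSE): the then-part is empty, so the variable is set to "".
    return "";
  }
  // $(or $(MUX_DISABLE_TELEMETRY),1): first non-empty argument wins.
  const explicit = env.MUX_DISABLE_TELEMETRY ?? "";
  return explicit !== "" ? explicit : "1";
}
```

Note that under these semantics any non-empty opt-in value, including `0`, selects the empty then-part. How the application itself interprets `MUX_ENABLE_TELEMETRY_IN_DEV=0` is not shown in this diff.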

src/browser/components/Settings/sections/ExperimentsSection.tsx

Lines changed: 17 additions & 5 deletions
@@ -1,12 +1,13 @@
-import React, { useCallback } from "react";
-import { useExperiment } from "@/browser/contexts/ExperimentsContext";
+import React, { useCallback, useMemo } from "react";
+import { useExperiment, useRemoteExperimentValue } from "@/browser/contexts/ExperimentsContext";
 import {
   getExperimentList,
   EXPERIMENT_IDS,
   type ExperimentId,
 } from "@/common/constants/experiments";
 import { Switch } from "@/browser/components/ui/switch";
 import { useWorkspaceContext } from "@/browser/contexts/WorkspaceContext";
+import { useTelemetry } from "@/browser/hooks/useTelemetry";
 
 interface ExperimentRowProps {
   experimentId: ExperimentId;
@@ -17,14 +18,18 @@ interface ExperimentRowProps {
 
 function ExperimentRow(props: ExperimentRowProps) {
   const [enabled, setEnabled] = useExperiment(props.experimentId);
-  const { onToggle } = props;
+  const remote = useRemoteExperimentValue(props.experimentId);
+  const telemetry = useTelemetry();
+  const { onToggle, experimentId } = props;
 
   const handleToggle = useCallback(
     (value: boolean) => {
       setEnabled(value);
+      // Track the override for analytics
+      telemetry.experimentOverridden(experimentId, remote?.value ?? null, value);
       onToggle?.(value);
     },
-    [setEnabled, onToggle]
+    [setEnabled, telemetry, experimentId, remote?.value, onToggle]
   );
 
   return (
@@ -43,9 +48,16 @@ function ExperimentRow(props: ExperimentRowProps) {
 }
 
 export function ExperimentsSection() {
-  const experiments = getExperimentList();
+  const allExperiments = getExperimentList();
   const { refreshWorkspaceMetadata } = useWorkspaceContext();
 
+  // Only show user-overridable experiments (non-overridable ones are hidden since users can't change them)
+  const experiments = useMemo(
+    () =>
+      allExperiments.filter((exp) => exp.showInSettings !== false && exp.userOverridable === true),
+    [allExperiments]
+  );
+
   // When post-compaction experiment is toggled, refresh metadata to fetch/clear bundled state
   const handlePostCompactionToggle = useCallback(() => {
     void refreshWorkspaceMetadata();
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+import { cleanup, render, waitFor } from "@testing-library/react";
+import { afterEach, beforeEach, describe, expect, mock, test } from "bun:test";
+import { GlobalWindow } from "happy-dom";
+import { ExperimentsProvider, useExperimentValue } from "./ExperimentsContext";
+import { EXPERIMENT_IDS } from "@/common/constants/experiments";
+import type { ExperimentValue } from "@/common/orpc/types";
+import type { APIClient } from "@/browser/contexts/API";
+import type { RecursivePartial } from "@/browser/testUtils";
+
+let currentClientMock: RecursivePartial<APIClient> = {};
+void mock.module("@/browser/contexts/API", () => ({
+  useAPI: () => ({
+    api: currentClientMock as APIClient,
+    status: "connected" as const,
+    error: null,
+  }),
+  APIProvider: ({ children }: { children: React.ReactNode }) => children,
+}));
+
+describe("ExperimentsProvider", () => {
+  beforeEach(() => {
+    globalThis.window = new GlobalWindow() as unknown as Window & typeof globalThis;
+    globalThis.document = globalThis.window.document;
+    globalThis.window.localStorage.clear();
+  });
+
+  afterEach(() => {
+    cleanup();
+    globalThis.window = undefined as unknown as Window & typeof globalThis;
+    globalThis.document = undefined as unknown as Document;
+    currentClientMock = {};
+  });
+
+  test("polls getAll until remote variants are available", async () => {
+    let callCount = 0;
+
+    const getAllMock = mock(() => {
+      callCount += 1;
+
+      if (callCount === 1) {
+        return Promise.resolve({
+          [EXPERIMENT_IDS.POST_COMPACTION_CONTEXT]: { value: null, source: "cache" },
+        } satisfies Record<string, ExperimentValue>);
+      }
+
+      return Promise.resolve({
+        [EXPERIMENT_IDS.POST_COMPACTION_CONTEXT]: { value: "test", source: "posthog" },
+      } satisfies Record<string, ExperimentValue>);
+    });
+
+    currentClientMock = {
+      experiments: {
+        getAll: getAllMock,
+        reload: mock(() => Promise.resolve()),
+      },
+    };
+
+    function Observer() {
+      const enabled = useExperimentValue(EXPERIMENT_IDS.POST_COMPACTION_CONTEXT);
+      return <div data-testid="enabled">{String(enabled)}</div>;
+    }
+
+    const { getByTestId } = render(
+      <ExperimentsProvider>
+        <Observer />
+      </ExperimentsProvider>
+    );
+
+    expect(getByTestId("enabled").textContent).toBe("false");
+
+    await waitFor(() => {
+      expect(getByTestId("enabled").textContent).toBe("true");
+    });
+
+    expect(getAllMock.mock.calls.length).toBeGreaterThanOrEqual(2);
+  });
+});
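
The test above expects the provider to call `getAll` again after an initial `{ value: null, source: "cache" }` response. One way such a loop could be structured is sketched below. The actual stop condition and timing in `ExperimentsContext` are not shown in this diff, so `hasPendingRemoteValues`, `pollUntilResolved`, and the options object are assumptions made purely for illustration.

```typescript
// Hypothetical sketch; the real polling logic in ExperimentsContext is not shown.
type ExperimentSource = "posthog" | "cache" | "disabled";

interface ExperimentValueSketch {
  value: string | boolean | null;
  source: ExperimentSource;
}

type ExperimentValues = Record<string, ExperimentValueSketch>;

// Assumed stop condition: an empty cache entry means the backend is still fetching.
function hasPendingRemoteValues(values: ExperimentValues): boolean {
  return Object.values(values).some(
    (entry) => entry.source === "cache" && entry.value === null
  );
}

// Call getAll repeatedly until nothing is pending or attempts run out.
async function pollUntilResolved(
  getAll: () => Promise<ExperimentValues>,
  onUpdate: (values: ExperimentValues) => void,
  options: { intervalMs: number; maxAttempts: number }
): Promise<void> {
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    const values = await getAll();
    onUpdate(values);
    if (!hasPendingRemoteValues(values)) return;
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}
```

Bounding the loop with `maxAttempts` keeps the renderer from polling indefinitely when telemetry is disabled and the backend never produces a remote value.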
