Skip to content

fix(clickhouse): raise binary type complexity limit so nested JSON columns stay readable#4076

Open
matt-aitken wants to merge 1 commit into
mainfrom
feature/tri-11463-queue-page-running-count-doesnt-match-runs-table-executing
Open

fix(clickhouse): raise binary type complexity limit so nested JSON columns stay readable#4076
matt-aitken wants to merge 1 commit into
mainfrom
feature/tri-11463-queue-page-running-count-doesnt-match-runs-table-executing

Conversation

@matt-aitken

Copy link
Copy Markdown
Member

Summary

Runs whose output or error payload produces a deeply nested type could disappear from the runs list: the status filter still counted them, but the list itself came back empty. This makes those runs visible again.

Root cause

output and error are stored as ClickHouse JSON columns. For a sufficiently nested payload, the materialized type's binary encoding exceeds the default input_format_binary_max_type_complexity of 1000. When the runs-list query reads the column back it throws Code 117 ... complexity limit exceeded, which fails the entire query, so no rows render. Status aggregates never touch these columns, so the count kept working, hence the mismatch.

Fix

Raise input_format_binary_max_type_complexity on the ClickHouse client defaults so every query can read these columns. The setting only widens a guard against pathological binary type payloads; for our own data it is safe to raise.

…lumns stay readable

Runs whose output or error payloads produce a deeply nested materialized
type can exceed ClickHouse's default binary type complexity of 1000 when
the JSON-typed column is read back. The read fails for the whole query, so
those runs vanish from the runs list even though they are still counted by
status aggregates. Set input_format_binary_max_type_complexity on the client
so every query reads these columns without erroring.
@changeset-bot

changeset-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: 1808605

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

In the ClickhouseClient constructor, a new clickhouse_settings entry sets input_format_binary_max_type_complexity to 100000, overriding the default threshold that caused queries to fail when reading deeply nested JSON-typed columns. A changelog entry documents this as a fix preventing runs with complex output or error payloads from disappearing from the runs list.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR body is missing the template's issue reference, checklist, testing steps, changelog, and screenshots sections. Add the Closes #issue line and fill in the checklist, testing steps, changelog, and screenshots sections per the template.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the ClickHouse setting change and its effect on nested JSON columns.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/tri-11463-queue-page-running-count-doesnt-match-runs-table-executing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

Open in Devin Review

Comment on lines +75 to +78
// Deeply nested JSON-typed columns (e.g. run output/error) can exceed
// the default binary type complexity of 1000 when read back, which
// fails the whole query. Raise the ceiling so those rows stay readable.
input_format_binary_max_type_complexity: 100000,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Setting name may not apply when queries use JSON format

The new setting input_format_binary_max_type_complexity at internal-packages/clickhouse/src/client/client.ts:78 relates to binary format parsing. However, all query methods in this client use JSONEachRow or JSONCompactEachRow format (e.g. internal-packages/clickhouse/src/client/client.ts:168, internal-packages/clickhouse/src/client/client.ts:450). It's possible the @clickhouse/client Node.js library uses binary format internally for metadata or type description exchange even when the query format is JSON-based — if so, this setting would correctly prevent failures. But if the setting only applies to queries explicitly using RowBinary/Native formats, it may have no effect. The PR author likely tested this in production against the actual failure, so it probably does help, but confirming with ClickHouse docs or the client library internals would be worthwhile.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal-packages/clickhouse/src/client/client.ts (1)

75-78: 📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy lift

Add a regression test for the nested-JSON read path.

This is the kind of low-level client fix that's easy to regress in a later settings refactor. Please add an integration test in internal-packages/clickhouse/src/client/client.test.ts that inserts/reads a row whose nested JSON type complexity exceeds the old limit, so the disappearing-runs bug stays covered. As per coding guidelines, "Use Vitest for tests, and never mock anything; use testcontainers instead".

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 475cc0e0-d479-40be-bb35-69ccbe77c89b

📥 Commits

Reviewing files that changed from the base of the PR and between 4ef8baa and 1808605.

📒 Files selected for processing (2)
  • .server-changes/clickhouse-binary-type-complexity.md
  • internal-packages/clickhouse/src/client/client.ts
📜 Review details
⏰ Context from checks skipped due to timeout. (25)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (11, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (9, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (12, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (10, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • internal-packages/clickhouse/src/client/client.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • internal-packages/clickhouse/src/client/client.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/clickhouse/src/client/client.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}: Use pnpm run typecheck for changes in apps (apps/*) and internal packages (internal-packages/*), and never use build to verify those changes.
Use Vitest for tests, and never mock anything; use testcontainers instead.
Prefer static imports over dynamic import(), and only use dynamic imports for unresolved circular dependencies, genuine code-splitting needs, or conditional runtime loading.

Files:

  • internal-packages/clickhouse/src/client/client.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs,md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Always import from @trigger.dev/sdk when writing Trigger.dev tasks; never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

  • internal-packages/clickhouse/src/client/client.ts
🧠 Learnings (10)
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.

Applied to files:

  • .server-changes/clickhouse-binary-type-complexity.md
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.

Applied to files:

  • internal-packages/clickhouse/src/client/client.ts
🔇 Additional comments (2)
.server-changes/clickhouse-binary-type-complexity.md (1)

1-6: LGTM!

internal-packages/clickhouse/src/client/client.ts (1)

75-78: 🩺 Stability & Availability

No issue hereclickhouse_settings are merged with request-level overrides, so this client default still applies to query()/insert() calls.

			> Likely an incorrect or invalid review comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant