Skip to content

Add support for full text search on bloom_filter(tokens(..))#1620

Merged
kodiakhq[bot] merged 7 commits intomainfrom
mikeshi/faster-bloom-filters
Jan 21, 2026
Merged

Add support for full text search on bloom_filter(tokens(..))#1620
kodiakhq[bot] merged 7 commits intomainfrom
mikeshi/faster-bloom-filters

Conversation

@MikeShi42
Copy link
Contributor

Resolves HDX-1977

@changeset-bot
Copy link

changeset-bot bot commented Jan 17, 2026

🦋 Changeset detected

Latest commit: 1021ebb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@hyperdx/common-utils Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link

vercel bot commented Jan 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
hyperdx-v2-oss-app Ready Ready Preview, Comment Jan 21, 2026 2:54pm

Request Review

@github-actions
Copy link
Contributor

github-actions bot commented Jan 17, 2026

E2E Test Results

All tests passed • 61 passed • 4 skipped • 751s

Status Count
✅ Passed 61
❌ Failed 0
⚠️ Flaky 0
⏭️ Skipped 4

Tests ran across 4 shards in parallel.

View full report →

@claude
Copy link

claude bot commented Jan 17, 2026

PR Review - bloom_filter tokens() optimization

This PR adds support for ClickHouse bloom_filter(tokens()) indices to optimize full-text search. Implementation is solid with comprehensive tests. Issues found:

Critical:

  • SQL Injection risk in indexCoversColumn (queryParser.ts:1077-1104) → Regex /\w+/g extraction could incorrectly match columns. While SqlString.format() prevents injection, the matching logic may have false positives. Consider using proper SQL expression parsing instead of regex word matching.

Important:

  • ⚠️ Case sensitivity bug (queryParser.ts:1087) → indexCoversColumn's normalize() function removes whitespace/backticks but doesn't handle case sensitivity. ClickHouse columns are case-sensitive. Your test expects tokens(body) NOT to match Body (line 565-568), but the implementation may not handle this correctly. Verify the normalize function preserves case or add explicit case checks.

  • ⚠️ Granularity type inconsistency (metadata.ts:640) → granularity typed as number but ClickHouse may return string. Test mocks show both "8" (line 238) and 8 (lines 416, 441, 467). Fix the type to number | string or ensure consistent parsing.

Minor:

  • 📝 Typo at queryParser.ts:922 → "This meeans" should be "This means"

Recommendation: Address the case sensitivity and type inconsistency issues before merge. The SQL injection concern is mitigated by SqlString.format() but the matching logic should be more robust.

@MikeShi42 MikeShi42 requested review from a team and pulpdrew and removed request for a team January 17, 2026 06:50
@kodiakhq kodiakhq bot merged commit 00854da into main Jan 21, 2026
11 of 12 checks passed
@kodiakhq kodiakhq bot deleted the mikeshi/faster-bloom-filters branch January 21, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants