feat(deployment)!: Decouple query_engine from deployment gating; Move to webui config and enable independent service control (resolves #2088). #2104

Draft
junhaoliao wants to merge 4 commits into y-scope:main from junhaoliao:webui-query-engine

Conversation

@junhaoliao
Member

Description

Previously, query_engine lived under package.query_engine and controlled two things at once:

  1. Which search interface the Web UI displays.
  2. Which Docker Compose file (and therefore which services) are deployed.

Setting query_engine: "presto" would select a "base" compose file that excluded the entire Celery
query pipeline (query-scheduler, query-worker, reducer, mcp-server), making it impossible to
run the API server (which depends on the query pipeline) alongside Presto. The two query engines were
mutually exclusive.

This PR decouples these concerns so that query_engine only controls the Web UI search interface,
and individual services are enabled/disabled through their own config keys — aligning Docker Compose
with the Helm chart (#2004), where components are independently toggled.

Changes

Config schema (clp_config.py):

  • Move query_engine from Package to WebUi (i.e., webui.query_engine instead of
    package.query_engine).
  • Remove validate_query_engine_package_compatibility from Package; add an equivalent
    cross-config validator at the ClpConfig level.
  • Simplify DeploymentType from four variants (BASE, FULL, SPIDER_BASE, SPIDER_FULL) to
    two (CELERY, SPIDER), since the deployment type is now determined solely by
    compression_scheduler.type.
  • Make query_scheduler, query_worker, and reducer nullable (like api_server and
    log_ingestor). Setting them to null in clp-config.yaml disables those services.
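
The schema changes above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the actual clp_config.py code: the dataclass layout, the Service placeholder, the default values, the enum string values, and the get_deployment_type helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DeploymentType(Enum):
    # Two variants, as described above; the string values are assumed.
    CELERY = "celery"
    SPIDER = "spider"


@dataclass
class CompressionScheduler:
    type: str = "celery"  # assumed default


@dataclass
class Service:
    """Hypothetical placeholder for a per-service config section."""


@dataclass
class WebUi:
    query_engine: str = "clp-s"  # assumed default; only affects the search UI


@dataclass
class ClpConfig:
    compression_scheduler: CompressionScheduler = field(default_factory=CompressionScheduler)
    webui: WebUi = field(default_factory=WebUi)
    # None (null in clp-config.yaml) disables the corresponding service.
    query_scheduler: Optional[Service] = field(default_factory=Service)
    query_worker: Optional[Service] = field(default_factory=Service)
    reducer: Optional[Service] = field(default_factory=Service)

    def get_deployment_type(self) -> DeploymentType:
        # Depends only on compression_scheduler.type, never on webui.query_engine.
        return DeploymentType(self.compression_scheduler.type)
```

With this shape, ClpConfig(webui=WebUi(query_engine="presto")) still resolves to DeploymentType.CELERY, which is the decoupling this PR is after.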

Docker Compose (controller.py, compose files):

  • Delete docker-compose-base.yaml and docker-compose-spider-base.yaml (no longer needed).
  • Add the deploy.replicas environment-variable pattern to query-scheduler, query-worker, and
    reducer in docker-compose-all.yaml so they can be toggled via CLP_QUERY_SCHEDULER_ENABLED,
    CLP_QUERY_WORKER_ENABLED, and CLP_REDUCER_ENABLED.
  • Add api-server and mcp-server to docker-compose-spider.yaml (previously missing).
  • Update controller to emit *_ENABLED=0 when a service's config is null.
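
For illustration, the controller behaviour in the last bullet might look like the sketch below. The environment variable names come from this description; the function name build_enabled_env, the dict-based config, and the treatment of a missing key as "enabled" are assumptions, and the real controller.py may be structured differently.

```python
from typing import Dict, Optional

# Env var names are from the PR description; the mapping itself is illustrative.
SERVICE_ENABLED_VARS = {
    "query_scheduler": "CLP_QUERY_SCHEDULER_ENABLED",
    "query_worker": "CLP_QUERY_WORKER_ENABLED",
    "reducer": "CLP_REDUCER_ENABLED",
}


def build_enabled_env(config: Dict[str, Optional[dict]]) -> Dict[str, str]:
    """Return `*_ENABLED=0` entries for every service whose config is null."""
    env: Dict[str, str] = {}
    for key, var in SERVICE_ENABLED_VARS.items():
        # Assumption: a missing key means "use defaults", so the service stays enabled.
        if config.get(key, {}) is None:
            env[var] = "0"
    return env
```

Only disabled services produce a variable here, so the sketch assumes the compose file falls back to running the service when the variable is unset.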

Helm chart:

  • Move query_engine from clpConfig.package to clpConfig.webui in values.yaml.
  • Update configmap.yaml references accordingly.

Documentation:

  • Update Presto guide, quick-start guides, K8s deployment guide, design docs, and building docs
    to reflect the new webui.query_engine config location and simplified deployment types.

Impact Assessment

  • Breaking change: package.query_engine no longer exists; users must move it to
    webui.query_engine. Existing clp-config.yaml files that set package.query_engine will fail
    validation.
  • New capability: Users can now run Presto and the Celery query pipeline simultaneously, or
    selectively disable query pipeline services by setting query_scheduler: null,
    query_worker: null, and reducer: null.
  • Helm chart: clpConfig.package.query_engine is replaced by clpConfig.webui.query_engine.
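
As a rough aid for the breaking change, an existing config could be migrated with something like the helper below. It is a hypothetical sketch, not part of this PR: migrate_query_engine is an invented name, it operates on an already-parsed config dict, and the storage_engine key in the usage note is only there for illustration.

```python
import copy
from typing import Any, Dict


def migrate_query_engine(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with package.query_engine moved to webui.query_engine."""
    migrated = copy.deepcopy(config)
    package = migrated.get("package")
    if not isinstance(package, dict) or "query_engine" not in package:
        return migrated  # Nothing to migrate.

    # An empty `webui:` section parses as None, so treat it as an empty mapping.
    webui = migrated.get("webui") or {}
    if "query_engine" in webui:
        raise ValueError("Both package.query_engine and webui.query_engine are set.")
    webui["query_engine"] = package.pop("query_engine")
    migrated["webui"] = webui
    return migrated
```

For example, {"package": {"storage_engine": "clp-s", "query_engine": "presto"}} becomes {"package": {"storage_engine": "clp-s"}, "webui": {"query_engine": "presto"}}, and the input dict is left unmodified.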

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Scenario 1: Default clp-json deployment (clp-s, Celery)

Task: Verify that the default deployment starts all services (including query pipeline) and
compression works.

Command:

$ cd build/clp-package
$ ./sbin/start-clp.sh

Output:

2026-03-18T04:42:24.563 INFO [controller] Setting up environment for bundling database...
2026-03-18T04:42:24.563 INFO [controller] Setting up environment for bundling queue...
2026-03-18T04:42:24.563 INFO [controller] Setting up environment for bundling redis...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for bundling results_cache...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for database...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for queue...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for redis...
2026-03-18T04:42:24.564 INFO [controller] spider_scheduler is not configured, skipping environment setup...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for results_cache...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for compression_scheduler...
2026-03-18T04:42:24.564 INFO [controller] Setting up environment for query_scheduler...
2026-03-18T04:42:24.565 INFO [controller] Setting up environment for compression_worker...
2026-03-18T04:42:24.565 INFO [controller] Setting up environment for query_worker...
2026-03-18T04:42:24.565 INFO [controller] Setting up environment for reducer...
2026-03-18T04:42:24.565 INFO [controller] Setting up environment for api_server...
2026-03-18T04:42:24.565 INFO [controller] log_ingestor is only applicable for S3 logs input type, skipping environment setup...
2026-03-18T04:42:24.565 INFO [controller] Setting up environment for webui...
2026-03-18T04:42:24.565 INFO [controller] The MCP Server is not configured, skipping mcp_server creation...
2026-03-18T04:42:24.565 INFO [controller] Setting up environment for garbage_collector...
2026-03-18T04:42:24.592 INFO [controller] Starting CLP using Docker Compose (celery deployment)...
...
2026-03-18T04:42:36.721 INFO [controller] Started CLP.

Explanation: Deployment type is now celery (was full). All services including
query-scheduler, query-worker, reducer, and api-server started and became healthy.

Task: Verify compression works.

Command:

$ ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.jsonl

Output:

2026-03-18T04:42:42.664 INFO [compress] Compression job 1 submitted.
2026-03-18T04:42:44.668 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 203.80MB/s.
2026-03-18T04:42:45.168 INFO [compress] Compression finished.
2026-03-18T04:42:45.168 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 179.16MB/s.

Scenario 2: Presto query engine alongside Celery query pipeline

Task: Verify that setting webui.query_engine: "presto" no longer excludes the Celery query
pipeline services. Previously this would select docker-compose-base.yaml which omitted
query-scheduler, query-worker, and reducer.

Config (etc/clp-config.yaml):

webui:
  query_engine: "presto"

results_cache:
  retention_period: null

presto:
  host: "localhost"
  port: 8889

Command:

$ ./sbin/start-clp.sh

Output:

2026-03-18T04:43:31.528 INFO [controller] Setting up environment for bundling database...
2026-03-18T04:43:31.528 INFO [controller] Setting up environment for bundling queue...
2026-03-18T04:43:31.528 INFO [controller] Setting up environment for bundling redis...
2026-03-18T04:43:31.528 INFO [controller] Setting up environment for bundling results_cache...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for database...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for queue...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for redis...
2026-03-18T04:43:31.529 INFO [controller] spider_scheduler is not configured, skipping environment setup...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for results_cache...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for compression_scheduler...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for query_scheduler...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for compression_worker...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for query_worker...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for reducer...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for api_server...
2026-03-18T04:43:31.529 INFO [controller] log_ingestor is only applicable for S3 logs input type, skipping environment setup...
2026-03-18T04:43:31.529 INFO [controller] Setting up environment for webui...
2026-03-18T04:43:31.530 INFO [controller] The MCP Server is not configured, skipping mcp_server creation...
2026-03-18T04:43:31.530 INFO [controller] Retention period is not configured, skipping garbage_collector creation...
2026-03-18T04:43:31.556 INFO [controller] Starting CLP using Docker Compose (celery deployment)...
...
2026-03-18T04:43:39.137 INFO [controller] Started CLP.

Explanation: With webui.query_engine: "presto", the deployment type is still celery and all
services start, including query-scheduler, query-worker, reducer, and api-server. Previously
this config would have selected docker-compose-base.yaml, which excluded these services entirely.
This validates the core goal of issue #2088: Presto and the Celery query pipeline can now run
simultaneously.

Task: Verify compression still works with Presto config.

Command:

$ ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.jsonl

Output:

2026-03-18T04:43:46.894 INFO [compress] Compression job 2 submitted.
2026-03-18T04:43:48.897 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 193.53MB/s.
2026-03-18T04:43:49.398 INFO [compress] Compression finished.
2026-03-18T04:43:49.398 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 179.16MB/s.

TODO: test with Presto deployment

…ve to webui config and enable independent service control (resolves y-scope#2088).
@coderabbitai
Contributor

coderabbitai Bot commented Mar 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

