docs(DOC-732): document partition_autobalancing_node_autodecommission_time #1607
…_time Add new cluster property that enables automatic decommission of unavailable nodes after a configurable timeout. Updates both the cluster properties reference and the continuous data balancing guide. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 Walkthrough
Two documentation files were updated to introduce a new cluster property.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc (1)
30-30: Minor phrasing improvement (optional). The phrase "at least this timeout duration" is slightly awkward. Consider simplifying to "for this timeout duration", since "at least" is implied by a timeout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc` at line 30, Update the sentence describing the decommission timeout in continuous-data-balancing.adoc: replace the phrase "at least this timeout duration" with "for this timeout duration" to simplify phrasing; keep the rest of the sentence and references to the property partition_autobalancing_node_availability_timeout_sec unchanged so the meaning remains that a node unavailable for this timeout is permanently decommissioned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc`:
- Around line 29-34: The description for
partition_autobalancing_node_autodecommission_time is missing the unit and
prerequisite context: update the prose to state the unit is seconds (e.g.,
"measured in seconds") and add a sentence clarifying this property only applies
when partition_autobalancing_mode is set to continuous; also keep the existing
notes about default null/disabled, one-node-at-a-time behavior, and manual
intervention if decommission stalls so the table matches other properties'
phrasing and the PR objectives.
📒 Files selected for processing (2)
modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc
modules/reference/partials/properties/cluster-properties.adoc
modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc
…roperty Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| + | ||
| Default is 900 seconds (15 minutes). | ||
| | `partition_autobalancing_node_autodecommission_time` |
partition_autobalancing_node_autodecommission_timeout_sec
This got updated down the line to reflect units
Good catch — removed the '(in seconds)' parenthetical since the _sec suffix already conveys the unit. Fixed in the latest commit.
| | `partition_autobalancing_node_autodecommission_time` | ||
| | When a node is unavailable for this timeout duration (in seconds), Redpanda automatically and permanently decommissions the node. This property only applies when `partition_autobalancing_mode` is set to `continuous`. Unlike `partition_autobalancing_node_availability_timeout_sec`, which moves partitions while keeping the node in the cluster, this property removes the node from the cluster entirely. A decommissioned node cannot rejoin the cluster. + | ||
| + | ||
| Only one node is decommissioned at a time. If a decommission is already in progress, automatic decommission does not trigger until it completes. If the decommission stalls (for example, because the node holds the only replica of a partition), manual intervention is required. + |
Might be worth cross-linking to node-wise recovery here
The xref to nodewise-partition-recovery.adoc is already in the description ('...manual intervention is required. See xref:manage:cluster-maintenance/nodewise-partition-recovery.adoc[].'). Let me know if you had a different location in mind.
…ommission_timeout_sec Property was renamed upstream to reflect units. Update continuous-data-balancing.adoc and regenerate cluster-properties partial from dev branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ommissions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai The phrase "at least this timeout duration" no longer appears in the file. The current text reads "When a node is unavailable for this timeout duration (in seconds)...", already using the simpler phrasing. This comment appears to be based on a previous version of the file.
…de-autodecommission
…timeout property The _sec suffix in partition_autobalancing_node_autodecommission_timeout_sec already conveys the unit; the "(in seconds)" parenthetical is redundant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs-data/property-overrides.json
| "description": "Configure the addressing style that controls how Redpanda formats bucket URLs for S3-compatible object storage.\n\nLeave this property unset (`null`) to use automatic configuration:\n\n* For AWS S3: Redpanda attempts `virtual_host` addressing first, then falls back to `path` style if needed\n* For MinIO: Redpanda automatically uses `path` style regardless of `MINIO_DOMAIN` configuration\n\nSet this property explicitly to override automatic configuration, ensure consistent behavior across deployments, or when using S3-compatible storage that requires a specific URL format.\n\nCAUTION: AWS requires virtual-hosted addressing for buckets created after September 30, 2020. If you use AWS S3 with buckets created after this date, use `virtual_host` addressing.\n\nNOTE: For MinIO deployments, Redpanda defaults to `path` style when this property is unset. To use `virtual_host` addressing with a configured `MINIO_DOMAIN`, set this property explicitly to `virtual_host`. For other S3-compatible storage backends, consult your provider's documentation to determine the required URL style.", | ||
| "config_scope": "cluster" | ||
| }, | ||
| "cloud_topics_allow_materialization_failure": { |
I can't speak to these. Intentional?
| | `partition_autobalancing_node_autodecommission_timeout_sec` | ||
| | When a node is unavailable for this timeout duration, Redpanda automatically and permanently decommissions the node. This property only applies when `partition_autobalancing_mode` is set to `continuous`. Unlike `partition_autobalancing_node_availability_timeout_sec`, which moves partitions while keeping the node in the cluster, this property removes the node from the cluster entirely. A decommissioned node cannot rejoin the cluster. + | ||
| + | ||
| Only one node is decommissioned at a time. If a decommission is already in progress, automatic decommission does not trigger until it completes. If the decommission stalls (for example, because the node holds the only replica of a partition), manual intervention is required. See xref:manage:cluster-maintenance/nodewise-partition-recovery.adoc[]. + |
| |=== | ||
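The difference between the two timeouts in the table entry above can be sketched in Python. This is only an illustration of the documented behavior, not Redpanda's implementation: `Node`, `plan_actions`, the action labels, and the choice to consider the longest-unavailable node first are all hypothetical. The 900-second availability default comes from the diff above, and the disabled-by-default autodecommission timeout from the review comment. The sketch assumes `partition_autobalancing_mode` is already `continuous`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    unavailable_sec: int  # how long the node has been unreachable


def plan_actions(
    nodes: list[Node],
    availability_timeout_sec: int = 900,
    autodecommission_timeout_sec: Optional[int] = None,  # None means disabled
    decommission_in_progress: bool = False,
) -> dict[int, str]:
    """Return a hypothetical action label for each node id."""
    actions: dict[int, str] = {}
    # Assumption: the node that has been unavailable longest is considered first.
    for node in sorted(nodes, key=lambda n: n.unavailable_sec, reverse=True):
        if (
            autodecommission_timeout_sec is not None
            and node.unavailable_sec >= autodecommission_timeout_sec
            and not decommission_in_progress
        ):
            # Permanent removal: the node cannot rejoin the cluster.
            actions[node.node_id] = "decommission"
            # Only one node is decommissioned at a time.
            decommission_in_progress = True
        elif node.unavailable_sec >= availability_timeout_sec:
            # Partitions move away, but the node stays in the cluster.
            actions[node.node_id] = "move_partitions"
        else:
            actions[node.node_id] = "none"
    return actions


if __name__ == "__main__":
    cluster = [Node(1, 100), Node(2, 1000), Node(3, 4000), Node(4, 5000)]
    print(plan_actions(cluster, autodecommission_timeout_sec=3600))
```

With a one-hour autodecommission timeout, only one of the two long-unavailable nodes is marked for decommission in a single pass; the other still has its partitions moved, which mirrors the one-node-at-a-time rule in the description.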
| === alter_topic_cfg_timeout_ms |
joe-redpanda left a comment
The autodecom changes look good; I can't speak to the other, unrelated changes. Approved on the auto-decommission description.
…adoc
- Add page-topic-type, personas, and learning objectives metadata
- Rewrite intro to remove weak "enables" verb
- Fix "But after" sentence opener; clarify rack replacement phrasing
- "You can then customize" → "Customize the following properties"
- Remove *Note:* inline label from table cell; fold into prose
- Fix relative xref ./cluster-balancing.adoc → fully qualified path
- Promote inline rpk commands to bash code blocks
- "Use Data Balancing commands" → sentence case
- Add intro sentence to commands section (no empty sections)
- "check the following" → "verify:"
- "in a specific node" → "on a specific node"
- Fix NOTE ambiguous subject; remove passive "hasn't been turned off"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
micheleRP left a comment
Review Summary
Property automation analysis
Files that should be removed from this PR (auto-generated, will be overwritten, and add no reviewable value):
- modules/reference/partials/properties/cluster-properties.adoc (1455+/2076- lines): Fully auto-generated partial. Including it makes the PR unreviewable, and the output will be overwritten on the next automation run. This should be generated by running the automation after the property ships in a GA release.
- modules/reference/attachments/redpanda-properties-v25.3.10.json: Generated property snapshot. Will be overwritten.
- docs-data/redpanda-property-changes-v25.3.10-to-dev.json: Generated diff report. Will be overwritten.
- docs-data/redpanda-property-changes-v25.3.9-to-v25.3.10.json (deleted): Removing this generated artifact adds noise.
Files that are safe and should stay:
- docs-data/property-overrides.json: This is the correct place to document new properties. Overrides are preserved through regeneration.
- modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc: Fully manual page. Safe from automation.
The "version": "dev" problem
The override sets "version": "dev" for the new property. The Handlebars template renders this as:
Introduced in dev
This will appear in the published docs and is confusing to users. The version field should either:
- Be omitted until the property ships in a GA release, or
- Be set to the actual release version (e.g., v25.4.1) once it's known
The automation will auto-update "version": "dev" → the correct GA version tag when it runs against a release that includes this property (see updatePropertyOverridesWithVersion in diff-utils.js). But until then, the published docs would show "Introduced in dev."
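A quick pre-merge check for this problem could look like the following sketch. It assumes the overrides are available as a mapping from property name to override object; the real layout of property-overrides.json may nest that mapping differently, and `find_dev_versions` and `some_other_property` are hypothetical names used only for illustration.

```python
import json


def find_dev_versions(overrides: dict) -> list[str]:
    """Return the names of properties whose override carries "version": "dev"."""
    return sorted(
        name
        for name, entry in overrides.items()
        if isinstance(entry, dict) and entry.get("version") == "dev"
    )


if __name__ == "__main__":
    # Hypothetical sample; the real file has many more entries and fields.
    sample = json.loads(
        """
        {
          "partition_autobalancing_node_autodecommission_timeout_sec": {"version": "dev"},
          "some_other_property": {"config_scope": "cluster"}
        }
        """
    )
    for name in find_dev_versions(sample):
        print(f'{name}: would render as "Introduced in dev"')
```

Any name this reports is an entry that would publish "Introduced in dev" until the automation replaces the value with a GA version tag.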
Additional concern: incorrect metadata in generated JSON
Because this property only exists in the dev branch (not v25.3.10), the generated redpanda-properties-v25.3.10.json shows it with "defined_in": "override" and incorrect metadata:
- "config_scope": "topic" (should be cluster)
- "type": "string" (should be integer)
- "is_topic_property": true (incorrect)
These will self-correct when the automation runs against a release that has the property in the C++ source. This is another reason to remove the generated files from this PR and let the automation handle them when the property ships.
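The mismatch described above can be expressed as a small comparison. This is a sketch only: `EXPECTED` encodes the corrections named in this review, `metadata_mismatches` is a hypothetical helper, and the `generated` dict reproduces just the fields quoted above rather than the full generated entry.

```python
# Values this review says the property should have once generated from the C++ source.
EXPECTED = {
    "config_scope": "cluster",
    "type": "integer",
    "is_topic_property": False,
}


def metadata_mismatches(entry: dict) -> dict:
    """Map each mismatched field to a (found, expected) pair."""
    return {
        field: (entry.get(field), expected)
        for field, expected in EXPECTED.items()
        if entry.get(field) != expected
    }


if __name__ == "__main__":
    # Fields as reported for the override-only entry in the v25.3.10 snapshot.
    generated = {
        "defined_in": "override",
        "config_scope": "topic",
        "type": "string",
        "is_topic_property": True,
    }
    for field, (found, expected) in metadata_mismatches(generated).items():
        print(f"{field}: found {found!r}, expected {expected!r}")
```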
What should stay in this PR
- property-overrides.json: The override entry is correct, but consider:
  - Remove or leave "version" empty until GA, to avoid "Introduced in dev" in published docs
  - Add "config_scope": "cluster" to prevent misclassification
  - Add "related_topics": ["xref:manage:cluster-maintenance/continuous-data-balancing.adoc[Configure Continuous Data Balancing]"]
  - Optionally add a "description" override if the auto-extracted C++ description is too terse
- continuous-data-balancing.adoc: The manual content is well-written and adds genuine value. The property table entry clearly differentiates auto-decommission from availability timeout. The learning objectives and page metadata additions are good.
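Taken together, the suggestions for the override entry might produce something like the output of this sketch. The field names and the xref string come from the list above; placing the entry directly under the property name, and building it in Python rather than editing the JSON by hand, are assumptions made for illustration.

```python
import json

PROPERTY = "partition_autobalancing_node_autodecommission_timeout_sec"

# "version" is deliberately omitted so the docs do not show "Introduced in dev".
override = {
    PROPERTY: {
        "config_scope": "cluster",
        "related_topics": [
            "xref:manage:cluster-maintenance/continuous-data-balancing.adoc"
            "[Configure Continuous Data Balancing]"
        ],
    }
}

if __name__ == "__main__":
    print(json.dumps(override, indent=2))
```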
Style notes on continuous-data-balancing.adoc
- Good: Clear differentiation between availability_timeout_sec (moves partitions, node stays) vs autodecommission_timeout_sec (removes node permanently)
- Good: Warning about stalled decommissions and xref to recovery docs
- Minor: Verify the checkbox learning objectives (* [ ]) render correctly in the Antora build
…de-autodecommission
Remove property partials, JSON attachments, and diff files included by the auto-docs regen commit. Only property-overrides.json and the continuous-data-balancing content page belong in this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mfernest A few cleanup items before this can be merged:
1. Remove
2. Remove unrelated
3. Remove the
Once it does ship, we can add
….json Strip 30 bulk automation entries that each only carried "version": "dev", which would render as "Introduced in dev" in published docs. The partition_autobalancing_node_autodecommission_timeout_sec entry had no content beyond the version tag so it is removed entirely; the automation will add it with the correct version when the property reaches GA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…thub.com/redpanda-data/docs into feat/doc-732-ghost-node-autodecommission
redpanda-property-changes-v25.3.10-to-dev.json is auto-generated and will be overwritten on the next automation run. Remove it from the PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Documents the new partition_autobalancing_node_autodecommission_time cluster property introduced in PR #28946 (CORE-7111).
partition_autobalancing_node_availability_timeout_sec
Key points documented:
- partition_autobalancing_mode is continuous
- availability_timeout_sec, the node cannot rejoin
SME: Joe Miller