docs(DOC-732): document partition_autobalancing_node_autodecommission_time by mfernest · Pull Request #1607 · redpanda-data/docs

mfernest · 2026-03-16T18:20:23Z

Summary

Documents the new partition_autobalancing_node_autodecommission_time cluster property introduced in PR #28946 (CORE-7111).

Cluster properties reference: adds the new property entry after partition_autobalancing_node_availability_timeout_sec
Continuous Data Balancing guide: adds the property to the configuration table with a clear explanation of how it differs from the availability timeout (permanent decommission vs. partition moves)

Key points documented:

Opt-in (null/disabled by default)
Only applies when partition_autobalancing_mode is continuous
Permanently removes the node — unlike availability_timeout_sec, the node cannot rejoin
One decommission at a time; stalled decommissions require manual intervention
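To make the key points above concrete, here is a minimal Python sketch of the trigger conditions as a reviewer might model them. It is not Redpanda's implementation. `ClusterConfig`, `NodeState`, and `pick_node_to_autodecommission` are hypothetical names. The `>=` comparison against the timeout and the longest-unavailable tie-break are assumptions. It uses `partition_autobalancing_node_autodecommission_timeout_sec`, the name the property was later renamed to in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterConfig:
    # Mirrors the two properties discussed in this PR; None means "disabled" (the default).
    partition_autobalancing_mode: str
    partition_autobalancing_node_autodecommission_timeout_sec: Optional[int] = None


@dataclass
class NodeState:
    node_id: int
    unavailable_for_sec: float  # how long the node has been unreachable (0 if healthy)


def pick_node_to_autodecommission(
    config: ClusterConfig,
    nodes: list[NodeState],
    decommission_in_progress: bool,
) -> Optional[int]:
    """Return the id of the node to decommission, or None if nothing should happen."""
    timeout = config.partition_autobalancing_node_autodecommission_timeout_sec
    # Opt-in: an unset (null) timeout disables the feature entirely.
    if timeout is None:
        return None
    # Only applies when continuous data balancing is enabled.
    if config.partition_autobalancing_mode != "continuous":
        return None
    # One decommission at a time: wait for the current one to finish.
    if decommission_in_progress:
        return None
    overdue = [n for n in nodes if n.unavailable_for_sec >= timeout]
    if not overdue:
        return None
    # Assumed tie-break: the node that has been unavailable longest goes first.
    return max(overdue, key=lambda n: n.unavailable_for_sec).node_id
```

Stalled decommissions are deliberately out of scope for this sketch; per the summary, they require manual intervention.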

SME: Joe Miller

Preview

Test plan

Netlify deploy preview passes
Property entry renders correctly in cluster properties reference
Continuous data balancing table renders correctly

🤖 Generated with Claude Code

…_time Add new cluster property that enables automatic decommission of unavailable nodes after a configurable timeout. Updates both the cluster properties reference and the continuous data balancing guide. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

netlify · 2026-03-16T18:20:30Z

❌ Deploy Preview for redpanda-docs-preview failed.

Name	Link
🔨 Latest commit	`4221053`
🔍 Latest deploy log	https://app.netlify.com/projects/redpanda-docs-preview/deploys/69c5656c0879a00008662f8d

coderabbitai · 2026-03-16T18:20:46Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d7cd354f-9986-4d2a-9d87-8789d44509a2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Two documentation files were updated to introduce a new cluster property partition_autobalancing_node_autodecommission_time. The property specifies a timeout in seconds for automatic node decommission when using continuous data balancing mode. Documentation includes the property's type, default state (disabled), behavior notes, and distinction from related properties. Updates maintain consistency across continuous data balancing and cluster properties reference documentation.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested reviewers

mattschumpert
wdberkeley
micheleRP

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description covers the key information but is missing required template sections like Jira ticket link, review deadline, and checkbox selections.	Add the missing template sections: include the Jira ticket URL in the Description header, specify a review deadline, and check the appropriate category box (likely 'Content gap' or 'New feature').

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: documenting a new cluster property with specific reference to the JIRA ticket.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc (1)
30-30: Minor phrasing improvement (optional).

The phrase "at least this timeout duration" is slightly awkward. Consider simplifying to "for this timeout duration" since the "at least" is implied by a timeout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc` at
line 30, Update the sentence describing the decommission timeout in
continuous-data-balancing.adoc: replace the phrase "at least this timeout
duration" with "for this timeout duration" to simplify phrasing; keep the rest
of the sentence and references to the property
partition_autobalancing_node_availability_timeout_sec unchanged so the meaning
remains that a node unavailable for this timeout is permanently decommissioned.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc`:
- Around line 29-34: The description for
partition_autobalancing_node_autodecommission_time is missing the unit and
prerequisite context: update the prose to state the unit is seconds (e.g.,
"measured in seconds") and add a sentence clarifying this property only applies
when partition_autobalancing_mode is set to continuous; also keep the existing
notes about default null/disabled, one-node-at-a-time behavior, and manual
intervention if decommission stalls so the table matches other properties'
phrasing and the PR objectives.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2438c28d-c101-4515-b05e-1eacf4a7de5a

📥 Commits

Reviewing files that changed from the base of the PR and between fe8a357 and 8129884.

📒 Files selected for processing (2)

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc
modules/reference/partials/properties/cluster-properties.adoc

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc

…roperty Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

joe-redpanda · 2026-03-16T21:17:03Z

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc

 +
 Default is 900 seconds (15 minutes).

+| `partition_autobalancing_node_autodecommission_time`


partition_autobalancing_node_autodecommission_timeout_sec

This got updated down the line to reflect units

Good catch — removed the '(in seconds)' parenthetical since the _sec suffix already conveys the unit. Fixed in the latest commit.

joe-redpanda · 2026-03-16T21:18:15Z

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc

+| `partition_autobalancing_node_autodecommission_time`
+| When a node is unavailable for this timeout duration (in seconds), Redpanda automatically and permanently decommissions the node. This property only applies when `partition_autobalancing_mode` is set to `continuous`. Unlike `partition_autobalancing_node_availability_timeout_sec`, which moves partitions while keeping the node in the cluster, this property removes the node from the cluster entirely. A decommissioned node cannot rejoin the cluster. +
+ +
+Only one node is decommissioned at a time. If a decommission is already in progress, automatic decommission does not trigger until it completes. If the decommission stalls (for example, because the node holds the only replica of a partition), manual intervention is required. +


Might be worth cross-linking to node-wise recovery here

The xref to nodewise-partition-recovery.adoc is already in the description ('...manual intervention is required. See xref:manage:cluster-maintenance/nodewise-partition-recovery.adoc[].'). Let me know if you had a different location in mind.

…ommission_timeout_sec Property was renamed upstream to reflect units. Update continuous-data-balancing.adoc and regenerate cluster-properties partial from dev branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ommissions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mfernest · 2026-03-17T18:43:00Z

@coderabbitai The phrase "at least this timeout duration" no longer appears in the file. The current text reads "When a node is unavailable for this timeout duration (in seconds)..." — already using the simpler phrasing. This comment appears to be based on a previous version of the file.

coderabbitai · 2026-03-17T18:43:18Z

Tip

For best results, initiate chat on the files or code changes.

@mfernest You're right, I apologize for the confusion. My comment was based on a stale version of the file. Since the current text already uses the simpler phrasing, this concern does not apply. Sorry for the noise!

(^_^)

🧠 Learnings used

Learnt from: mfernest
Repo: redpanda-data/docs PR: 0
File: :0-0
Timestamp: 2026-03-17T18:36:27.877Z
Learning: In the redpanda-data/docs repository, the `delete_topic_enable` property is a cluster-scoped boolean property (default: true) documented only in `modules/reference/partials/properties/cluster-properties.adoc` (line 4137). It does NOT appear in `modules/reference/partials/properties/topic-properties.adoc`. Its description override is in `docs-data/property-overrides.json` at line 796.

…de-autodecommission

…timeout property The _sec suffix in partition_autobalancing_node_autodecommission_timeout_sec already conveys the unit; the "(in seconds)" parenthetical is redundant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

joe-redpanda · 2026-03-23T17:52:44Z

docs-data/property-overrides.json

      "description": "Configure the addressing style that controls how Redpanda formats bucket URLs for S3-compatible object storage.\n\nLeave this property unset (`null`) to use automatic configuration:\n\n* For AWS S3: Redpanda attempts `virtual_host` addressing first, then falls back to `path` style if needed\n* For MinIO: Redpanda automatically uses `path` style regardless of `MINIO_DOMAIN` configuration\n\nSet this property explicitly to override automatic configuration, ensure consistent behavior across deployments, or when using S3-compatible storage that requires a specific URL format.\n\nCAUTION: AWS requires virtual-hosted addressing for buckets created after September 30, 2020. If you use AWS S3 with buckets created after this date, use `virtual_host` addressing.\n\nNOTE: For MinIO deployments, Redpanda defaults to `path` style when this property is unset. To use `virtual_host` addressing with a configured `MINIO_DOMAIN`, set this property explicitly to `virtual_host`. For other S3-compatible storage backends, consult your provider's documentation to determine the required URL style.",
      "config_scope": "cluster"
    },
+    "cloud_topics_allow_materialization_failure": {


I can't speak to these. Intentional?

joe-redpanda · 2026-03-23T17:53:27Z

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc

+| `partition_autobalancing_node_autodecommission_timeout_sec`
+| When a node is unavailable for this timeout duration, Redpanda automatically and permanently decommissions the node. This property only applies when `partition_autobalancing_mode` is set to `continuous`. Unlike `partition_autobalancing_node_availability_timeout_sec`, which moves partitions while keeping the node in the cluster, this property removes the node from the cluster entirely. A decommissioned node cannot rejoin the cluster. +
+ +
+Only one node is decommissioned at a time. If a decommission is already in progress, automatic decommission does not trigger until it completes. If the decommission stalls (for example, because the node holds the only replica of a partition), manual intervention is required. See xref:manage:cluster-maintenance/nodewise-partition-recovery.adoc[]. +


joe-redpanda · 2026-03-23T17:53:43Z

modules/reference/partials/properties/cluster-properties.adoc

-|===
-
-
-=== alter_topic_cfg_timeout_ms


Not sure what's going on here

joe-redpanda

The autodecom stuff looks good. I can't speak to the other unrelated changes. Approved on the autodecom description.

…adoc - Add page-topic-type, personas, and learning objectives metadata - Rewrite intro to remove weak "enables" verb - Fix "But after" sentence opener; clarify rack replacement phrasing - "You can then customize" → "Customize the following properties" - Remove *Note:* inline label from table cell; fold into prose - Fix relative xref ./cluster-balancing.adoc → fully qualified path - Promote inline rpk commands to bash code blocks - "Use Data Balancing commands" → sentence case - Add intro sentence to commands section (no empty sections) - "check the following" → "verify:" - "in a specific node" → "on a specific node" - Fix NOTE ambiguous subject; remove passive "hasn't been turned off" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

micheleRP

Review Summary

Property automation analysis

Files that should be removed from this PR (auto-generated, will be overwritten, and add no reviewable value):

modules/reference/partials/properties/cluster-properties.adoc (1455+/2076- lines) — Fully auto-generated partial. Including it makes the PR unreviewable and the output will be overwritten on the next automation run. This should be generated by running the automation after the property ships in a GA release.
modules/reference/attachments/redpanda-properties-v25.3.10.json — Generated property snapshot. Will be overwritten.
docs-data/redpanda-property-changes-v25.3.10-to-dev.json — Generated diff report. Will be overwritten.
docs-data/redpanda-property-changes-v25.3.9-to-v25.3.10.json (deleted) — Removing this generated artifact adds noise.

Files that are safe and should stay:

docs-data/property-overrides.json — This is the correct place to document new properties. Overrides are preserved through regeneration.
modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc — Fully manual page. Safe from automation.

The `"version": "dev"` problem

The override sets "version": "dev" for the new property. The Handlebars template renders this as:

Introduced in dev

This will appear in the published docs and is confusing to users. The version field should either:

Be omitted until the property ships in a GA release, or
Be set to the actual release version (e.g., v25.4.1) once it's known

The automation will auto-update "version": "dev" → the correct GA version tag when it runs against a release that includes this property (see updatePropertyOverridesWithVersion in diff-utils.js). But until then, the published docs would show "Introduced in dev."

Additional concern: incorrect metadata in generated JSON

Because this property only exists in the dev branch (not v25.3.10), the generated redpanda-properties-v25.3.10.json shows it with "defined_in": "override" and incorrect metadata:

"config_scope": "topic" — should be cluster
"type": "string" — should be integer
"is_topic_property": true — incorrect

These will self-correct when the automation runs against a release that has the property in the C++ source. This is another reason to remove the generated files from this PR and let the automation handle them when the property ships.
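A quick way to spot the misclassification described above is to compare the generated snapshot entry against the expected metadata. This Python sketch assumes the caller has already pulled the property's entry out of `redpanda-properties-v25.3.10.json` as a dict, since the surrounding file layout is not shown in this PR. `metadata_mismatches` and `EXPECTED` are hypothetical helpers, not part of the docs automation.

```python
import json

# Expected metadata for partition_autobalancing_node_autodecommission_timeout_sec,
# taken from the review comment above (cluster-scoped integer, not a topic property).
EXPECTED = {
    "config_scope": "cluster",
    "type": "integer",
    "is_topic_property": False,
}


def metadata_mismatches(entry: dict, expected: dict = EXPECTED) -> dict:
    """Return {field: (actual, expected)} for each field that disagrees.

    A missing field reads as None, so it is also reported as a mismatch.
    """
    return {
        field: (entry.get(field), want)
        for field, want in expected.items()
        if entry.get(field) != want
    }


# The entry as the review describes it in the v25.3.10 snapshot.
snapshot_entry = json.loads(
    '{"defined_in": "override", "config_scope": "topic",'
    ' "type": "string", "is_topic_property": true}'
)
for field, (actual, want) in metadata_mismatches(snapshot_entry).items():
    print(f"{field}: got {actual!r}, expected {want!r}")
```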

What should stay in this PR

property-overrides.json: The override entry is correct, but consider:
- Remove or leave "version" empty until GA, to avoid "Introduced in dev" in published docs
- Add "config_scope": "cluster" to prevent misclassification
- Add "related_topics": ["xref:manage:cluster-maintenance/continuous-data-balancing.adoc[Configure Continuous Data Balancing]"]
- Optionally add a "description" override if the auto-extracted C++ description is too terse
continuous-data-balancing.adoc: The manual content is well-written and adds genuine value. The property table entry clearly differentiates auto-decommission from availability timeout. The learning objectives and page metadata additions are good.

Style notes on `continuous-data-balancing.adoc`

Good: Clear differentiation between availability_timeout_sec (moves partitions, node stays) vs autodecommission_timeout_sec (removes node permanently)
Good: Warning about stalled decommissions and xref to recovery docs
Minor: Verify the checkbox learning objectives (* [ ]) render correctly in the Antora build

micheleRP

@mfernest please see Claude's review

…de-autodecommission

Remove property partials, JSON attachments, and diff files included by the auto-docs regen commit. Only property-overrides.json and the continuous-data-balancing content page belong in this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

micheleRP · 2026-03-26T15:12:27Z

@mfernest A few cleanup items before this can be merged:

1. Remove redpanda-property-changes-v25.3.10-to-dev.json from this PR
This auto-generated diff file (536 lines) is still in the PR. It will be overwritten on the next automation run and adds no reviewable value. Please revert it.

2. Remove unrelated "version": "dev" entries from property-overrides.json
The diff includes 30+ entries unrelated to DOC-732 (e.g., cloud_topics_*, delete_topic_enable, default_redpanda_storage_mode, nested_group_behavior, oidc_group_claim_path, redpanda.storage.mode, schema_registry_enable_qualified_subjects, etc.). These appear to have come in from a merge or regen run. Please revert them so the PR diff only contains the autodecommission change.

3. Remove the partition_autobalancing_node_autodecommission_timeout_sec entry from property-overrides.json
The override currently sets "version": "dev", which will render as "Introduced in dev" in published docs. Removing "version" alone would leave an empty {} entry, which serves no purpose. The cleanest approach is to remove the entry entirely and let the automation create it with the correct version and metadata when the property ships in a GA release.

Once it does ship, we can add config_scope and related_topics to match the pattern used by the sibling property partition_autobalancing_node_availability_timeout_sec.
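The cleanup requested in items 2 and 3 amounts to dropping every override entry whose only content is `"version": "dev"`. Here is a minimal Python sketch. It assumes `property-overrides.json` can be treated as a mapping from property name to override object, since the real file's top-level layout is not shown here. `strip_dev_version_overrides` and `some_other_property` are hypothetical. This is not the repository's automation, which, per the review, uses `updatePropertyOverridesWithVersion` in diff-utils.js to rewrite versions at GA.

```python
import json


def strip_dev_version_overrides(overrides: dict) -> tuple[dict, list[str]]:
    """Drop override entries whose only content is {"version": "dev"}.

    Returns the cleaned mapping and the names of the removed entries.
    Entries with any other keys (description, config_scope, ...) are kept
    as-is, including any "version": "dev" they carry.
    """
    cleaned: dict = {}
    removed: list[str] = []
    for name, entry in overrides.items():
        if entry == {"version": "dev"}:
            # Would render as "Introduced in dev" and carries nothing else.
            removed.append(name)
        else:
            cleaned[name] = entry
    return cleaned, removed


if __name__ == "__main__":
    # Hypothetical sample; "some_other_property" is a placeholder name.
    sample = json.loads("""
    {
      "partition_autobalancing_node_autodecommission_timeout_sec": {"version": "dev"},
      "some_other_property": {"description": "...", "config_scope": "cluster"}
    }
    """)
    cleaned, removed = strip_dev_version_overrides(sample)
    print("removed:", removed)
    print(json.dumps(cleaned, indent=2))
```

Removing the whole entry, rather than only its `version` key, follows the review's point that the latter would leave an empty `{}`. An entry that mixes `"version": "dev"` with real content is kept by this sketch and would still render "Introduced in dev".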

….json Strip 30 bulk automation entries that each only carried "version": "dev", which would render as "Introduced in dev" in published docs. The partition_autobalancing_node_autodecommission_timeout_sec entry had no content beyond the version tag so it is removed entirely; the automation will add it with the correct version when the property reaches GA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…thub.com/redpanda-data/docs into feat/doc-732-ghost-node-autodecommission

redpanda-property-changes-v25.3.10-to-dev.json is auto-generated and will be overwritten on the next automation run. Remove it from the PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mfernest requested a review from a team as a code owner March 16, 2026 18:20

coderabbitai bot reviewed Mar 16, 2026

modules/manage/pages/cluster-maintenance/continuous-data-balancing.adoc (outdated)

mfernest requested a review from joe-redpanda March 16, 2026 18:42

fix(DOC-732): add unit and prerequisite context to autodecommission p…

05fa6f5

…roperty Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

joe-redpanda reviewed Mar 16, 2026

mfernest and others added 2 commits March 17, 2026 09:30

fix(DOC-732): cross-link node-wise partition recovery for stalled dec…

93caacd

…ommissions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mfernest and others added 3 commits March 18, 2026 16:20

Merge remote-tracking branch 'origin/main' into feat/doc-732-ghost-no…

217c5d2

…de-autodecommission

Merge branch 'main' into feat/doc-732-ghost-node-autodecommission

34affd1

joe-redpanda reviewed Mar 23, 2026

joe-redpanda approved these changes Mar 23, 2026

micheleRP reviewed Mar 24, 2026

micheleRP requested changes Mar 24, 2026

This was referenced Mar 24, 2026

feat(DOC-1942): document delete_topic_enable enterprise feature #1610

Open

fix(DOC-1959): correct election_timeout_ms type/default; exclude raft_election_timeout_ms #1611

Open

mfernest and others added 2 commits March 24, 2026 16:57

Merge remote-tracking branch 'origin/main' into feat/doc-732-ghost-no…

7dec632

…de-autodecommission

mfernest requested a review from micheleRP March 25, 2026 00:02

Merge branch 'main' into feat/doc-732-ghost-node-autodecommission

b7dbdeb

mfernest and others added 3 commits March 26, 2026 09:42

Merge branch 'feat/doc-732-ghost-node-autodecommission' of https://gi…

274f8b6

…thub.com/redpanda-data/docs into feat/doc-732-ghost-node-autodecommission

fix(DOC-732): remove auto-generated property-changes diff file

4221053

redpanda-property-changes-v25.3.10-to-dev.json is auto-generated and will be overwritten on the next automation run. Remove it from the PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
