
feat(switch-controller): add ConfigureCertificate phase with RMS job …#2330

Draft
vinodchitraliNVIDIA wants to merge 1 commit into NVIDIA:main from vinodchitraliNVIDIA:vc/cm

@vinodchitraliNVIDIA vinodchitraliNVIDIA commented Jun 9, 2026

…polling

Extend the switch Configuring state machine with a certificate configuration sub-flow that runs after RotateOsPassword and before Validating. The handler submits an async RMS job via Component Manager and polls until completion.

State machine (api-model):

  • Add ConfigureCertificateState { Start, WaitForComplete { job_id } }
  • Nest under ConfiguringState::ConfigureCertificate
  • RotateOsPassword now transitions into ConfigureCertificate(Start)
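For orientation, the nested state shape can be sketched in Rust as below. The variant names come from this PR. Everything else is an assumption, not the actual api-model definition: the `String` type for `job_id`, the reduced set of `ConfiguringState` variants, and the helper `next_after_rotate_os_password`.

```rust
/// Sub-states of the certificate step (names from this PR; `job_id` type assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfigureCertificateState {
    /// Submit the RMS job through Component Manager.
    Start,
    /// Poll the submitted job until it reaches a terminal state.
    WaitForComplete { job_id: String },
}

/// Reduced, hypothetical view of the Configuring state machine:
/// only the variants relevant to this change are shown.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfiguringState {
    RotateOsPassword,
    /// New nested sub-flow that runs between RotateOsPassword and Validating.
    ConfigureCertificate(ConfigureCertificateState),
}

/// Hypothetical helper: the state entered once RotateOsPassword succeeds.
pub fn next_after_rotate_os_password() -> ConfiguringState {
    ConfiguringState::ConfigureCertificate(ConfigureCertificateState::Start)
}
```

Nesting the sub-state inside a single `ConfigureCertificate` variant keeps the certificate flow's internal steps out of the top-level Configuring enum.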

Switch handler (configuring.rs):

  • Start: derive cert_name from switch.rack_id; build SwitchEndpoint from the BMC MAC, NVOS interface, and vault credentials; call Component Manager (CM) to start the job
  • WaitForComplete: poll CM for ConfigureSwitchCertificateState and act on the result: Completed (→ Validating), Failed (→ Error), Started or InProgress (keep waiting)
  • Skip certificate configuration when the switch has no rack_id or when Component Manager is absent
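The two handler steps can be sketched as pure functions over a Component Manager abstraction. This is a hypothetical illustration, not the code in `configuring.rs`. The following are all assumptions:

- the `CertificateJobs` trait and the `Outcome` enum
- using `rack_id` verbatim as `cert_name`
- routing the skip paths to Validating
- mapping CM call errors to Error

Endpoint construction and vault lookups are omitted.

```rust
/// Job states reported by Component Manager (variant names from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigureSwitchCertificateState {
    Started,
    InProgress,
    Completed,
    Failed,
}

/// Hypothetical result of one handler invocation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    /// Job submitted; move to WaitForComplete { job_id }.
    WaitForComplete { job_id: String },
    /// Job still running; stay put and poll again on the next iteration.
    Wait,
    /// Certificate step finished or skipped; continue to Validating.
    Validating,
    /// Terminal failure; move the switch to Error.
    Error(String),
}

/// Hypothetical, simplified view of the two CM calls (endpoint omitted).
pub trait CertificateJobs {
    fn configure_switch_certificate(&self, cert_name: &str) -> Result<String, String>;
    fn get_configure_switch_certificate_job_status(
        &self,
        job_id: &str,
    ) -> Result<ConfigureSwitchCertificateState, String>;
}

/// Start: skip when rack_id or Component Manager is absent, otherwise submit the job.
pub fn handle_start(rack_id: Option<&str>, cm: Option<&dyn CertificateJobs>) -> Outcome {
    let (Some(rack_id), Some(cm)) = (rack_id, cm) else {
        // Assumption: skipping the certificate step proceeds straight to Validating.
        return Outcome::Validating;
    };
    // Assumption: cert_name is the rack_id verbatim; the real derivation may differ.
    let cert_name = rack_id.to_string();
    match cm.configure_switch_certificate(&cert_name) {
        Ok(job_id) => Outcome::WaitForComplete { job_id },
        // Assumption: a failed submission is treated as terminal.
        Err(e) => Outcome::Error(e),
    }
}

/// WaitForComplete: map the polled job state to the next transition.
pub fn handle_wait(cm: &dyn CertificateJobs, job_id: &str) -> Outcome {
    use ConfigureSwitchCertificateState::*;
    match cm.get_configure_switch_certificate_job_status(job_id) {
        Ok(Completed) => Outcome::Validating,
        Ok(Failed) => Outcome::Error(format!("certificate job {job_id} failed")),
        Ok(Started | InProgress) => Outcome::Wait,
        // Assumption: a failed status query is treated as terminal.
        Err(e) => Outcome::Error(e),
    }
}
```

Keeping the handlers free of I/O beyond the trait makes the skip, submit, and polling paths testable against a mock that returns a fixed status.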

Component Manager:

  • Expose configure_switch_certificate(endpoint, cert_name) → job_id
  • Expose get_configure_switch_certificate_job_status(job_id) → job status
  • Extend NvSwitchManager; implement in mock (configurable job status), NSM (unsupported), and RmsBackend (stub until librms RPCs land)
  • Add ConfigureSwitchCertificateState { Started, InProgress, Completed, Failed }
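The Component Manager surface listed above can be sketched as follows. For brevity the trait is synchronous and the error type is hypothetical. The real `NvSwitchManager` methods, error type, and `SwitchEndpoint` fields may differ. The mock reports one configurable status for every submitted job. The `RmsBackend` stub returns `Unsupported`, which is an assumed stand-in for whatever it does until the librms RPCs land.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Job states exposed by Component Manager (variant names from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigureSwitchCertificateState {
    Started,
    InProgress,
    Completed,
    Failed,
}

/// Hypothetical endpoint fields mirroring the inputs named in the PR
/// (BMC MAC, NVOS interface, vault credentials). No `Debug` derive, so the
/// credentials cannot leak through `{:?}` logging.
pub struct SwitchEndpoint {
    pub bmc_mac: String,
    pub nvos_interface: String,
    pub username: String,
    pub password: String,
}

/// Hypothetical error type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CmError {
    Unsupported(&'static str),
    UnknownJob(String),
}

/// Sketch of the two new methods (synchronous here; the real trait may be async).
pub trait NvSwitchManager {
    fn configure_switch_certificate(
        &self,
        endpoint: &SwitchEndpoint,
        cert_name: &str,
    ) -> Result<String, CmError>;
    fn get_configure_switch_certificate_job_status(
        &self,
        job_id: &str,
    ) -> Result<ConfigureSwitchCertificateState, CmError>;
}

/// Mock backend: every submitted job reports the same configurable status.
pub struct MockSwitchManager {
    status: ConfigureSwitchCertificateState,
    jobs: Mutex<HashMap<String, String>>, // job_id -> "bmc_mac/cert_name"
}

impl MockSwitchManager {
    pub fn with_status(status: ConfigureSwitchCertificateState) -> Self {
        Self { status, jobs: Mutex::new(HashMap::new()) }
    }
}

impl NvSwitchManager for MockSwitchManager {
    fn configure_switch_certificate(
        &self,
        endpoint: &SwitchEndpoint,
        cert_name: &str,
    ) -> Result<String, CmError> {
        let mut jobs = self.jobs.lock().unwrap();
        let job_id = format!("mock-job-{}", jobs.len() + 1);
        jobs.insert(job_id.clone(), format!("{}/{}", endpoint.bmc_mac, cert_name));
        Ok(job_id)
    }

    fn get_configure_switch_certificate_job_status(
        &self,
        job_id: &str,
    ) -> Result<ConfigureSwitchCertificateState, CmError> {
        if self.jobs.lock().unwrap().contains_key(job_id) {
            Ok(self.status)
        } else {
            Err(CmError::UnknownJob(job_id.to_string()))
        }
    }
}

/// Stand-in for RmsBackend until the librms RPCs exist (assumed to report Unsupported).
pub struct RmsBackend;

impl NvSwitchManager for RmsBackend {
    fn configure_switch_certificate(&self, _: &SwitchEndpoint, _: &str) -> Result<String, CmError> {
        Err(CmError::Unsupported("librms certificate RPCs not available yet"))
    }

    fn get_configure_switch_certificate_job_status(
        &self,
        _: &str,
    ) -> Result<ConfigureSwitchCertificateState, CmError> {
        Err(CmError::Unsupported("librms certificate RPCs not available yet"))
    }
}
```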

Tests:

  • Integration tests for skip paths, Start → WaitForComplete, success/failure polling, and RotateOsPassword → ConfigureCertificate(Start)
  • Test fixtures for rack_id assignment and versioned state transitions
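The success and failure polling cases can be modeled with a scripted fixture that returns one status per controller iteration. `ScriptedJob` and `run_wait_for_complete` are hypothetical names for illustration, not the fixtures added in this PR. The `max_polls` bound stands in for however the real tests limit iterations.

```rust
use std::cell::Cell;

/// Job states exposed by Component Manager (variant names from this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigureSwitchCertificateState {
    Started,
    InProgress,
    Completed,
    Failed,
}

/// Hypothetical fixture: returns one scripted status per poll and repeats the
/// last one once the script is exhausted.
pub struct ScriptedJob {
    statuses: Vec<ConfigureSwitchCertificateState>,
    polls: Cell<usize>,
}

impl ScriptedJob {
    pub fn new(statuses: Vec<ConfigureSwitchCertificateState>) -> Self {
        assert!(!statuses.is_empty(), "script needs at least one status");
        Self { statuses, polls: Cell::new(0) }
    }

    pub fn poll(&self) -> ConfigureSwitchCertificateState {
        let n = self.polls.get();
        self.polls.set(n + 1);
        self.statuses[n.min(self.statuses.len() - 1)]
    }
}

/// Simulates successive controller iterations in WaitForComplete. Returns the
/// terminal status and how many polls it took, or None if `max_polls` runs out.
pub fn run_wait_for_complete(
    job: &ScriptedJob,
    max_polls: usize,
) -> Option<(ConfigureSwitchCertificateState, usize)> {
    use ConfigureSwitchCertificateState::*;
    for attempt in 1..=max_polls {
        match job.poll() {
            status @ (Completed | Failed) => return Some((status, attempt)),
            Started | InProgress => {} // still running: wait for the next iteration
        }
    }
    None
}
```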

Docs:

  • Add switch_configure_certificate.md with FSM detail and RMS sequence diagrams
  • Update switch.md transitions and link to the new design doc

Description

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

2327

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@vinodchitraliNVIDIA vinodchitraliNVIDIA self-assigned this Jun 9, 2026
@vinodchitraliNVIDIA vinodchitraliNVIDIA requested review from a team and Coco-Ben as code owners June 9, 2026 13:19
coderabbitai Bot commented Jun 9, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca6cf1a7-ae65-43f6-afa5-4b0f4c0aa89d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@vinodchitraliNVIDIA vinodchitraliNVIDIA marked this pull request as draft June 9, 2026 13:20
copy-pr-bot Bot commented Jun 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@vinodchitraliNVIDIA vinodchitraliNVIDIA added the rack lifecycle Issues that relate to managing the lifecycle of a full rack (compute, switches and powershelves) label Jun 9, 2026
Signed-off-by: Vinod Chitrali <vchitrali@nvidia.com>