Enhance Claude failure analysis with Velero source and must-gather feedback #2051
base: oadp-dev
…edback
- Clone openshift/velero (oadp-dev branch) in ci-Dockerfile for source code investigation during failure analysis
- Add Velero source code investigation prompts to analyze_failures.sh, enabling Claude to trace errors back to Velero implementation
- Add must-gather improvement suggestions section to analysis output, creating a feedback loop for improving diagnostics collection
- Add data mover volume restore limitation to error ignore patterns (claim Selector not supported per vmware-tanzu/velero#7946)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds a Velero repo checkout to the CI Dockerfile, extends an E2E error-ignore pattern for a Data Mover restore message, and enriches the failure-analysis script with Velero/OADP source-code references and step-by-step investigation guidance.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kaovilai
Pull request overview
This PR enhances the Claude-based failure analysis system by integrating Velero source code access and adding a feedback mechanism for must-gather improvements. The changes enable deeper root cause analysis by allowing Claude to investigate Velero implementation details when analyzing test failures, and create a feedback loop for improving diagnostic data collection.
Key Changes:
- Added Velero source code cloning in the CI Docker image for runtime investigation during failure analysis
- Extended failure analysis prompts to guide Claude through Velero source code investigation when errors originate from Velero packages
- Added must-gather improvement suggestions section to capture gaps in diagnostic data collection
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| build/ci-Dockerfile | Clones openshift/velero (oadp-dev branch) to provide source code access for failure analysis |
| tests/e2e/scripts/analyze_failures.sh | Adds Velero source investigation prompts and must-gather feedback section to guide Claude's analysis workflow |
| tests/e2e/lib/flakes.go | Adds known data mover limitation to error ignore patterns based on upstream Velero issue |
| 5. **Correlation**: Group related errors together - if multiple errors reference the same resource (backup name, PVC, pod), keep them together with their context. | ||
| 6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation. |
Copilot
AI
Dec 18, 2025
Missing space after 'pkg/nodeagent/' before the closing parenthesis for consistency with other package references.
| 6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation. | |
| 6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/ ), note the file:line references for later source code investigation. |
| Note: Prow's build-log.txt is NOT available during this analysis (it's written after tests complete). | ||
| Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, and per-test pod logs. | ||
| Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, per-test pod logs, and Velero source investigation. |
Copilot
AI
Dec 18, 2025
The line references 'Velero source investigation' but this capability depends on the Docker image containing the cloned source code. Consider adding a note about this dependency or verifying the source code is available before attempting investigation.
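One way to address this dependency is a guard in the analysis script that checks for the cloned source before the prompt mentions it. The following is only a sketch under assumptions: the function names `source_available` and `source_note`, the `VELERO_SRC_DIR` variable, and its placeholder default are hypothetical, since the clone destination is not shown in this excerpt. The `pkg/` check relies only on the package paths named in the prompt (pkg/backup/, pkg/restore/, and so on).

```shell
# Hypothetical variable and placeholder default: the real clone destination
# is not visible in this PR excerpt, so it must be supplied by the caller.
VELERO_SRC_DIR="${VELERO_SRC_DIR:-/path/to/velero}"

# Return 0 if the given directory looks like a Velero checkout,
# i.e. it contains a pkg/ subdirectory.
source_available() {
  [ -d "$1/pkg" ]
}

# Print one line stating whether source investigation is possible,
# suitable for inclusion in the analysis prompt.
source_note() {
  if source_available "$1"; then
    printf 'Velero source available at %s\n' "$1"
  else
    printf 'Velero source NOT available; skip source investigation\n'
  fi
}

source_note "$VELERO_SRC_DIR"
```

With a guard like this, the analysis degrades to the existing artifacts (JUnit results, log summaries, must-gather, pod logs) when the image lacks the clone, instead of directing Claude to a path that does not exist.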
| # Clone openshift/velero source code for failure analysis | ||
| # Uses oadp-dev branch to match OADP operator development | ||
| RUN git clone --depth 1 --branch oadp-dev \ |
Copilot
AI
Dec 18, 2025
The --depth 1 shallow clone limits git history to one commit. If failure analysis requires examining recent changes or commit history for context, this might be insufficient. Consider if a deeper history (e.g., --depth 50) would be more useful for analysis purposes, or document this limitation if intentional.
| RUN git clone --depth 1 --branch oadp-dev \ | |
| RUN git clone --depth 50 --branch oadp-dev \ |
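To make the trade-off concrete, the sketch below shows how a script could inspect how much history a checkout actually carries. It uses only standard git commands (`git rev-parse --is-shallow-repository` and `git rev-list --count`); the helper names `is_shallow_clone` and `history_depth` are hypothetical and not part of this PR.

```shell
# Print "true" if the checkout at $1 is a shallow clone, "false" otherwise.
# Fails with a non-zero status if $1 is not a git repository.
is_shallow_clone() {
  git -C "$1" rev-parse --is-shallow-repository
}

# Print the number of commits reachable from HEAD in the checkout at $1.
# For a clone made with --depth 1 this is 1.
history_depth() {
  git -C "$1" rev-list --count HEAD
}
```

If the single-commit history turns out to be too limited, an existing shallow clone can be extended later with `git fetch --deepen=<n>` rather than rebuilding the image with a larger `--depth`.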
Enable Claude to investigate OADP operator source at /go/src/github.com/openshift/oadp-operator/ during failure analysis:
- Add OADP operator source to Available Artifacts section
- Rename "Velero Source Code Investigation" to "Source Code Investigation" with subsections for both Velero and OADP packages
- Update Claude invocation prompt to reference OADP source
- List key OADP packages: internal/controller/, pkg/velero/, pkg/credentials/, api/v1alpha1/, tests/e2e/lib/

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
@kaovilai: The following tests failed, say
Why the changes were made
How to test the changes made