Enhance Claude failure analysis with Velero source and must-gather feedback #2051

kaovilai · 2025-12-18T16:19:33Z

Clone openshift/velero (oadp-dev branch) in ci-Dockerfile for source code
investigation during failure analysis
Add Velero source code investigation prompts to analyze_failures.sh,
enabling Claude to trace errors back to Velero implementation
Add must-gather improvement suggestions section to analysis output,
creating a feedback loop for improving diagnostics collection
Add data mover volume restore limitation to error ignore patterns
(claim Selector not supported per failed to restore volume with StorageClass, claim Selector is not supported vmware-tanzu/velero#7946)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 [email protected]

Why the changes were made

How to test the changes made

…edback - Clone openshift/velero (oadp-dev branch) in ci-Dockerfile for source code investigation during failure analysis - Add Velero source code investigation prompts to analyze_failures.sh, enabling Claude to trace errors back to Velero implementation - Add must-gather improvement suggestions section to analysis output, creating a feedback loop for improving diagnostics collection - Add data mover volume restore limitation to error ignore patterns (claim Selector not supported per vmware-tanzu/velero#7946) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

coderabbitai · 2025-12-18T16:19:44Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Velero repo checkout to the CI Dockerfile, extends an E2E error-ignore pattern for a Data Mover restore message, and enriches the failure-analysis script with Velero/OADP source-code references and step-by-step investigation guidance.

Changes

Cohort / File(s)	Summary
CI/Build Infrastructure `build/ci-Dockerfile`	Inserts a git clone of `github.com/openshift/velero` (branch `oadp-dev`, depth 1) into `/go/src/github.com/openshift/velero` prior to running `go mod download`.
Test Failure Patterns `tests/e2e/lib/flakes.go`	Adds an error-ignore pattern: "failed to restore volume with StorageClass, claim Selector is not supported".
Failure Analysis Script `tests/e2e/scripts/analyze_failures.sh`	Adds Velero and OADP source-code references, a "Velero Source Code Investigation" section, expanded analysis prompts (including Claude preprocessing), updated artifact listings, and must-gather improvement suggestions—documentation and guidance only.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Review focus areas:
- build/ci-Dockerfile: verify clone step ordering, network/security implications, and path correctness.
- tests/e2e/scripts/analyze_failures.sh: confirm prompts and references are accurate and do not alter runtime behavior.
- tests/e2e/lib/flakes.go: ensure the added ignore pattern is precise and intended.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 226dc5f and 5a8aaca.

📒 Files selected for processing (1)

tests/e2e/scripts/analyze_failures.sh (6 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

tests/e2e/scripts/analyze_failures.sh

🔇 Additional comments (5)

tests/e2e/scripts/analyze_failures.sh (5)

242-270: Excellent addition for source-based failure analysis.

This section provides comprehensive guidance for investigating error origins in source code. The step-by-step workflow and detailed package listings will help Claude effectively trace errors back to their implementation, improving root cause analysis quality.

380-400: Great addition for diagnostic improvement feedback loop.

This section creates a valuable feedback mechanism for must-gather evolution. By explicitly prompting Claude to identify missing diagnostic data, you enable continuous improvement of the data collection process. The structured format with examples makes the feedback actionable.

412-414: LGTM!

This guidance reinforces the must-gather feedback loop introduced earlier, ensuring Claude explicitly notes when missing information hampers analysis. This consistency strengthens the diagnostic improvement mechanism.

455-465: LGTM!

The source code investigation instructions are well-integrated into the Claude invocation prompt. They provide clear guidance on using source code to understand error conditions, retry logic, and edge cases. This complements the earlier "Source Code Investigation" section and leverages the --add-dir "/go/src" access granted on line 436.

220-227: No documentation change needed—paths are CI infrastructure.

Lines 220-227 reference /go/src/github.com/openshift/velero/ and /go/src/github.com/openshift/oadp-operator/, which are guaranteed to exist in the CI environment. The ci-Dockerfile explicitly clones Velero to the first path (using the oadp-dev branch) and sets up the OADP operator in the second. These are container build-time dependencies, not runtime dependencies that require verification or documentation within the script. The paths are already documented in the Claude prompt itself (lines 220-227 provide detailed descriptions for Claude's analysis), and the script correctly passes --add-dir "/go/src" to grant Claude access to both directories.

Likely an incorrect or invalid review comment.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2025-12-18T16:20:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [kaovilai]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

This PR enhances the Claude-based failure analysis system by integrating Velero source code access and adding a feedback mechanism for must-gather improvements. The changes enable deeper root cause analysis by allowing Claude to investigate Velero implementation details when analyzing test failures, and create a feedback loop for improving diagnostic data collection.

Key Changes:

Added Velero source code cloning in the CI Docker image for runtime investigation during failure analysis
Extended failure analysis prompts to guide Claude through Velero source code investigation when errors originate from Velero packages
Added must-gather improvement suggestions section to capture gaps in diagnostic data collection

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
build/ci-Dockerfile	Clones openshift/velero (oadp-dev branch) to provide source code access for failure analysis
tests/e2e/scripts/analyze_failures.sh	Adds Velero source investigation prompts and must-gather feedback section to guide Claude's analysis workflow
tests/e2e/lib/flakes.go	Adds known data mover limitation to error ignore patterns based on upstream Velero issue

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-18T16:20:41Z

tests/e2e/scripts/analyze_failures.sh


 5. **Correlation**: Group related errors together - if multiple errors reference the same resource (backup name, PVC, pod), keep them together with their context.

+6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.


Missing space after 'pkg/nodeagent/' before the closing parenthesis for consistency with other package references.

Suggested change

6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.

6. **Source references**: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/ ), note the file:line references for later source code investigation.

Copilot · 2025-12-18T16:20:41Z

tests/e2e/scripts/analyze_failures.sh


 Note: Prow's build-log.txt is NOT available during this analysis (it's written after tests complete).
-Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, and per-test pod logs.
+Focus on JUnit results, preprocessed log summaries, must-gather diagnostics, per-test pod logs, and Velero source investigation.


The line references 'Velero source investigation' but this capability depends on the Docker image containing the cloned source code. Consider adding a note about this dependency or verifying the source code is available before attempting investigation.

Copilot · 2025-12-18T16:20:42Z

build/ci-Dockerfile


+# Clone openshift/velero source code for failure analysis
+# Uses oadp-dev branch to match OADP operator development
+RUN git clone --depth 1 --branch oadp-dev \


The --depth 1 shallow clone limits git history to one commit. If failure analysis requires examining recent changes or commit history for context, this might be insufficient. Consider if a deeper history (e.g., --depth 50) would be more useful for analysis purposes, or document this limitation if intentional.

Suggested change

RUN git clone --depth 1 --branch oadp-dev \

RUN git clone --depth 50 --branch oadp-dev \

Enable Claude to investigate OADP operator source at /go/src/github.com/openshift/oadp-operator/ during failure analysis: - Add OADP operator source to Available Artifacts section - Rename "Velero Source Code Investigation" to "Source Code Investigation" with subsections for both Velero and OADP packages - Update Claude invocation prompt to reference OADP source - List key OADP packages: internal/controller/, pkg/velero/, pkg/credentials/, api/v1alpha1/, tests/e2e/lib/ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

openshift-ci · 2025-12-18T20:01:14Z

@kaovilai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/4.20-e2e-test-cli-aws	`5a8aaca`	link	false	`/test 4.20-e2e-test-cli-aws`
ci/prow/4.20-e2e-test-aws	`5a8aaca`	link	true	`/test 4.20-e2e-test-aws`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copilot AI review requested due to automatic review settings December 18, 2025 16:19

openshift-ci bot requested review from eemcmullan and shubham-pampattiwar December 18, 2025 16:20

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025

Copilot AI reviewed Dec 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enhance Claude failure analysis with Velero source and must-gather feedback #2051

Enhance Claude failure analysis with Velero source and must-gather feedback #2051

kaovilai commented Dec 18, 2025

Uh oh!

coderabbitai bot commented Dec 18, 2025 •

edited

Loading

Other AI code review bot(s) detected

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

Copilot AI Dec 18, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		5. Correlation: Group related errors together - if multiple errors reference the same resource (backup name, PVC, pod), keep them together with their context.

		6. Source references: When you find errors from Velero packages (pkg/backup/, pkg/restore/, pkg/controller/, pkg/nodeagent/), note the file:line references for later source code investigation.

	RUN git clone --depth 1 --branch oadp-dev \
	RUN git clone --depth 50 --branch oadp-dev \

Enhance Claude failure analysis with Velero source and must-gather feedback #2051

Are you sure you want to change the base?

Enhance Claude failure analysis with Velero source and must-gather feedback #2051

Conversation

kaovilai commented Dec 18, 2025

Why the changes were made

How to test the changes made

Uh oh!

coderabbitai bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Estimated code review effort

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Dec 18, 2025 •

edited

Loading