
CMP-4116: Fix platform scan pod stuck when RawResultStorage is disabled#1097

Merged
rhmdnd merged 4 commits into ComplianceAsCode:master from Vincent056:fix-platform-scan-raw-result-storage-disabled
Apr 2, 2026


@Vincent056

Summary

  • When RawResultStorage.Enabled=false, addResultsCollectionPods unconditionally added a TLS volume referencing the result-client-cert-{scanName} secret, which is only created when raw result storage is enabled. This caused the platform scan pod to get stuck in Init:0/2.
  • Reuse existing getLogCollectorVolumeMounts() and conditionally append the TLS volume only when RawResultStorage.Enabled=true, matching the existing behavior in getNodeScannerPodVolumes.
  • Add e2e test TestScheduledSuitePlatformNoStorage covering platform scans with disabled raw result storage.
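A minimal Go sketch of the conditional described above, assuming simplified stand-in types instead of the real `corev1` API types. The helper `collectorVolumes`, the `tls` volume name, and the base volume list are hypothetical; only the `result-client-cert-{scanName}` secret naming comes from this PR.

```go
package main

import "fmt"

// Volume is a simplified, hypothetical stand-in for corev1.Volume.
type Volume struct {
	Name       string
	SecretName string // non-empty only for secret-backed volumes
}

// collectorVolumes is a hypothetical helper illustrating the fix: the
// secret-backed TLS volume is appended only when raw result storage is
// enabled, because result-client-cert-<scanName> only exists in that case.
func collectorVolumes(scanName string, rawResultStorageEnabled bool) []Volume {
	// Volumes the collector always needs (placeholder names).
	vols := []Volume{
		{Name: "report-dir"},
		{Name: "content-dir"},
	}
	if rawResultStorageEnabled {
		vols = append(vols, Volume{
			Name:       "tls", // hypothetical volume name
			SecretName: "result-client-cert-" + scanName,
		})
	}
	return vols
}

func main() {
	for _, enabled := range []bool{false, true} {
		fmt.Printf("enabled=%v volumes=%+v\n", enabled, collectorVolumes("ocp4-cis", enabled))
	}
}
```

Gating the volume on the same flag that controls secret creation keeps the pod spec from referencing a secret that may not exist, which is the pattern the summary attributes to `getNodeScannerPodVolumes`.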

Made with Cursor

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1097-c31c6c686380ba795e3914bd7142de390c521595

Collaborator

@rhmdnd left a comment


I have one recommendation for improving the test: move the check to after the scan completes. That ensures we've exercised the conditional, since a scan that is done must have gone through the aggregating phase successfully.

I'm concerned that if we check for PVCs too soon, we'll short-circuit the check. We take the same approach in TestScanSettingBindingNoStorage.
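The ordering suggested in this review can be sketched in Go as: wait for the scan to reach DONE, then assert on PVCs. The `scanAPI` interface, `fakeAPI`, and the helper names are hypothetical stand-ins for the e2e framework, not the code in tests/e2e/parallel/main_test.go.

```go
package main

import (
	"fmt"
	"time"
)

// scanAPI is a hypothetical interface standing in for the e2e framework client.
type scanAPI interface {
	ScanPhase(name string) (string, error)
	PVCCount(namespace string) (int, error)
}

// waitForScanDone polls until the scan reports DONE or the timeout expires.
func waitForScanDone(c scanAPI, name string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		phase, err := c.ScanPhase(name)
		if err != nil {
			return err
		}
		if phase == "DONE" {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("scan %s still in phase %s after %v", name, phase, timeout)
		}
		time.Sleep(interval)
	}
}

// checkNoStorage waits for DONE first, so the scan has already passed through
// AGGREGATING, and only then asserts that no PVCs were created.
func checkNoStorage(c scanAPI, namespace, scan string) error {
	if err := waitForScanDone(c, scan, 10*time.Millisecond, time.Second); err != nil {
		return err
	}
	n, err := c.PVCCount(namespace)
	if err != nil {
		return err
	}
	if n != 0 {
		return fmt.Errorf("expected no PVCs with raw result storage disabled, found %d", n)
	}
	return nil
}

// fakeAPI replays a fixed sequence of phases for demonstration.
type fakeAPI struct {
	phases []string
	calls  int
	pvcs   int
}

func (f *fakeAPI) ScanPhase(string) (string, error) {
	i := f.calls
	if i >= len(f.phases) {
		i = len(f.phases) - 1
	}
	f.calls++
	return f.phases[i], nil
}

func (f *fakeAPI) PVCCount(string) (int, error) { return f.pvcs, nil }

func main() {
	api := &fakeAPI{phases: []string{"RUNNING", "AGGREGATING", "DONE"}}
	fmt.Println(checkNoStorage(api, "openshift-compliance", "ocp4-cis"))
}
```

Checking PVCs while the scan is still RUNNING could pass before the aggregating phase has had a chance to run, which is the short-circuit this review is worried about.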

Comment threads on tests/e2e/parallel/main_test.go (one marked Outdated)
@rhmdnd changed the title from "Fix platform scan pod stuck when RawResultStorage is disabled" to "CMP-4116: Fix platform scan pod stuck when RawResultStorage is disabled" Feb 27, 2026
@openshift-ci-robot
Collaborator

@Vincent056: This pull request references CMP-4116 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.


@xiaojiey
Collaborator

xiaojiey commented Mar 16, 2026

Verification passed. I'm really confused why this bug wasn't observed previously.

$ oc-compliance bind -N test-cis profile/ocp4-cis profile/ocp4-cis-node
Creating ScanSettingBinding test-cis
$ oc get pod
NAME                                             READY   STATUS    RESTARTS      AGE
compliance-operator-6d56dc6c9f-4nqbs             1/1     Running   2 (45m ago)   45m
ocp4-openshift-compliance-pp-7474f47c7c-h2rpw    1/1     Running   0             44m
rhcos4-openshift-compliance-pp-6cf7d7c49-zfjlk   1/1     Running   0             44m
$ oc get scan
NAME                   PHASE     RESULT
ocp4-cis               RUNNING   NOT-AVAILABLE
ocp4-cis-node-master   RUNNING   NOT-AVAILABLE
ocp4-cis-node-worker   RUNNING   NOT-AVAILABLE
$ oc get pod -n openshift-compliance -l compliance.openshift.io/scan-name=ocp4-cis -o yaml | grep -A20 "volumes:"
    volumes:
    - emptyDir: {}
      name: content-dir
    - name: kube-api-access-n4v9s
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
--
    volumes:
    - emptyDir: {}
      name: tmp-dir
    - emptyDir: {}
      name: fetch-results
    - emptyDir: {}
      name: report-dir
    - emptyDir: {}
      name: content-dir
    - configMap:
        defaultMode: 493
        name: ocp4-cis-openscap-container-entrypoint
      name: ocp4-cis-openscap-container-entrypoint
    - name: kube-api-access-kd6xw
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
$ oc get scan
NAME                   PHASE   RESULT
ocp4-cis               DONE    NON-COMPLIANT
ocp4-cis-node-master   DONE    COMPLIANT
ocp4-cis-node-worker   DONE    COMPLIANT
$ oc get pv
No resources found
$ oc patch ss default --type='merge' -p '{"rawResultStorage":{"enabled":true}}'
scansetting.compliance.openshift.io/default patched
$ oc-compliance rerun-now scansettingbinding test-cis
Rerunning scans from 'test-cis': ocp4-cis, ocp4-cis-node-master, ocp4-cis-node-worker
Re-running scan 'openshift-compliance/ocp4-cis'
Re-running scan 'openshift-compliance/ocp4-cis-node-master'
Re-running scan 'openshift-compliance/ocp4-cis-node-worker'
$ oc get scan -w
NAME                   PHASE     RESULT
ocp4-cis               RUNNING   NOT-AVAILABLE
ocp4-cis-node-master   RUNNING   NOT-AVAILABLE
ocp4-cis-node-worker   RUNNING   NOT-AVAILABLE
ocp4-cis-node-worker   AGGREGATING   NOT-AVAILABLE
ocp4-cis               AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-master   AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-worker   AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-worker   DONE          COMPLIANT
ocp4-cis               AGGREGATING   NOT-AVAILABLE
ocp4-cis               DONE          NON-COMPLIANT
ocp4-cis-node-master   AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-master   DONE          COMPLIANT
$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                       STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-38052cd7-5490-45e2-b08b-dbc6dfeaf5c1   1Gi        RWO            Delete           Bound    openshift-compliance/ocp4-cis-node-worker   gp3-csi        <unset>                          2m24s
pvc-4e5a2fa7-e77d-4875-b344-1becbacd1fea   1Gi        RWO            Delete           Bound    openshift-compliance/ocp4-cis-node-master   gp3-csi        <unset>                          2m24s
pvc-ec19d635-0ca2-484a-9665-5e41d7cd3a1b   1Gi        RWO            Delete           Bound    openshift-compliance/ocp4-cis               gp3-csi        <unset>   
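As a hedged illustration of the failure mode in the summary, the Go sketch below flags pod volumes that reference secrets which do not exist. The types, the `tls` volume name, and `missingSecretVolumes` are hypothetical simplifications. In a real cluster the kubelet cannot mount a non-optional secret volume whose secret is missing, so the pod does not start, consistent with the Init:0/2 status described in the PR.

```go
package main

import "fmt"

// podVolume is a simplified, hypothetical stand-in for corev1.Volume.
type podVolume struct {
	Name       string
	SecretName string // empty unless the volume is secret-backed
}

// missingSecretVolumes returns the names of volumes that reference secrets
// absent from existingSecrets; such volumes cannot be mounted.
func missingSecretVolumes(vols []podVolume, existingSecrets map[string]bool) []string {
	var missing []string
	for _, v := range vols {
		if v.SecretName != "" && !existingSecrets[v.SecretName] {
			missing = append(missing, v.Name)
		}
	}
	return missing
}

func main() {
	// Hypothetical pod spec before the fix, with raw result storage disabled:
	// the TLS volume references a secret that was never created.
	before := []podVolume{
		{Name: "content-dir"},
		{Name: "report-dir"},
		{Name: "tls", SecretName: "result-client-cert-ocp4-cis"},
	}
	secrets := map[string]bool{} // no result-client-cert secret exists
	fmt.Println(missingSecretVolumes(before, secrets))
}
```

The pod volumes listed in the verification above contain no secret-backed volume of this kind, so the check would report nothing for them.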

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1097-58bc304ba7e0a848d24711cc75d94e460ba6a07b

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1097-cb6fa23abbd0dd4d225df9d291faf68b723e8dcc

@Vincent056
Copy link
Copy Markdown
Author

#1125 is needed to fix the e2e tests.

The addResultsCollectionPods function unconditionally added the TLS
volume and mount referencing the result-client-cert secret, which is
only created when RawResultStorage.Enabled=true. This caused the
platform scan pod to get stuck in Init:0/2 when RawResultStorage was
disabled.

Reuse getLogCollectorVolumeMounts and conditionally append the TLS
volume, matching the existing behavior in getNodeScannerPodVolumes.

Made-with: Cursor
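The commit message notes that both the TLS volume and its mount were added unconditionally. Kubernetes API validation rejects a pod whose container mounts a volume name the pod does not define, so the volume and its mount need to be gated by the same condition. The sketch below assumes simplified types; `withTLS`, `danglingMounts`, the `tls` name, and the mount paths are hypothetical.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for corev1.Volume and corev1.VolumeMount.
type Volume struct{ Name string }

type VolumeMount struct {
	Name      string
	MountPath string
}

// danglingMounts returns the names of mounts that match no volume in the pod.
func danglingMounts(vols []Volume, mounts []VolumeMount) []string {
	have := make(map[string]bool, len(vols))
	for _, v := range vols {
		have[v.Name] = true
	}
	var out []string
	for _, m := range mounts {
		if !have[m.Name] {
			out = append(out, m.Name)
		}
	}
	return out
}

// withTLS appends the TLS volume and its mount under the same condition so the
// two slices stay consistent with each other.
func withTLS(vols []Volume, mounts []VolumeMount, enabled bool) ([]Volume, []VolumeMount) {
	if enabled {
		vols = append(vols, Volume{Name: "tls"})
		mounts = append(mounts, VolumeMount{Name: "tls", MountPath: "/etc/tls"}) // hypothetical path
	}
	return vols, mounts
}

func main() {
	for _, enabled := range []bool{false, true} {
		v, m := withTLS(
			[]Volume{{Name: "report-dir"}},
			[]VolumeMount{{Name: "report-dir", MountPath: "/reports"}},
			enabled,
		)
		fmt.Println(enabled, len(v), len(m), danglingMounts(v, m))
	}
}
```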
@Vincent056 force-pushed the fix-platform-scan-raw-result-storage-disabled branch from cb6fa23 to 6b34394 on March 31, 2026 14:48
@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1097-6b343947753c4ffbaf619c43961868ee61e175e7

@yuumasato
Member

/test e2e-aws-parallel-arm

Member

@yuumasato left a comment


/lgtm

@openshift-ci

openshift-ci Bot commented Apr 2, 2026

@Vincent056: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name          Commit    Details   Required   Rerun command
ci/prow/e2e-rosa   6b34394   link      true       /test e2e-rosa


@yuumasato
Member

/test e2e-aws-parallel-arm

@yuumasato added this to the 1.9.0 milestone Apr 2, 2026
Member

@yuumasato left a comment


/lgtm

@Vincent056 requested a review from rhmdnd April 2, 2026 14:59
Collaborator

@rhmdnd left a comment


/lgtm

@openshift-ci

openshift-ci Bot commented Apr 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhmdnd, Vincent056, yuumasato


Needs approval from an approver in each of these files:
  • OWNERS [Vincent056,rhmdnd,yuumasato]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhmdnd merged commit e453026 into ComplianceAsCode:master Apr 2, 2026
18 of 23 checks passed

5 participants