Caracal upgrade to Rocky Linux 9.7#2126

Merged
priteau merged 19 commits into stackhpc/2024.1 from rl-9.7-caracal
Feb 23, 2026

Conversation

@elelaysh commented Feb 5, 2026

  • DOCA OFED 3.2.1 is needed for RL 9.7

Also see #2025

@elelaysh requested a review from a team as a code owner February 5, 2026 16:35
@elelaysh changed the base branch from stackhpc/2025.1 to stackhpc/2024.1 February 5, 2026 16:35

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a wide range of changes to support Rocky Linux 9.7 and upgrade various components. The changes include updates to package versions, container image tags, Ansible playbooks, documentation, and CI configuration. Notably, there's a significant effort to add multi-architecture support, refactor secret store deployment playbooks into a unified set, and improve the logic for fixing OVN chassis priorities. The addition of numerous release notes is a great practice. I have a few suggestions for improvement regarding a hardcoded value in an alerting rule, a dependency pointing to a temporary branch, and a long inline script that could be refactored for better maintainability. Overall, this is a substantial and well-executed upgrade.

I am having trouble creating individual review comments.

etc/kayobe/kolla/config/prometheus/rabbitmq.rules (23)

high

The number of RabbitMQ nodes in this alert expression is hardcoded to 3. This seems to be a regression, as a variable (alertmanager_number_of_rabbitmq_nodes) was likely used before, and is still used for another alert in this file. Hardcoding this value may cause incorrect alerts if the number of RabbitMQ nodes is different from 3. Please consider restoring the use of a variable to determine the number of nodes dynamically.
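To illustrate the suggestion above, here is a minimal Python sketch that builds an alert expression from a node-count parameter instead of a literal 3. The actual rule in rabbitmq.rules is not shown in this conversation, so the metric name `example_rabbitmq_node_up` and the shape of the expression are hypothetical placeholders. In the repository the count would presumably come from the alertmanager_number_of_rabbitmq_nodes variable mentioned in the review.

```python
def rabbitmq_node_count_expr(
    expected_nodes: int,
    metric: str = "example_rabbitmq_node_up",  # hypothetical metric name
) -> str:
    """Return a PromQL-style expression that is true when fewer than
    expected_nodes instances report the (placeholder) metric."""
    if expected_nodes < 1:
        raise ValueError("expected_nodes must be at least 1")
    # The threshold is interpolated from the parameter, never hardcoded.
    return f"sum({metric}) < {expected_nodes}"


# The same template serves a 3-node and a 5-node cluster.
print(rabbitmq_node_count_expr(3))
print(rabbitmq_node_count_expr(5))
```

With a parameterised threshold, a deployment running a different number of RabbitMQ nodes gets a correct alert without editing the rule text.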

etc/kayobe/kolla/kolla-build.conf (18)

medium

The openstack-base source is pointing to a temporary branch bp/bump-django-4.2/2024.1. This is risky for long-term maintainability as temporary branches may be deleted. It's better to point to a stable tag or branch. If this is a temporary measure, it would be good to add a comment explaining the situation and when it can be reverted.
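One way to catch this kind of reference early is a small lint over the build configuration. The sketch below is an assumption-laden example, not part of this pull request. It assumes the file is INI-style with a `reference` option per source section, the list of "stable" patterns is a guess at local branch conventions, and the `location` URLs and the `example-service` section are placeholders.

```python
import configparser
import re

# Assumption: references matching one of these patterns are long-lived.
STABLE_REFERENCE_PATTERNS = (
    r"^stable/",
    r"^stackhpc/",
    r"^unmaintained/",
    r"^\d+(\.\d+)+$",  # plain version tags such as 18.0.1
)


def find_unstable_references(conf_text: str) -> dict[str, str]:
    """Map section name to reference for every source whose reference
    does not match a known stable pattern."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flagged = {}
    for section in parser.sections():
        reference = parser.get(section, "reference", fallback=None)
        if reference is None:
            continue  # section does not describe a git source
        if not any(re.search(p, reference) for p in STABLE_REFERENCE_PATTERNS):
            flagged[section] = reference
    return flagged


# Placeholder configuration; only the branch name comes from the review.
SAMPLE = """
[openstack-base]
type = git
location = https://example.invalid/requirements.git
reference = bp/bump-django-4.2/2024.1

[example-service]
type = git
location = https://example.invalid/example-service.git
reference = stackhpc/2024.1
"""

print(find_unstable_references(SAMPLE))
```

A check like this could run in CI and fail, or merely warn, when a source points at a branch that is likely to be deleted.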

etc/kayobe/ansible/ovn-fix-chassis-priorities.yml (55-160)

medium

The shell script in this task is very long and complex. Embedding large scripts directly in Ansible playbooks makes them difficult to read, maintain, and test. Consider moving this script to a separate file within the repository (e.g., in a files/ or scripts/ directory) and executing it by copying it to the target container and running it with ansible.builtin.command. This would improve readability and maintainability of the playbook.

@elelaysh force-pushed the rl-9.7-caracal branch 2 times, most recently from 2697c42 to 30b87fc Compare February 6, 2026 08:09
grzegorzkoper previously approved these changes Feb 10, 2026

@grzegorzkoper left a comment

+1

@priteau left a comment

Looks OK overall. Rebase once @jovial's PR for Nova/Ironic rebuild has merged and maybe remove the commits that are added and then reverted?

elelaysh and others added 19 commits February 23, 2026 11:44
- DOCA 3.2.1 for RL 9.7
- Bump Rocky 9 Security SIG repo, add source
Removes the Rocky Linux minor version from the name and path
when the DOCA version is greater than 3.2.0.
This doesn't apply to DOCA modules because they are still compiled
for a specific RL minor version.
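
The naming rule in this commit message can be sketched in Python as follows. This is an illustration only. The `example-doca-...` naming scheme and both function names are hypothetical, since the real names and paths are not shown here. Only the threshold (strictly greater than 3.2.0) comes from the message.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.2.1' into (3, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def doca_repo_name(doca_version: str, rocky_release: str) -> str:
    """Build a placeholder repository name for a DOCA version.

    For DOCA versions greater than 3.2.0 only the Rocky Linux major
    version appears in the name; otherwise the full release is kept.
    """
    major = rocky_release.split(".")[0]
    if parse_version(doca_version) > (3, 2, 0):
        return f"example-doca-{doca_version}-rocky{major}"
    return f"example-doca-{doca_version}-rocky{rocky_release}"


print(doca_repo_name("3.2.1", "9.7"))
print(doca_repo_name("3.2.0", "9.6"))
```

Comparing integer tuples rather than strings keeps the check correct for versions such as 3.10.0, which sorts before 3.2.0 as a string.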
Latest version for Rocky Linux is 29.2
Tested on multinode.

Fix install-doca.yml to not install doca-ofed anymore (avoid dkms).

The stackhpc_doca_kernel_version_matrix variable contains the kernel module
versions to install for the last two supported Rocky Linux minor versions.
It must be updated after a new pre-compiled kernel module version has been built.
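
As a rough picture of how such a matrix might be consulted, the Python sketch below maps a Rocky Linux minor release to a kernel version and fails loudly for an unsupported release. The real structure of stackhpc_doca_kernel_version_matrix is not shown in this conversation, so the dictionary layout, the release keys, and the placeholder values are all assumptions.

```python
# Assumed layout: Rocky Linux release -> kernel version of the
# pre-compiled DOCA modules. The values are deliberate placeholders.
DOCA_KERNEL_VERSION_MATRIX = {
    "9.6": "<kernel-version-for-9.6>",
    "9.7": "<kernel-version-for-9.7>",
}


def kernel_version_for(release: str, matrix=None) -> str:
    """Look up the kernel version for a release, or raise a clear error."""
    if matrix is None:
        matrix = DOCA_KERNEL_VERSION_MATRIX
    try:
        return matrix[release]
    except KeyError:
        supported = ", ".join(sorted(matrix))
        raise ValueError(
            f"no pre-compiled DOCA kernel module for Rocky Linux {release}; "
            f"supported releases: {supported}"
        ) from None


print(kernel_version_for("9.7"))
```

Failing on an unknown release, rather than falling back to a default, makes a stale matrix visible as soon as a new minor version is deployed.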
to see which sources are downloaded before docker build
to accommodate temporary errors from Ark (was getting a 500 error)
Use the authenticating pulp_proxy for all CI build jobs that need packages from Ark -
host images, Kolla images and the IPA image.
See actions/runs/21713574987
- bump cadvisor to 0.56.2

- Ignore CVE-2024-24790 in prometheus exporters
  control plane is trusted

- Upgrade prometheus-msteams to 1.5.3
  to fix CVE-2023-24538 CVE-2023-24540

- opensearch-dashboard: ignore CVE-2025-68428
  CVE-2025-68428 is still present in opensearch-dashboards 2.19.4
  because jspdf is still in version 3.0.1

- Ignore CVE-2024-24790 in prometheus-mtail
  control plane is trusted

- Bump grafana to 12.3.3 to fix CVE-2025-68121
  grafana server 12.3.3 is fixed but the opensearch-datasource plugin
  is still affected.

- Bump etcd to 3.5.27 to fix CVE-2025-68121

- Ignore CVE-2025-68121 for prometheus images
  - server-side: exporters and server are not listening with tls
  - as client: only querying known services

- Ignore CVE-2025-68121 for influxdb
  No new version is available and it runs on a secure network

- Ignore CVE-2025-68121 for letsencrypt-lego
  it only talks to known servers

- Ignore CVE-2025-68121 for neutron
  it is the docker client that triggers it and we don't speak to remote
  docker over tls