Caracal upgrade to Rocky Linux 9.7 #2126
Code Review
This pull request introduces a wide range of changes to support Rocky Linux 9.7 and upgrade various components. The changes include updates to package versions, container image tags, Ansible playbooks, documentation, and CI configuration. Notably, there's a significant effort to add multi-architecture support, refactor secret store deployment playbooks into a unified set, and improve the logic for fixing OVN chassis priorities. The addition of numerous release notes is a great practice. I have a few suggestions for improvement regarding a hardcoded value in an alerting rule, a dependency pointing to a temporary branch, and a long inline script that could be refactored for better maintainability. Overall, this is a substantial and well-executed upgrade.
I am having trouble creating individual review comments, so my feedback is listed below.
etc/kayobe/kolla/config/prometheus/rabbitmq.rules (23)
The number of RabbitMQ nodes in this alert expression is hardcoded to 3. This seems to be a regression, as a variable (alertmanager_number_of_rabbitmq_nodes) was likely used before, and is still used for another alert in this file. Hardcoding this value may cause incorrect alerts if the number of RabbitMQ nodes is different from 3. Please consider restoring the use of a variable to determine the number of nodes dynamically.
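A minimal sketch of restoring the variable, assuming the rules file is Jinja-templated by Kayobe (which the existing use of alertmanager_number_of_rabbitmq_nodes elsewhere in the file suggests). The group name, alert name, the rabbitmq_build_info metric and the thresholds are illustrative assumptions, not the file's actual content:

```yaml
groups:
  - name: rabbitmq
    rules:
      # Hypothetical alert: fires when fewer RabbitMQ nodes report metrics
      # than the cluster is expected to have.
      - alert: RabbitMQNodeDown
        # The expected node count comes from the Kayobe variable instead of a literal 3.
        expr: sum(rabbitmq_build_info) < {{ alertmanager_number_of_rabbitmq_nodes }}
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Fewer than {{ alertmanager_number_of_rabbitmq_nodes }} RabbitMQ nodes are running"
```

Because the value is substituted when Kayobe renders the Kolla configuration, changing the cluster size only requires updating the variable.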
etc/kayobe/kolla/kolla-build.conf (18)
The openstack-base source is pointing to a temporary branch bp/bump-django-4.2/2024.1. This is risky for long-term maintainability as temporary branches may be deleted. It's better to point to a stable tag or branch. If this is a temporary measure, it would be good to add a comment explaining the situation and when it can be reverted.
etc/kayobe/ansible/ovn-fix-chassis-priorities.yml (55-160)
The shell script in this task is very long and complex. Embedding large scripts directly in Ansible playbooks makes them difficult to read, maintain, and test. Consider moving this script to a separate file within the repository (e.g., in a files/ or scripts/ directory) and executing it by copying it to the target container and running it with ansible.builtin.command. This would improve readability and maintainability of the playbook.
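A minimal sketch of the suggested refactor, assuming Docker as the container engine, a hypothetical files/ovn-fix-chassis-priorities.sh next to the playbook, and a target container named ovn_nb_db (Kolla's OVN northbound database container). The paths and container name are assumptions; use whichever container the inline script currently runs in:

```yaml
- name: Copy the chassis priority fix script to the host
  ansible.builtin.copy:
    src: files/ovn-fix-chassis-priorities.sh   # hypothetical location in the repo
    dest: /tmp/ovn-fix-chassis-priorities.sh
    mode: "0755"

- name: Copy the script into the OVN container
  # docker cp preserves the file mode, so the script stays executable.
  ansible.builtin.command: >-
    docker cp /tmp/ovn-fix-chassis-priorities.sh
    ovn_nb_db:/tmp/ovn-fix-chassis-priorities.sh
  changed_when: true

- name: Run the script inside the container
  ansible.builtin.command: docker exec ovn_nb_db /tmp/ovn-fix-chassis-priorities.sh
  register: ovn_fix_result
  changed_when: true
```

Keeping the script in its own file also means it can be checked with a shell linter and exercised outside Ansible.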
The 2024.1-rocky-9-20260220T082113 image build is at https://github.com/stackhpc/stackhpc-kayobe-config/actions/runs/22216734996
OVN multinode passed in https://github.com/stackhpc/stackhpc-kayobe-config/actions/runs/22222833863/job/64282195630
- DOCA 3.2.1 for RL 9.7
- Bump Rocky 9 Security SIG repo, add source
Remove the Rocky Linux minor version from the name and path when the DOCA version is greater than 3.2.0. This does not apply to DOCA kernel modules, because they are still compiled for a specific Rocky Linux minor version.
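The version check described above can be expressed with Ansible's `version` test. This is a sketch only: stackhpc_doca_version and stackhpc_doca_os_version are hypothetical variable names, and the real playbook may build the name and path differently:

```yaml
# Hypothetical variables illustrating the rule: for DOCA versions above 3.2.0,
# only the Rocky Linux major version (e.g. "9") appears in the name and path;
# older DOCA releases keep the full major.minor version (e.g. "9.7").
# Kernel modules are not covered, as they stay tied to a minor version.
stackhpc_doca_version: "3.2.1"
stackhpc_doca_os_version: >-
  {{ ansible_facts.distribution_major_version
     if stackhpc_doca_version is version('3.2.0', '>')
     else ansible_facts.distribution_version }}
```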
Latest version for Rocky Linux is 29.2.
Tested on multinode. Fix install-doca.yml so that it no longer installs doca-ofed (to avoid DKMS). The stackhpc_doca_kernel_version_matrix variable contains the kernel module versions to install for the last two supported Rocky Linux minor versions; it must be updated whenever a new pre-compiled kernel module version has been built.
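A sketch of how such a matrix might be laid out and guarded, assuming it is a mapping keyed by Rocky Linux minor version. The structure, the "9.6" key and the placeholder values are assumptions, not the variable's real contents; the two YAML documents would normally live in separate files:

```yaml
# e.g. in a group_vars file (hypothetical layout): one pre-compiled kernel
# module version per supported minor release. Replace the placeholders with
# the versions produced by the kernel module build.
stackhpc_doca_kernel_version_matrix:
  "9.6": "<kernel-module-version-for-9.6>"
  "9.7": "<kernel-module-version-for-9.7>"
---
# e.g. in the playbook's tasks: fail early when the host's minor release has
# no pre-compiled modules, rather than falling back to a DKMS build.
- name: Check that pre-compiled DOCA kernel modules exist for this release
  ansible.builtin.assert:
    that:
      - ansible_facts.distribution_version in stackhpc_doca_kernel_version_matrix
    fail_msg: >-
      No entry in stackhpc_doca_kernel_version_matrix for Rocky Linux
      {{ ansible_facts.distribution_version }}
```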
To see which sources are downloaded before the Docker build.
To accommodate temporary errors from Ark (it was returning HTTP 500 errors).
Use the authenticating pulp_proxy for all CI build jobs that need packages from Ark: host images, Kolla images and the IPA image.
See actions/runs/21713574987
- Bump cadvisor to 0.56.2
- Ignore CVE-2024-24790 in Prometheus exporters: the control plane is trusted
- Upgrade prometheus-msteams to 1.5.3 to fix CVE-2023-24538 and CVE-2023-24540
- opensearch-dashboards: ignore CVE-2025-68428, which is still present in opensearch-dashboards 2.19.4 because the bundled jspdf is still at version 3.0.1
- Ignore CVE-2024-24790 in prometheus-mtail: the control plane is trusted
- Bump Grafana to 12.3.3 to fix CVE-2025-68121; the Grafana 12.3.3 server is fixed, but the opensearch-datasource plugin is still affected
- Bump etcd to 3.5.27 to fix CVE-2025-68121
- Ignore CVE-2025-68121 for Prometheus images:
  - server side: the exporters and server do not listen with TLS
  - as a client: Prometheus only queries known services
- Ignore CVE-2025-68121 for InfluxDB: no new version is available and it runs on a secure network
- Ignore CVE-2025-68121 for letsencrypt-lego: it only talks to known servers
- Ignore CVE-2025-68121 for Neutron: it is triggered by the Docker client, and we do not talk to a remote Docker daemon over TLS
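If these exceptions are recorded as Trivy ignore files (an assumption; the PR text does not show the mechanism), Trivy's YAML ignore format, loaded with `--ignorefile`, lets each CVE carry its justification. The file below is a hypothetical sketch for a single Prometheus image, not the repository's actual ignore list:

```yaml
# .trivyignore.yaml (hypothetical file for a Prometheus image)
vulnerabilities:
  - id: CVE-2024-24790
    statement: Control plane is trusted
  - id: CVE-2025-68121
    statement: >-
      Exporters and server do not listen with TLS; as a client,
      Prometheus only queries known services
```

Keeping the reason next to each ID makes it easier to revisit the exceptions when fixed upstream versions become available.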
See also #2025.