Fix bump-version to use YAML parsing for packages.yml updates#2148
Fix bump-version to use YAML parsing for packages.yml updates#2148devin-ai-integration[bot] wants to merge 2 commits intomasterfrom
Conversation
The sed-based version replacement in bump-version.yml had two issues: 1. It only matched 'version: X.Y.Z' patterns, missing the case where master has a git hash reference for dbt-data-reliability 2. It also matched commented-out version lines Replace the sed commands for packages.yml with a Python script that properly parses the YAML and handles both cases: - Updates an existing package version reference - Replaces a git hash reference with a proper package version The sed replacement for docs snippets and pyproject.toml is kept as-is since those files always have a version pattern. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @devin-ai-integration[bot] |
📝 WalkthroughWalkthroughAdds a Python script to update the elementary package version in Changes
Sequence Diagram(s)sequenceDiagram
participant GH as "GitHub Actions"
participant Runner as "Action runner (env)"
participant Python as "bump_packages_version.py"
participant YAML as "packages.yml (repo)"
participant Git as "git commit/push"
GH->>Runner: start workflow
Runner->>Runner: Determine package version\n(write to GITHUB_OUTPUT)
Runner->>Python: export PKG_VERSION\nrun script
Python->>YAML: read `packages.yml`
Python->>YAML: update/replace elementary entry\nappend helper comments
Python-->>Runner: exit 0 / success message
Runner->>Git: commit & push updated `packages.yml`
Git-->>GH: push confirmed
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/scripts/bump_packages_version.py:
- Around line 19-20: The script currently uses PyYAML's yaml.safe_load() and
yaml.dump(), which strip comments from PACKAGES_FILE during the
read/modify/write round-trip; replace this with a round-trip YAML API that
preserves comments (e.g., ruamel.yaml's YAML() loader/dumper) so comments and
formatting in PACKAGES_FILE are retained when reading into the variable
currently named data and when writing back (the code paths that call yaml.dump
at the end of the script). Update imports and replace yaml.safe_load(...) /
yaml.dump(...) usage with the ruamel YAML().load(...) and YAML().dump(...)
workflow (or an equivalent round-trip-preserving YAML library) while ensuring
the same data modifications still target the in-memory variable (data) and are
written back to the same file.
In @.github/workflows/bump-version.yml:
- Around line 70-77: Update the "Determine package version" step (id:
pkg-version) to fail when a user-supplied dbt-package-version override is
present but validation produced an empty validated-dbt-package-version: check
whether the override (the dbt-package-version
input/needs.validate-version.outputs.requested-dbt-package-version or
equivalent) is non-empty and the
needs.validate-version.outputs.validated-dbt-package-version is empty, and if so
echo an error and exit non‑zero instead of falling back to
default-dbt-package-version; otherwise continue to set version to the validated
value or default as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 122bfa05-a351-4733-a09d-c2486fa51890
📒 Files selected for processing (2)
.github/scripts/bump_packages_version.py.github/workflows/bump-version.yml
| with open(PACKAGES_FILE) as f: | ||
| data = yaml.safe_load(f) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python -m pip install --quiet PyYAML
python - <<'PY'
from pathlib import Path
import yaml
path = Path("elementary/monitor/dbt_project/packages.yml")
src = path.read_text()
before = [line for line in src.splitlines() if line.lstrip().startswith("#")]
round_tripped = yaml.dump(
yaml.safe_load(src),
default_flow_style=False,
sort_keys=False,
)
after = [line for line in round_tripped.splitlines() if line.lstrip().startswith("#")]
print(f"comments_before={len(before)}")
print(f"comments_after={len(after)}")
print("sample_comments_before:")
for line in before[:10]:
print(line)
PYRepository: elementary-data/elementary
Length of output: 622
PyYAML's safe_load/dump strip all comments from packages.yml.
The file contains 7 substantive comments (e.g., version management notes for unreleased CLI versions), all of which are dropped during the round-trip through yaml.safe_load() at lines 19–20 and yaml.dump() at lines 52–57. This contradicts the PR's stated goal of preserving commented content. Use a round-trip YAML editor that retains comments and formatting instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/scripts/bump_packages_version.py around lines 19 - 20, The script
currently uses PyYAML's yaml.safe_load() and yaml.dump(), which strip comments
from PACKAGES_FILE during the read/modify/write round-trip; replace this with a
round-trip YAML API that preserves comments (e.g., ruamel.yaml's YAML()
loader/dumper) so comments and formatting in PACKAGES_FILE are retained when
reading into the variable currently named data and when writing back (the code
paths that call yaml.dump at the end of the script). Update imports and replace
yaml.safe_load(...) / yaml.dump(...) usage with the ruamel YAML().load(...) and
YAML().dump(...) workflow (or an equivalent round-trip-preserving YAML library)
while ensuring the same data modifications still target the in-memory variable
(data) and are written back to the same file.
There was a problem hiding this comment.
This is by design. The script intentionally uses yaml.safe_load() / yaml.dump() to strip comments, then re-appends the helper comments block explicitly (lines 58-59). This ensures the comments are always up-to-date with the correct version, regardless of what state the file was in before. Using ruamel.yaml for round-trip preservation would add complexity without benefit here, since the comments need to be regenerated with the new version anyway.
There was a problem hiding this comment.
Skipped: comment is from another GitHub bot.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/bump-version.yml (1)
68-69: Pin PyYAML to a vetted version for deterministic releases.Installing
pyyamlwithout a version can make this release workflow non-reproducible and brittle over time.Proposed change
- - name: Install PyYAML - run: pip install pyyaml + - name: Install PyYAML + run: python -m pip install "PyYAML==<vetted-version>"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/bump-version.yml around lines 68 - 69, Update the "Install PyYAML" workflow step to install a specific, vetted PyYAML release instead of the unpinned package; change the pip command in the Install PyYAML step to pin to a known good version (for example pyyaml==6.0) so the release workflow is deterministic and reproducible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/bump-version.yml:
- Around line 68-69: Update the "Install PyYAML" workflow step to install a
specific, vetted PyYAML release instead of the unpinned package; change the pip
command in the Install PyYAML step to pin to a known good version (for example
pyyaml==6.0) so the release workflow is deterministic and reproducible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 97eb3bf2-b98f-4c4b-b605-4cf25c71eb0a
📒 Files selected for processing (1)
.github/workflows/bump-version.yml
Summary
The
sed-based version replacement inbump-version.ymlhad two bugs when updatingpackages.yml:sedregex only matchedversion: X.Y.Zpatterns, so whenmasterhad a git hash reference (e.g.- git: https://github.com/elementary-data/dbt-data-reliability.git+revision: <hash>), the version bump silently did nothing.sedwould replace version strings inside YAML comments, while potentially missing the actual active entry.This PR replaces the
sedcommands forpackages.ymlwith a Python script (.github/scripts/bump_packages_version.py) that uses PyYAML to properly parse the file. It handles both cases — updating an existing package version and replacing a git hash reference with a proper versioned package entry. Thesedreplacement fordocs/_snippets/quickstart-package-install.mdxandpyproject.tomlis kept as-is since those files always have a simple version pattern.The two conditional steps ("using input" / "using default") are also consolidated into a single "Determine package version" step that resolves the version once, simplifying the workflow.
Updates since last revision
dbt-package-versionis provided as input but fails validation, instead of silently falling back to the CLI version. This prevents accidentally releasing with the wrong package version. (Requested by @haritamar based on coderabbit review.)Review & Testing Checklist for Human
yaml.dump()serializes differently than the hand-written file. In particular, thedbt_utilsversion constraint[">=0.8.0", "<0.9.0"]will be written as a YAML block list instead of inline flow style. Verify thatdbt depsstill parses the resultingpackages.ymlcorrectly.yaml.dump()writes the file, the helper comments block is appended as raw text. Verify the resulting file looks reasonable (comments should appear after thepackages:entries).inputs.dbt-package-versionnon-empty butvalidated-dbt-package-versionempty → error) won't block legitimateworkflow_callinvocations wheredbt-package-versionis intentionally omitted.packages.ymlin both scenarios — when master has a version reference, and (more importantly) when master has a git hash reference.Notes
PKG_VERSIONenv var instead of a heredoc to avoid prettier YAML parsing issues in the workflow file.Summary by CodeRabbit