Fix DagVersion.bundle_version not refreshed when only the bundle SHA changes #67545
Open
avolant wants to merge 4 commits into
When `SerializedDagModel.write_dag` is called for an unchanged DAG file that ships in a new bundle commit, the short-circuit at the top of `write_dag` returns False before updating the existing `DagVersion`'s `bundle_version`. The condition only checks `dag_hash` and `bundle_name`, so a bundle redeploy with byte-identical DAG code never refreshes the `bundle_version` column, even though the bundle manager's `_bundle_versions` map already points at the new commit.

Downstream consumers (e.g. the DAG source endpoint, task instance history, anything that resolves "which bundle commit ran this DAG?") then resolve to a stale commit and serve outdated job code from sibling files in the bundle.

Add `dag_version.bundle_version == bundle_version` to the short-circuit so a bundle SHA change forces a new `DagVersion` row to be created, mirroring the existing behaviour for `bundle_name` changes.

Includes a regression test alongside the existing `test_new_dag_version_created_when_bundle_name_changes_and_hash_unchanged`.

Signed-off-by: avolant <arthur.volant@datadoghq.com>
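As a rough illustration of the predicate change described in the commit message, the sketch below models the short-circuit as a pure function. `DagVersionStub` and `can_skip_write` are hypothetical names introduced only for this example; they are not Airflow's actual model or code, and the real check inside `write_dag` may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DagVersionStub:
    """Hypothetical stand-in for the latest DagVersion row (not Airflow's model)."""

    dag_hash: str
    bundle_name: str
    bundle_version: Optional[str]


def can_skip_write(
    latest: DagVersionStub,
    dag_hash: str,
    bundle_name: str,
    bundle_version: Optional[str],
) -> bool:
    """Return True when nothing relevant changed, so no new version is needed."""
    return (
        latest.dag_hash == dag_hash
        and latest.bundle_name == bundle_name
        # The term this change adds: a new bundle SHA alone now defeats the skip.
        and latest.bundle_version == bundle_version
    )


# Versioned bundle: same hash and name, first with the same SHA, then a new one.
latest = DagVersionStub(dag_hash="abc", bundle_name="my-bundle", bundle_version="v1")
same = can_skip_write(latest, "abc", "my-bundle", "v1")
new_sha = can_skip_write(latest, "abc", "my-bundle", "v2")

# Non-versioned bundle: both sides are None, so the added term changes nothing.
unversioned = DagVersionStub(dag_hash="abc", bundle_name="local", bundle_version=None)
none_case = can_skip_write(unversioned, "abc", "local", None)
```

Here `same` and `none_case` allow the skip, while `new_sha` does not, so the caller would go on to write a new version.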
Pedrinhonitz
suggested changes
May 26, 2026
Contributor
Author
I manually patched our deployment with my commit, and it has been working in production for the past 48 hours.
Member
@avolant A few things need addressing before review; see our Pull Request quality criteria. No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer (a real person) will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.
Summary
`SerializedDagModel.write_dag` short-circuits when the DAG hash and bundle name are unchanged, returning before persisting the new `bundle_version`. As a result, redeploying a bundle (new commit / archive) where a given DAG file is byte-identical leaves the existing `DagVersion.bundle_version` row pointing at the previous commit, even though `DagFileProcessorManager._bundle_versions[bundle]` already reflects the new commit.
Downstream callers that resolve "which bundle commit produced this DAG?", for example the DAG source endpoint and task-instance/version history views, then return stale bundle artifacts (sibling files such as job definitions are served from the previous commit), which is surprising for users who redeploy a bundle expecting the new code to take effect.
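To make the stale state concrete, here is a small in-memory replay of two successive parses of a byte-identical DAG. It is a simplified sketch, not Airflow code: `replay` is a hypothetical helper, the bundle name is left out for brevity, and a "row" is just a `(dag_hash, bundle_version)` tuple.

```python
from typing import Optional


def replay(
    deploys: list[tuple[str, Optional[str]]],
    compare_version: bool,
) -> list[Optional[str]]:
    """Replay (dag_hash, bundle_version) parses and return each stored row's version."""
    rows: list[tuple[str, Optional[str]]] = []
    for dag_hash, bundle_version in deploys:
        if rows:
            last_hash, last_version = rows[-1]
            unchanged = last_hash == dag_hash
            if compare_version:
                # Also require the bundle version to match before skipping.
                unchanged = unchanged and last_version == bundle_version
            if unchanged:
                continue  # short-circuit: nothing is written for this parse
        rows.append((dag_hash, bundle_version))
    return [version for _, version in rows]


# The same DAG hash is parsed under two bundle commits.
deploys = [("abc", "v1"), ("abc", "v2")]
before = replay(deploys, compare_version=False)  # hash-only check
after = replay(deploys, compare_version=True)  # hash + bundle_version check
```

With the hash-only check the stored history stays at `["v1"]`; once the bundle version is compared as well, it becomes `["v1", "v2"]`.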
Fix
Extend the short-circuit predicate with `dag_version.bundle_version == bundle_version`. If a bundle SHA change is observed for an otherwise-unchanged DAG, a new `DagVersion` row is created, mirroring the existing behaviour when `bundle_name` changes (covered by `test_new_dag_version_created_when_bundle_name_changes_and_hash_unchanged`).
The fix is safe for `None` bundle versions: both sides are `None` for non-versioned bundles (e.g. `LocalDagBundle`), so the new term is a no-op there.
Reproduction
With a versioned `LocalDagBundle`-style bundle (`get_current_version()` returning the commit SHA) and `min_file_process_interval = 0` so the dag-processor re-queues the DAG on every refresh:
1. Deploy `v1`, parse: `DagVersion.bundle_version = v1`.
2. Deploy `v2` where the DAG `.py` is byte-identical (only sibling files change): `_bundle_versions[<bundle>] = v2`, but `write_dag` returns False at the short-circuit and `DagVersion.bundle_version` stays at `v1`.
With this patch, step 2 creates a new `DagVersion` with `bundle_version = v2`.
Test
Added `test_new_dag_version_created_when_bundle_version_changes_and_hash_unchanged` next to the analogous `bundle_name` test in `airflow-core/tests/unit/models/test_serialized_dag.py`.
Was generative AI tooling used to co-author this PR?