diff: walk LCS as ordered list, not set, to fix block ordering #94

Open
pmolodo wants to merge 2 commits into pmolodowitch/ast-diff-refactor from pmolodowitch/diff-fix-duplicate-item-ordering

pmolodo (Collaborator) commented May 1, 2026

diff_block_lists and diff_list_nodes previously checked LCS membership
via set lookup, then assumed that two pointers each "in the LCS" must be at
the same LCS position. With duplicate nodes (or earlier pointer drift)
the two pointers could be at different LCS elements; pairing them
advanced both pointers across mismatched positions and produced merged
output in which deletions and insertions were no longer adjacent, so
_pair_adjacent_changes could not fold them into substitutions.
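
To make the failure mode concrete, here is a minimal sketch of a set-based walk. It is a hypothetical reconstruction for illustration, not the project's actual code: the function name merge_by_set_membership, the tuple output format, and the sample data are all assumptions.

```python
def merge_by_set_membership(before, after, lcs):
    """Hypothetical set-based merge, reconstructed to illustrate the bug."""
    in_lcs = set(lcs)  # membership only: LCS order and duplicates are lost
    out, i, j = [], 0, 0
    while i < len(before) or j < len(after):
        b_common = i < len(before) and before[i] in in_lcs
        a_common = j < len(after) and after[j] in in_lcs
        if b_common and a_common:
            # Flawed assumption: both pointers sit on the same LCS element.
            out.append(("same", before[i]))
            i += 1
            j += 1
        elif i < len(before) and not b_common:
            out.append(("removed", before[i]))
            i += 1
        elif j < len(after):
            # 'before' is exhausted or parked on a "common" item.
            out.append(("added", after[j]))
            j += 1
        else:
            # 'after' is exhausted; the rest of 'before' is emitted as removed.
            out.append(("removed", before[i]))
            i += 1
    return out


before = ["dup", "x", "dup", "old"]
after = ["x", "dup", "new"]
lcs = ["x", "dup"]  # one valid LCS of the two lists
result = merge_by_set_membership(before, after, lcs)
# result == [("same", "dup"), ("same", "x"), ("added", "new"),
#            ("removed", "dup"), ("removed", "old")]
```

In this sketch the first "dup" in before is paired with "x" in after because both are members of the set, and the insertion of "new" ends up separated from the deletion of "old", so a pass that only folds adjacent changes would not see them as a substitution.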

Drive the merge from the LCS list in order: at each LCS element drain
non-LCS blocks from 'before' as deletions and from 'after' as
insertions, then take the LCS block. Same shape applied to
diff_list_nodes.
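
The ordered walk described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in this PR: longest_common_subsequence and merge_in_lcs_order are hypothetical names, items are plain strings compared with ==, and the output is a list of (tag, item) tuples.

```python
def longest_common_subsequence(before, after):
    """Return one LCS of the two lists as an ordered list of items."""
    n, m = len(before), len(after)
    # table[i][j] = LCS length of before[i:] and after[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if before[i] == after[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    result, i, j = [], 0, 0
    while i < n and j < m:
        if before[i] == after[j]:
            result.append(before[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def merge_in_lcs_order(before, after):
    """Drive the merge from the LCS list, one element at a time, in order."""
    out, i, j = [], 0, 0
    for common in longest_common_subsequence(before, after):
        # Drain 'before' up to the next LCS element as deletions.
        while before[i] != common:
            out.append(("removed", before[i]))
            i += 1
        # Drain 'after' up to the same LCS element as insertions.
        while after[j] != common:
            out.append(("added", after[j]))
            j += 1
        # Both pointers now sit on the same LCS element; take it.
        out.append(("same", common))
        i += 1
        j += 1
    # Anything left after the last LCS element is a trailing change.
    out.extend(("removed", item) for item in before[i:])
    out.extend(("added", item) for item in after[j:])
    return out
```

Because both pointers are advanced to the same LCS element before it is taken, a duplicated item cannot be paired with a different LCS position, and each deletion run is immediately followed by its insertion run. For example, merge_in_lcs_order(["dup", "old"], ["dup", "new"]) keeps "dup" first, followed by the removal of "old" and the addition of "new".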

pmolodo and others added 2 commits April 28, 2026 18:29
For example, a 'before' with:

- <unchanged node that is duplicated elsewhere>
- <altered node - old>

and an 'after' with:

- <unchanged node that is duplicated elsewhere>
- <altered node - new>

would sometimes result in:

- <Added: <altered node - new>>
- <unchanged node that is duplicated elsewhere>
- <Removed: <altered node - old>>

...instead of:

- <unchanged node that is duplicated elsewhere>
- <Removed: <altered node - old>>
- <Added: <altered node - new>>

This would make it seem that the new version moved the unchanged node
after the altered node, when it didn't. Additionally, it prevented the altered
node from being detected as a substitution.

diff_block_lists and diff_list_nodes previously checked LCS membership
via set lookup, then assumed that two pointers each "in the LCS" must be at
the same LCS position. With duplicate nodes (or earlier pointer drift)
the two pointers could be at different LCS elements; pairing them
advanced both pointers across mismatched positions and produced merged
output in which deletions and insertions were no longer adjacent, so
_pair_adjacent_changes could not fold them into substitutions.

Drive the merge from the LCS list in order: at each LCS element drain
non-LCS blocks from 'before' as deletions and from 'after' as
insertions, then take the LCS block.  Same shape applied to
diff_list_nodes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

OleksiyPuzikov (Collaborator) left a comment

LGTM
