Skip to content

Dedupe individual tracks during import based on stable identifier#6723

Open
tommyschnabel wants to merge 8 commits into
beetbox:masterfrom
tommyschnabel:duplicates_import_actions
Open

Dedupe individual tracks during import based on stable identifier#6723
tommyschnabel wants to merge 8 commits into
beetbox:masterfrom
tommyschnabel:duplicates_import_actions

Conversation

@tommyschnabel

Copy link
Copy Markdown
Contributor

Description

This PR adds the duplicate_track_resolution (bool) and duplicate_track_action (skip, remove, keep/merge, ask) config options to import to enable deduplication of individual tracks when importing.

The recommended config for this is:

import:
    duplicate_track_resolution: yes
    duplicate_action: ask          # whole-album duplicates
    duplicate_track_action: skip   # per-track duplicates: fold new tracks into the existing album
    duplicate_keys:
        item: mb_trackid           # match on a stable id (recommended when autotagging)

With the above config, any duplicate imports of tracks will be dropped, and if the whole album has been imported before, the album import itself will be dropped. If part of an album is deduped, the new tracks will be folded into the existing album. This config allows for idempotency on any repeat beet import executions.

Copied from the new docs, here's what each option does:

skip drops the duplicate tracks and adds the remaining new tracks to the
existing album they belong to, instead of importing them as a separate album.
Use this to complete a partially-imported album. If every track is already
present, the whole album is skipped. (If the matching tracks do not all belong
to a single album, the new tracks are imported as their own album.)

  • remove removes the matching old items from the library before importing.
  • keep (and merge) import the album unchanged.
  • ask prompts you to choose one of the above.

The duplicate_keys config option is what's used to dedupe, so set it to something stable like mb_trackid.

In the previous PR, @snejus suggested adding this as an addition to the album deduplication that already exists. This implementation is separate from the album dedupe logic because the use-cases seemed incompatible.

The album dedupe skips/merges/removes based on an album already being imported, but the use-case I wanted was a deeper track-based mechanism. Bolting this onto the album dedupe produced some really weird results, and I wasn't able to achieve parity with my last impl with the available config options. This is a better, more flexible, approach.

To Do

  • Documentation. (If you've added a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
  • Changelog. (Add an entry to docs/changelog.rst to the bottom of one of the lists near the top of the document.)
  • Tests. (Very much encouraged but not strictly required.)

@tommyschnabel tommyschnabel requested a review from a team as a code owner June 8, 2026 21:31
@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 75.53191% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.78%. Comparing base (8bfcfb1) to head (b03e30c).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
beets/ui/commands/import_/session.py 6.66% 14 Missing ⚠️
beets/importer/tasks.py 81.48% 2 Missing and 3 partials ⚠️
beets/importer/stages.py 92.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6723      +/-   ##
==========================================
- Coverage   74.78%   74.78%   -0.01%     
==========================================
  Files         163      163              
  Lines       20966    21049      +83     
  Branches     3302     3326      +24     
==========================================
+ Hits        15680    15742      +62     
- Misses       4529     4547      +18     
- Partials      757      760       +3     
Files with missing lines Coverage Δ
beets/importer/session.py 93.10% <100.00%> (+0.09%) ⬆️
beets/importer/stages.py 92.75% <92.00%> (-0.29%) ⬇️
beets/importer/tasks.py 90.89% <81.48%> (-0.27%) ⬇️
beets/ui/commands/import_/session.py 57.91% <6.66%> (-3.42%) ⬇️
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

tommyschnabel and others added 4 commits June 16, 2026 18:55
Move the import-time duplicate-track handling out of the `duplicates`
plugin and into importer core, next to the existing duplicate machinery.

Add the `import.duplicate_track_resolution` option (off by default). When
enabled, album imports check each track against the library using
`import.duplicate_keys.item` and resolve matches via the existing
`import.duplicate_action`:

- `skip` drops the already-imported tracks and imports the rest of the
  album (a fully-duplicate album is skipped);
- `remove` removes the matching old library items;
- `keep`/`merge` import everything unchanged;
- `ask` prompts the session.

The check runs before the autotag lookup so dropped tracks are excluded
from the match. The `duplicates` plugin is left untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Match track duplicates against all library items (including album members,
not just singletons), so re-importing catches tracks already present in an
existing album.

When per-track resolution prunes some tracks of an album, suppress the
album-level duplicate check so the remaining new tracks are still imported
instead of the whole (partial) album being skipped.

Add a dedicated `duplicate_track_action` option (inheriting `duplicate_action`
when unset) with a new `fold` action that adds the remaining new tracks to the
existing album they belong to, completing a partially-imported album.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Collapse the separate `fold` track duplicate action into `skip`: skipping
per-track duplicates now folds the remaining new tracks into the existing
album they belong to (falling back to a new album when the matches do not
resolve to a single album). Removes the `fold` action, prompt option, and
`FOLD` resolution.

Show the per-track "Skipping duplicate track" message and the whole-album
"Skipping album, all tracks are duplicates" message in the warning color.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@tommyschnabel tommyschnabel force-pushed the duplicates_import_actions branch from 6ece05a to 41c25aa Compare June 16, 2026 23:00
@tommyschnabel

Copy link
Copy Markdown
Contributor Author

@semohr this is ready for review

@snejus

snejus commented Jun 20, 2026

Copy link
Copy Markdown
Member

Hi @tommyschnabel, thanks for this work! I would suggest to rebase as there has been a considerable amount of changes and see how refactored duplicates handling by introducing DuplicateAction. See how you'd be able to use/reuse it here?

tommyschnabel and others added 2 commits June 23, 2026 14:19
…_actions

# Conflicts:
#	beets/importer/session.py
#	beets/importer/stages.py
#	beets/importer/tasks.py
#	beets/test/helper.py
#	docs/changelog.rst
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@tommyschnabel

Copy link
Copy Markdown
Contributor Author

@snejus got your changes integrated! Ready for a review 😄

@tommyschnabel

Copy link
Copy Markdown
Contributor Author

@semohr ready for review again 😄

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

grug see PR add per-track duplicate handling during album import. goal is make repeat beet import more idempotent by matching tracks on stable key (like mb_trackid) and then skip/remove/keep/ask for only the dup tracks.

Changes:

  • add import.duplicate_track_resolution + import.duplicate_track_action config, plus docs + changelog entry
  • add importer pipeline stage to detect/resolve per-track dupes before autotag lookup, with optional fold-into-existing-album behavior
  • add tests for skip/remove/keep/ask flows and fold behavior

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/test_importer.py add tests for per-track duplicate resolution behaviors
beets/ui/commands/import_/session.py add interactive prompt for track-duplicate resolution in terminal session
beets/test/helper.py extend ImportSessionFixture to simulate track-duplicate prompt answers
beets/importer/stages.py implement per-track duplicate detection + resolution pipeline stage
beets/importer/tasks.py factor duplicate-item deletion helper; support folding new tracks into existing album; remove track-dup items during manipulate stage
beets/importer/session.py add ImportSession abstract hook for track-duplicate resolution + stage wiring
beets/config_default.yaml add default config keys for new options
docs/reference/config.rst document new config options and behavior notes
docs/changelog.rst add changelog entry for new feature

Comment on lines +213 to +217
existing = [item for matches in duplicates.values() for item in matches]
ui.print_("Old: " + summarize_items(existing, True))
if config["import"]["duplicate_verbose_prompt"]:
for item in existing:
print(f" {item}")
Comment on lines +219 to +222
ui.print_("New: " + summarize_items(list(duplicates), True))
if config["import"]["duplicate_verbose_prompt"]:
for item in duplicates:
print(f" {item}")
Comment thread beets/test/helper.py
Comment on lines +673 to +682
Resolution = Enum("Resolution", "REMOVE SKIP KEEPBOTH MERGE")

default_resolution = "REMOVE"

def resolve_track_duplicates(self, task, duplicates):
return {
self.Resolution.SKIP: "s",
self.Resolution.KEEPBOTH: "k",
self.Resolution.REMOVE: "r",
}.get(self.default_resolution, "k")

@snejus snejus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! The first round of reviews :)

Comment thread beets/test/helper.py

choose_item = choose_match

Resolution = Enum("Resolution", "REMOVE SKIP KEEPBOTH MERGE")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we introducing a new Enum here? Can't we reuse DuplicateAction?

"""Decide what to do with album tracks already in the library."""
log.warning("Some tracks are already in the library!")

if config["import"]["quiet"]:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast these config options to bool:

Suggested change
if config["import"]["quiet"]:
if config["import"]["quiet"].get(bool):


return action

def resolve_track_duplicates(self, task, duplicates) -> str:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to see types here!

Comment thread docs/reference/config.rst
A typical configuration for completing partially-imported albums while
autotagging looks like this:

::

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use

Suggested change
::
.. code-block:: yaml

for syntax highlighting.

Comment thread docs/reference/config.rst
duplicate_action: ask # whole-album duplicates
duplicate_track_action: skip # per-track duplicates: fold new tracks into the existing album
duplicate_keys:
item: mb_trackid # match on a stable id (recommended when autotagging)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users will find it helpful to see the default values as well!

Comment thread beets/importer/session.py

# Optionally drop or replace album tracks that already exist in
# the library before the autotag lookup runs.
stages += [stagefuncs.resolve_track_duplicates(self)]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this expected to interact with the default threaded: yes mode?

Looking at the pipeline ordering, I wonder whether this stage may check a second task before the first one reaches add(). For example, if two one-track albums with the same duplicate keys are imported in one invocation, could both pass this check in threaded mode while the second one would be skipped in sequential mode? I'm not sufficiently familiar with the pipeline internals to tell whether this is possible, so could you clarify what guarantees we have here?

Have you tested such scenario yourself?

Comment thread beets/importer/stages.py
Uses ``import.duplicate_track_action`` when set, otherwise falls back to
``import.duplicate_action``.
"""
choices = {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-use DuplicateAction here? See how it used in such cases:

    def get_duplicate_action(
        self, task: ImportTask, found_duplicates: list[AnyLibModel]
    ) -> DuplicateAction:
        """Get the configured duplicate action."""
        choice = config["import"]["duplicate_action"].as_choice(
            DuplicateAction.choices()
        )
        log.debug("default action for duplicates: {}", choice)
        return DuplicateAction(choice)  # type: ignore[call-arg]

Then, in the logic I want to see conditionals like action is DuplicationAction.SKIP instead of action == "s".

Comment thread beets/importer/stages.py
task.duplicate_tracks_resolved = True
# Fold the remaining new tracks into the existing album, if the
# matched duplicates all belong to a single one.
album_ids = {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could filtering out matches with album_id is None affect how the fold target is selected here?

For example, if one duplicate belongs to an album and another duplicate is a singleton, would album_ids contain only the first duplicate's album ID and cause the remaining tracks to be folded into that album? The documentation says that new tracks are imported as their own album when the matching tracks do not all belong to a single album, so I wonder whether singleton matches should prevent folding in this case. Is this scenario handled elsewhere?

I would like to see a test covering the following situation:

  • Incoming album: tracks A, B, and C.
  • Existing track A belongs to album X.
  • Existing track B is a singleton.
  • Both A and B match incoming tracks.

Would you mind adding this case and making the intended outcome for track C explicit? I think this would clarify whether it should be folded into album X or imported as its own album.

Comment thread beets/test/helper.py

default_resolution = "REMOVE"

def resolve_track_duplicates(self, task, duplicates):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I doubt that we need this any more. You should be able to control this using the configuration. See the commit where I removed it for good: ca3fdb7

You can step through commits under this PR to see how I removed the need to redefine anything to do with duplicates in helper.py: https://github.com/beetbox/beets/pull/6702/commits

Comment thread beets/importer/session.py
def choose_item(self, task: SingletonImportTask) -> TrackMatch | Action:
raise NotImplementedError

def resolve_track_duplicates(self, task: ImportTask, duplicates) -> str:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is going to be a list of Items?

Suggested change
def resolve_track_duplicates(self, task: ImportTask, duplicates) -> str:
def resolve_track_duplicates(self, task: ImportTask, duplicates: list[Item]) -> str:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants