# Properly Removing Photo Duplicates Using Immich's Built-in Tools


steep falcon
#

Hello everyone!

I'm trying to use Immich's duplicate detection tool to remove duplicate photos from my library. Many of my photos are duplicated only due to different levels of compression (with the uncompressed versions being larger). When I set the "Maximum image difference" parameter in the machine learning settings to 0.001, Immich fails to recognize these as duplicates. However, when I increase this parameter to 0.004, it starts flagging other similar but distinct photos as duplicates.

Could anyone suggest the correct settings or workflow to accurately detect and remove only the exact duplicates that differ solely by compression? Any advice on how to handle this within Immich would be greatly appreciated!

languid jewelBOT
#

:wave: Hey @steep falcon,

Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.

References

Checklist

  1. :ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: I have read applicable release notes.
  3. :ballot_box_with_check: I have reviewed the FAQs for known issues.
  4. :ballot_box_with_check: I have reviewed Github for known issues.
  5. :ballot_box_with_check: I have tried accessing Immich via local ip (without a custom reverse proxy).
  6. :ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
  7. :ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

novel coyote
#

There is no guaranteed way for this to work, since it is based on machine learning and similarities between images.

#

You just need to play with the distance like you have

steep falcon
# novel coyote You just need to play with the distance like you have

Thank you for your response! I have a couple of follow-up questions:

Am I correct in understanding that to re-run the duplicate detection, I need to first change the maximum image difference parameter, then run the smart indexing, and finally the duplicate check?

How can I best approach this task? Do you have any ideas on the simplest way to achieve it? For example, could I manually delete the smaller files from the Immich library using an external script and then run a task within Immich to sync the changes? Or, if necessary, I could try to write a script to directly delete the smaller duplicates from the library (though I’m not very experienced in this). My goal is quite straightforward—simply go through all the files with the same names and delete the ones that are smaller in size.

steep falcon
#

It would be great if there were a separate parameter within Immich specifically for detecting duplicates—not just similar images. As I understand it, identifying exact duplicates (even with slight compression differences) is more of an algorithmic task compared to finding visually similar photos. Having this distinction could make the duplicate detection process more precise.

novel coyote
#

You can't delete images from the library, this will break Immich. Deletions need to be done through the API
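
For illustration, here is a minimal Python sketch of what a deletion through the API could look like. The endpoint path (`/api/assets`), the `x-api-key` header, and the `ids`/`force` body fields are assumptions that should be checked against the API reference for your Immich version; the server URL, API key, and asset IDs are placeholders. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_delete_request(base_url, api_key, asset_ids, force=False):
    """Build (but do not send) a bulk asset deletion request.

    Assumed, not confirmed by this thread: the DELETE /api/assets route,
    the x-api-key header, and the {"ids": [...], "force": bool} body.
    """
    body = json.dumps({"ids": list(asset_ids), "force": force}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/assets",
        data=body,
        method="DELETE",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


# Placeholder values; replace with your own server, key, and asset IDs.
request = build_delete_request(
    "http://immich.local:2283", "YOUR_API_KEY", ["asset-id-1", "asset-id-2"]
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it, so it is worth trying this on a single test asset before running it against a whole library.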

#

you probably would have been better off deduping this before uploading tbh

shut violet
#

> Am I correct in understanding that to re-run the duplicate detection, I need to first change the maximum image difference parameter, then run the smart indexing, and finally the duplicate check?

Changing the distance threshold and re-running duplicate detection is enough. There's no need to re-run smart search.

#

You went from 0.001 to 0.004. You could try 0.002 or 0.003 instead

#

In general, the duplicate detection tool is designed to be flexible in what you consider a duplicate. For some, that might be two images that are visually indistinguishable, for others it could include compression, and for others it could extend to images that have different angles, etc.

#

It tries to answer the question "do I really need to keep all of these very similar images?", and people have different criteria for that
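
To make the threshold trade-off concrete, here is a small Python sketch of distance-based duplicate flagging. It assumes images are compared by the cosine distance between embedding vectors; the three vectors are made-up toy values, not real model output, and `duplicate_pairs` is a hypothetical helper, not Immich's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def duplicate_pairs(embeddings, max_distance):
    """List every pair of names whose distance is at most max_distance."""
    names = sorted(embeddings)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_distance(embeddings[first], embeddings[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Toy embeddings (hypothetical): a heavily compressed copy can drift
# further from the original than a different shot of the same scene.
embeddings = {
    "original": [1.0, 0.0, 0.0],
    "compressed": [1.0, 0.08, 0.0],
    "burst_shot": [1.0, 0.0, 0.07],
}

for threshold in (0.001, 0.003, 0.004):
    print(threshold, duplicate_pairs(embeddings, threshold))
```

With these toy values, the distinct burst shot is flagged at 0.003 while the compressed copy only appears at 0.004, so no single threshold isolates just the compression duplicates.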

steep falcon
#

Thank you for the suggestion! I have already tried the intermediate distance values, but unfortunately, they didn’t show much difference from 0.001 in my case, which is why I mentioned 0.004. Also, thanks for clarifying the correct process for re-running duplicate detection.

To clarify, my goal is not to find similar photos but to remove identical ones. This is a much simpler task than using AI to analyze images. In my case, I just need to delete files with identical filenames (or nearly identical—sometimes the duplicates have a "_compressed" suffix in the filename).

Thanks to both of you for your help! I didn’t know about the API option, which is a great idea. After our discussion, I also discovered immich-go, which seems like it might handle removing files with identical names.

I have two follow-up questions:

Where can I find detailed documentation for beginners on working with the API? For example, it took me a while to figure out the correct value for the DeviceID parameter in one of the API functions. It would be great if there was a reference that explained all the functions and their parameters.
Is it possible to set specific rules in immich-go for deleting files? I’d like to set a rule that if a file "filename.jpg" exists, then "filename_compressed.jpg" should be deleted.
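
The filename rule described above can be sketched in plain Python, independent of immich-go or the Immich API. The function name and the sample filenames are hypothetical; the sketch only lists candidates and deletes nothing.

```python
from pathlib import PurePath


def compressed_copies_with_original(filenames, suffix="_compressed"):
    """Return names like 'x_compressed.jpg' for which 'x.jpg' is also present."""
    present = set(filenames)
    redundant = []
    for name in filenames:
        path = PurePath(name)
        if path.stem.endswith(suffix):
            # Rebuild the expected original name: stem without the suffix + extension.
            original = path.stem[: -len(suffix)] + path.suffix
            if original in present:
                redundant.append(name)
    return redundant


# Hypothetical sample: only the first compressed file has an original.
names = ["beach.jpg", "beach_compressed.jpg", "city_compressed.jpg", "dog.png"]
print(compressed_copies_with_original(names))
```

Files with the suffix but no matching original are left out of the result, so a compressed copy that is the only surviving version is never proposed for deletion.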

shut violet
#

You can ask that over at #immich-go. Also, if you know all the files with "_compressed" in the name have originals, you can search by file name to delete them.

#

And yeah, the API docs don’t have much in the way of explanations. We should add more context on usage, restrictions, etc.