


The result of the scan is stored by photosdup in two ways:

- Each time a duplicate is found, the higher-quality image (as judged by total file size) is tagged with the keyword photosdup-original, while the lower-quality duplicates are tagged with the keyword photosdup-duplicate.
- Each set of original and duplicates is tagged with the UUID of the original and put in an album called photosdup-UUID.

The approach used for scaling images is inspired by the approach taken in difPy. Unfortunately, difPy could not be used, as it does not integrate with the Photos database (a minor nuisance regarding updating the database) and uses a quadratic algorithm that compares each image to all other images, i.e., N*(N-1) comparisons for N images. Nevertheless, the approach described here was a great inspiration:

The search for duplicates and near duplicates uses a radius query on a KD tree.
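To illustrate the radius query on a KD tree mentioned above, here is a minimal standard-library sketch. It is not photosdup's actual code: the names `Node`, `build`, and `radius_query`, the four-value feature vectors (standing in for heavily downscaled images), and the radius of 5.0 are all assumptions made for this example. A library implementation such as SciPy's `KDTree` could play the same role.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


class Node:
    """One KD-tree node: index of its point, split axis, and two subtrees."""

    def __init__(self, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.index = index
        self.axis = axis
        self.left = left
        self.right = right


def build(points: List[Point], indices: Optional[List[int]] = None,
          depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, cycling through the axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build(points, indices[:mid], depth + 1),
                build(points, indices[mid + 1:], depth + 1))


def radius_query(node: Optional[Node], points: List[Point], center: Point,
                 radius: float, found: Optional[List[int]] = None) -> List[int]:
    """Collect the indices of all points within `radius` of `center`."""
    if found is None:
        found = []
    if node is None:
        return found
    point = points[node.index]
    if math.dist(point, center) <= radius:
        found.append(node.index)
    diff = center[node.axis] - point[node.axis]
    # Always search the side that contains the center; search the other side
    # only if the splitting plane lies within the radius.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    radius_query(near, points, center, radius, found)
    if abs(diff) <= radius:
        radius_query(far, points, center, radius, found)
    return found


# Hypothetical feature vectors, one per image (e.g. a tiny grayscale thumbnail).
features = [
    (10.0, 10.0, 200.0, 200.0),  # image 0
    (11.0, 10.0, 199.0, 201.0),  # image 1: near duplicate of image 0
    (250.0, 5.0, 30.0, 90.0),    # image 2: unrelated
    (10.0, 10.0, 200.0, 200.0),  # image 3: exact duplicate of image 0
]

tree = build(features)
# For every image, list the other images that lie within the chosen radius.
neighbours = {
    i: sorted(j for j in radius_query(tree, features, features[i], 5.0) if j != i)
    for i in range(len(features))
}
```

In this toy data set, images 0, 1, and 3 end up as mutual neighbours, while image 2 has none. Because whole subtrees are skipped whenever the splitting plane is farther away than the radius, a query typically touches far fewer points than the all-pairs comparison described above.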

If the graphical user interface has stability problems, force single-core execution by passing 0 for the cores parameter.
