A library to find duplicate images and delete unwanted ones

py-image-dedup

py-image-dedup is a tool to scan through a library of photos, find duplicates and remove them in a prioritized way.

It is built upon Image-Match, a very popular library that computes a pHash for an image and stores the result in an Elasticsearch backend for very high scalability.

How to use

Set up the Elasticsearch backend

Elasticsearch version

This library requires Elasticsearch version 5 or later. Sadly, the Image-Match library specifies version 2 for no apparent reason, so you have to remove this entry from its requirements.

Because of this, py-image-dedup will exit with an error on first install.

To fix this, find the installed files of the image-match library, e.g.

../venv/lib/python3.6/site-packages/image_match-1.1.2-py3.6.egg-info/requires.txt    

and remove the second line

elasticsearch<2.4,>=2.3

from the file.
After that, py-image-dedup should install and run as expected.
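
The manual edit described above can also be scripted. The following is a minimal sketch and not part of py-image-dedup: the function names `strip_elasticsearch_pin` and `patch_requires_file` are hypothetical, and the path you pass in has to be the `requires.txt` of your own installation.

```python
import re
from pathlib import Path

# Matches a requirement on the "elasticsearch" package itself,
# but not on differently named packages such as "elasticsearch-dsl".
_ES_REQUIREMENT = re.compile(r"elasticsearch\s*($|[<>=!~\[;])", re.IGNORECASE)


def strip_elasticsearch_pin(text: str) -> str:
    """Return the requirements text without the elasticsearch requirement line."""
    kept = [
        line
        for line in text.splitlines()
        if not _ES_REQUIREMENT.match(line.strip())
    ]
    return "\n".join(kept) + "\n"


def patch_requires_file(path: str) -> None:
    """Rewrite the given requires.txt in place (hypothetical helper)."""
    requires = Path(path)
    requires.write_text(strip_elasticsearch_pin(requires.read_text()))
```

Calling `patch_requires_file` with the path of the `requires.txt` shown above would remove the version pin and leave all other lines untouched.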

Set up the index

Since this library is based on Image-Match, you need a running Elasticsearch instance for efficient storing and querying of image signatures.

py-image-dedup uses a single index called images that you can create using the following command:

curl -X PUT "192.168.2.24:9200/images?pretty" -H "Content-Type: application/json" -d "
{
  \"mappings\": {
    \"image\": {
      \"properties\": {
        \"path\": {
          \"type\": \"keyword\",
          \"ignore_above\": 256
        }
      }
    }
  }
}"
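
If you prefer to create the index from Python instead of curl, the same request can be prepared with the standard library. This is only a sketch: `build_index_mapping` and `build_create_index_request` are hypothetical helper names, and the host is an assumption that you need to replace with the address of your own instance.

```python
import json
import urllib.request


def build_index_mapping() -> dict:
    """Same mapping body as in the curl command: a keyword `path` field."""
    return {
        "mappings": {
            "image": {
                "properties": {
                    "path": {"type": "keyword", "ignore_above": 256}
                }
            }
        }
    }


def build_create_index_request(host: str, index: str = "images") -> urllib.request.Request:
    """Prepare (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"http://{host}/{index}?pretty",
        data=json.dumps(build_index_mapping()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To actually send it, pass the prepared request to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_create_index_request("192.168.2.24:9200"))`.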

Configuration

py-image-dedup offers customization options to make sure it can detect the best image with the highest probability possible.

| Name | Description | Default |
|------|-------------|---------|
| threads | Number of threads to use for image analysis | 2 |
| recursive | Toggle to analyse given directories recursively | False |
| search_across_dirs | Toggle to allow duplicate results across given directories | False |
| file_extensions | Comma-separated list of file extensions to analyse | "png,jpg,jpeg" |
| max_dist | Maximum distance of image signatures to consider. This is a value in the range [0..1] | 0.1 |
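
To make the options more concrete, here is a sketch of how they could be held and applied. It does not show the library's actual API: `DeduplicationConfig`, `normalized_distance` and `is_duplicate` are hypothetical names, and the distance formula is an assumption, chosen because it always yields a value in the range [0..1].

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DeduplicationConfig:
    """Hypothetical container holding the documented default values."""
    threads: int = 2
    recursive: bool = False
    search_across_dirs: bool = False
    file_extensions: str = "png,jpg,jpeg"
    max_dist: float = 0.1

    def extensions(self) -> List[str]:
        """Split the comma-separated list into lower-case extensions."""
        return [e.strip().lower() for e in self.file_extensions.split(",") if e.strip()]

    def accepts(self, filename: str) -> bool:
        """True if the file name ends with one of the configured extensions."""
        return any(filename.lower().endswith("." + ext) for ext in self.extensions())


def normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed metric: ||a - b|| / (||a|| + ||b||), which lies in [0..1]."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    total = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return 0.0 if total == 0 else diff / total


def is_duplicate(a: Sequence[float], b: Sequence[float], config: DeduplicationConfig) -> bool:
    """Treat two signatures as duplicates if their distance is within max_dist."""
    return normalized_distance(a, b) <= config.max_dist
```

With the default `max_dist` of 0.1, only signatures that are nearly identical count as duplicates; raising the value makes the match more lenient.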

Command line usage

py-image-dedup can be used from the command line like this:

py-image-dedup deduplicate --help

Have a look at the help output to see how you can customize it.

Dry run

To analyze images and get an overview of which images would be deleted, be sure to make a dry run first.

py-image-dedup deduplicate -d "/home/mydir" --dry-run
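
The idea behind a dry run can be illustrated with a few lines of Python. This is a generic sketch of the pattern, not the tool's implementation, and `remove_files` is a hypothetical name: the function always reports what it selected, but only touches the disk when `dry_run` is turned off.

```python
import os
from typing import Iterable, List


def remove_files(paths: Iterable[str], dry_run: bool = True) -> List[str]:
    """Return the paths selected for removal; delete them only if dry_run is False."""
    selected = []
    for path in paths:
        selected.append(path)
        if not dry_run:
            # Destructive step, skipped entirely during a dry run.
            os.remove(path)
    return selected
```

Defaulting `dry_run` to `True` means a forgotten argument results in a harmless report rather than deleted files.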

Contributing

GitHub is for social coding: if you want to write code, I encourage contributions through pull requests from forks of this repository. Create GitHub tickets for bugs and new features and comment on the ones that you are interested in.

License

py-image-dedup by Markus Ressel
Copyright (C) 2018  Markus Ressel

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
