A library to find duplicate images and delete unwanted ones
py-image-dedup
py-image-dedup is a tool to scan through a library of photos, find duplicates and remove them in a prioritized way.
It is built upon Image-Match, a popular library that computes a perceptual signature for each image (similar in spirit to pHash) and stores the results in an Elasticsearch backend, which lets it scale to very large photo collections.
How to use
Set up the Elasticsearch backend
Elasticsearch version
This library requires Elasticsearch version 5 or later. Unfortunately, the Image-Match library pins the elasticsearch package to version 2.3.x for no apparent reason, so you have to remove this requirement from its metadata.
Because of this conflict, py-image-dedup will exit with an error on first install.
To fix this, locate the installed files of the image-match library, for example
../venv/lib/python3.6/site-packages/image_match-1.1.2-py3.6.egg-info/requires.txt
and remove the second line
elasticsearch<2.4,>=2.3
from the file.
After that py-image-dedup should install and run as expected.
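If you prefer not to edit the file by hand, the following Python sketch does the same job. It assumes you pass it the path to image-match's requires.txt (such as the example path above); the helper name drop_elasticsearch_pin is hypothetical and not part of py-image-dedup. Unlike the manual step, it removes every line that names the elasticsearch package instead of relying on it being the second line.

```python
import re
from pathlib import Path
from typing import List

# Matches a bare "elasticsearch" requirement with an optional version specifier,
# e.g. "elasticsearch<2.4,>=2.3", but not other packages such as "elasticsearch-dsl".
_ES_REQUIREMENT = re.compile(r"^elasticsearch\s*(?:[<>=!~].*)?$", re.IGNORECASE)


def drop_elasticsearch_pin(requires_txt: Path) -> List[str]:
    """Rewrite requires.txt without its elasticsearch lines; return the removed lines."""
    kept, removed = [], []
    for line in requires_txt.read_text().splitlines(keepends=True):
        if _ES_REQUIREMENT.match(line.strip()):
            removed.append(line.strip())
        else:
            kept.append(line)
    requires_txt.write_text("".join(kept))
    return removed
```

For example, call drop_elasticsearch_pin(Path("../venv/lib/python3.6/site-packages/image_match-1.1.2-py3.6.egg-info/requires.txt")) and check that the returned list contains only the pinned elasticsearch line.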
Set up the index
Since this library is based on Image-Match you need a running elasticsearch instance for efficient storing and querying of image signatures.
py-image-dedup uses a single index called images,
which you can create with the following command (replace 192.168.2.24 with the address of your own Elasticsearch instance):
curl -X PUT "192.168.2.24:9200/images?pretty" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "image": {
      "properties": {
        "path": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}'
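The same index can also be created from Python using only the standard library. This is a minimal sketch, assuming Elasticsearch is reachable over plain HTTP without authentication; the function names are hypothetical. The typed "image" mapping matches Elasticsearch 5.x and 6.x; Elasticsearch 7 dropped mapping types, so the body would likely need adjusting there.

```python
import json
import urllib.request

# Same mapping as the curl command: an "image" mapping type whose "path" field
# is stored as a keyword (values longer than 256 characters are not indexed).
INDEX_BODY = {
    "mappings": {
        "image": {
            "properties": {
                "path": {"type": "keyword", "ignore_above": 256}
            }
        }
    }
}


def build_create_index_request(host: str, index: str = "images") -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url="http://{}/{}?pretty".format(host, index),
        data=json.dumps(INDEX_BODY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(host: str, index: str = "images") -> dict:
    """Send the request and return Elasticsearch's JSON response as a dict."""
    with urllib.request.urlopen(build_create_index_request(host, index)) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, create_index("192.168.2.24:9200") should return Elasticsearch's acknowledgement; if the index already exists, urlopen raises an HTTPError instead.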
Configuration
py-image-dedup offers the following options to control which images are analysed and how similar two images must be to count as duplicates:
Name | Description | Default |
---|---|---|
threads | Number of threads to use for image analysis | 2 |
recursive | Toggle to analyse given directories recursively | False |
search_across_dirs | Toggle to allow duplicate results across given directories | False |
file_extensions | Comma-separated list of file extensions to analyse | "png,jpg,jpeg" |
max_dist | Maximum distance of image signatures to consider. This is a value in the range [0..1] | 0.1 |
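To make the options concrete, here is a hypothetical DedupConfig that mirrors the table's defaults, parses file_extensions and validates max_dist. As for why max_dist lies in [0..1]: as far as we understand, Image-Match compares two signatures a and b with the normalized distance ||a - b|| / (||a|| + ||b||), which the triangle inequality bounds between 0 and 1. The sketch reimplements that formula in plain Python for illustration only; threads, recursive and search_across_dirs are stored but not acted on.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class DedupConfig:
    """Hypothetical container mirroring the configuration table (defaults included)."""

    threads: int = 2
    recursive: bool = False
    search_across_dirs: bool = False
    file_extensions: str = "png,jpg,jpeg"
    max_dist: float = 0.1

    def __post_init__(self) -> None:
        # Signature distances are normalized, so thresholds outside [0, 1] make no sense.
        if not 0.0 <= self.max_dist <= 1.0:
            raise ValueError("max_dist must be in the range [0, 1]")

    def extensions(self) -> Tuple[str, ...]:
        # "png,jpg,jpeg" -> ("png", "jpg", "jpeg"), lower-cased, blanks dropped
        return tuple(e.strip().lower() for e in self.file_extensions.split(",") if e.strip())

    def accepts(self, filename: str) -> bool:
        # Compare the part after the last dot against the configured extensions.
        _, dot, ext = filename.rpartition(".")
        return bool(dot) and ext.lower() in self.extensions()


def normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return ||a - b|| / (||a|| + ||b||), which always lies in [0, 1]."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm_sum = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    # Two all-zero signatures are identical, so treat them as distance 0.
    return diff / norm_sum if norm_sum else 0.0


def is_duplicate(a: Sequence[float], b: Sequence[float], config: DedupConfig) -> bool:
    # A pair counts as a duplicate when its distance does not exceed max_dist.
    return normalized_distance(a, b) <= config.max_dist
```

Lowering max_dist (for example DedupConfig(max_dist=0.05)) makes matching stricter, so fewer image pairs are reported as duplicates.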
Command line usage
py-image-dedup can be used from the command line like this:
py-image-dedup deduplicate --help
Have a look at the help output to see how you can customize it.
Dry run
To analyze your images and get an overview of which ones would be deleted, make a dry run first:
py-image-dedup deduplicate -d "/home/mydir" --dry-run
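Conceptually, a dry run computes the full deletion plan but only reports it. The sketch below illustrates that idea under stated assumptions: duplicate groups have already been found, and its priority rule (keep the largest file) is a hypothetical stand-in, since the actual prioritization py-image-dedup uses is not described here. All names are illustrative.

```python
import os
from typing import Iterable, List, Sequence, Tuple


def choose_keeper(paths: Sequence[str]) -> str:
    # Hypothetical priority rule: prefer the largest file, then the shortest
    # path, then alphabetical order. py-image-dedup's own rules may differ.
    return min(paths, key=lambda p: (-os.path.getsize(p), len(p), p))


def plan_deletions(groups: Iterable[Sequence[str]]) -> List[Tuple[str, str]]:
    """Return (file_to_delete, file_kept) pairs for every group of duplicates."""
    plan = []
    for group in groups:
        keeper = choose_keeper(group)
        plan.extend((path, keeper) for path in group if path != keeper)
    return plan


def run_dedup(groups: Iterable[Sequence[str]], dry_run: bool = True) -> List[Tuple[str, str]]:
    """Report the plan when dry_run is True; otherwise actually delete the files."""
    plan = plan_deletions(groups)
    for doomed, keeper in plan:
        if dry_run:
            print("would delete {} (duplicate of {})".format(doomed, keeper))
        else:
            os.remove(doomed)
    return plan
```

Defaulting dry_run to True mirrors the advice above: nothing is removed until you explicitly opt in after reviewing the plan.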
Contributing
GitHub is for social coding: if you want to write code, I encourage contributions through pull requests from forks of this repository. Open GitHub issues for bugs and new features, and comment on the ones you are interested in.
License
py-image-dedup by Markus Ressel
Copyright (C) 2018 Markus Ressel
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.