Detect duplicates in the Wagtail images library.

Wagtail Images De-duplicator

wagtail-images-deduplicator is a Wagtail app to detect duplicate images in the admin. It's built with imagehash.

Requirements

Wagtail Images De-duplicator works with wagtail>=3.0.

Installation

Use pip to install this package:

pip install wagtail-images-deduplicator

Configuration

  • Add wagtail_images_deduplicator to your INSTALLED_APPS in your project's settings.

  • Add the DuplicateFindingMixin to your custom image model, for example:

from django.db import models
from wagtail.images.models import Image, AbstractImage, AbstractRendition

from wagtail_images_deduplicator.models import DuplicateFindingMixin


class CustomImage(DuplicateFindingMixin, AbstractImage):
    admin_form_fields = Image.admin_form_fields


class CustomRendition(AbstractRendition):
    image = models.ForeignKey(
        CustomImage, on_delete=models.CASCADE, related_name="renditions"
    )

    class Meta:
        unique_together = (("image", "filter_spec", "focal_point_key"),)
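The two configuration steps above might look like the following settings fragment. This is a minimal sketch: the app label `myapp` is hypothetical and stands for whichever app holds your custom image model, and `WAGTAILIMAGES_IMAGE_MODEL` is the standard Wagtail setting for pointing at a custom image model, not something specific to this package.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    # ... your other Django and Wagtail apps ...
    "wagtail.images",
    "wagtail_images_deduplicator",
    "myapp",  # hypothetical app that defines CustomImage
]

# Standard Wagtail setting that selects the custom image model.
WAGTAILIMAGES_IMAGE_MODEL = "myapp.CustomImage"
```

After adding the mixin to an existing model, you will likely also need to create and apply a migration for the new field.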

If you choose to add the mixin and have existing image data, you will need to call save() on all existing instances to fill in the new hash value:

from wagtail.images import get_image_model

for image in get_image_model().objects.all():
    image.save()
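On a large library, a single failing image (for example, one whose source file is missing from storage) would stop the loop above. The helper below is a sketch of a more tolerant variant. `resave_all` is a hypothetical name, not part of this package, and whether `save()` raises for a missing file is an assumption.

```python
def resave_all(images):
    """Call save() on each image and collect failures instead of stopping.

    Returns a tuple (saved_count, failures), where failures is a list of
    (image, exception) pairs.
    """
    saved = 0
    failures = []
    for image in images:
        try:
            image.save()
            saved += 1
        except Exception as exc:  # assumption: save() may raise on unreadable files
            failures.append((image, exc))
    return saved, failures


# In a Django shell this could be called as:
#   from wagtail.images import get_image_model
#   saved, failures = resave_all(get_image_model().objects.iterator())
```

Passing `objects.iterator()` rather than `objects.all()` avoids holding every instance in memory at once.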

Settings

WAGTAILIMAGESDEDUPLICATOR_HASH_FUNC

This setting determines the hash function to use.

| Hash function | Reference | Setting name |
| --- | --- | --- |
| Average hashing | http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html | average_hash |
| Perceptual hashing | http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html | phash (default) |
| Difference hashing | http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html | dhash or dhash_vertical |
| Wavelet hashing | https://fullstackml.com/2016/07/02/wavelet-image-hash-in-python/ | whash |
| HSV color hashing | | colorhash |
| Crop-resistant hashing | https://ieeexplore.ieee.org/document/6980335 | crop_resistant_hash |

WAGTAILIMAGESDEDUPLICATOR_MAX_DISTANCE_THRESOLD

This setting determines the maximum distance between the hashes of two images for them to be considered duplicates.
The default value is 5.

To help you assess how these different algorithms behave and to learn more about hash distances, check out the examples section of the imagehash library's README.
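To make the idea of a hash distance concrete, here is a small illustration in plain Python. It is a sketch, not the package's implementation: it assumes the distance is the Hamming distance (the number of differing bits) between two hashes written as hex strings, which is how imagehash compares its standard hashes. The helper names and sample hash values are invented for the example, and treating the threshold as inclusive (`<=`) is also an assumption.

```python
# settings.py values (setting names from this README; the values are examples)
WAGTAILIMAGESDEDUPLICATOR_HASH_FUNC = "phash"
WAGTAILIMAGESDEDUPLICATOR_MAX_DISTANCE_THRESOLD = 5


def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two equal-length hex hash strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def is_duplicate(hex_a: str, hex_b: str,
                 max_distance: int = WAGTAILIMAGESDEDUPLICATOR_MAX_DISTANCE_THRESOLD) -> bool:
    """Assumption: two images count as duplicates when distance <= threshold."""
    return hamming_distance(hex_a, hex_b) <= max_distance


original = "ffd8e0c0a0808000"   # made-up 64-bit hash
near_copy = "ffd8e0c0a0808003"  # differs from original in 2 bits
unrelated = "00271f3f5f7f7fff"  # every bit flipped

print(hamming_distance(original, near_copy))  # 2
print(is_duplicate(original, near_copy))      # True
print(is_duplicate(original, unrelated))      # False
```

A lower threshold flags only near-identical images, while a higher one also catches resized or recompressed copies at the cost of more false positives.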
