📧 CLI to deduplicate mails from mail boxes.

Project description

Mail Deduplicate

What is Mail Deduplicate?

Provides the mdedup CLI, a utility to deduplicate mails from a set of boxes.

Features

  • Duplicate detection based on cherry-picked and normalized mail headers.
  • Fetch mails from multiple sources.
  • Reads and writes to mbox, maildir, babyl, mh and mmdf formats.
  • Deduplication strategies based on size, content, timestamp, file path or random choice.
  • Copy, move or delete the resulting set of duplicates.
  • Dry-run mode.
  • Protection against false-positives with safety checks on size and content differences.
  • Supports macOS, Linux and Windows.
  • Standalone executables for Linux, macOS and Windows.
  • Shell auto-completion for Bash, Zsh and Fish.
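To make the first feature concrete, here is a minimal sketch of header-based duplicate detection followed by a size-based choice of which copy to keep. It is not mdedup's actual implementation: the `HEADERS` selection, the normalization rules and every function name below are assumptions made for illustration, and only the Python standard library is used.

```python
import hashlib
from collections import defaultdict
from email.message import Message

# Hypothetical header selection for illustration; mdedup's real list may differ.
HEADERS = ("Date", "From", "To", "Subject", "Message-ID")


def normalize(value) -> str:
    """Collapse whitespace runs and lowercase, so trivial differences vanish."""
    return " ".join(str(value).split()).lower()


def header_hash(msg: Message) -> str:
    """Hash the selected, normalized headers of one message."""
    lines = [f"{name}: {normalize(msg.get(name, ''))}" for name in HEADERS]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def group_duplicates(messages) -> list:
    """Group messages by header hash and return groups of two or more."""
    groups = defaultdict(list)
    for msg in messages:
        groups[header_hash(msg)].append(msg)
    return [msgs for msgs in groups.values() if len(msgs) > 1]


def keep_largest(duplicates) -> Message:
    """Illustrative size-based strategy: keep the largest serialized message."""
    return max(duplicates, key=lambda msg: len(msg.as_bytes()))
```

Any iterable of `email.message.Message` objects can be passed to `group_duplicates`, for example the messages yielded by the standard `mailbox.mbox` or `mailbox.Maildir` classes. Comparing only headers is what makes safety checks on size and content worthwhile: two mails with identical headers can still differ in their bodies.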

⚠️ Warning: Performance

mdedup's implementation is quite naive at the moment: everything resides in memory.

This is good enough for a volume of a couple of gigabytes, but the more emails mdedup tries to parse, the closer you get to the memory limits of your machine. Once you reach them, mdedup will exit abruptly, zapped by the OOM killer of your OS. Of course, your mileage may vary depending on your hardware.

You can influence the implementation of a less memory-hungry design with pull requests, or by purchasing business support 🤝 or sponsorship 🫶.
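For boxes that are too large for this in-memory approach, a standalone script can at least count duplicates while holding only one short digest per message. The sketch below is not part of mdedup and does not describe how it works: `count_duplicate_ids` is a hypothetical helper that assumes an mbox file and compares `Message-ID` headers only, using the standard `mailbox` module.

```python
import hashlib
import mailbox
from collections import Counter


def count_duplicate_ids(mbox_path) -> int:
    """Stream an mbox file and count messages that repeat an earlier Message-ID."""
    seen = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:  # each message is parsed on demand, one at a time
            message_id = str(msg.get("Message-ID", "")).strip().lower()
            # Keep a fixed-size digest rather than the message itself.
            seen[hashlib.sha256(message_id.encode("utf-8")).hexdigest()] += 1
    finally:
        box.close()
    # Every occurrence beyond the first in a group counts as a duplicate.
    return sum(count - 1 for count in seen.values() if count > 1)
```

This trades accuracy for memory: a check on `Message-ID` alone is much cruder than a comparison of several normalized headers, so treat the result as an estimate rather than a list of mails that are safe to delete.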

Example

Installation

From sources

The easiest way is to install mdedup from sources with pipx:

$ pipx install mail-deduplicate

Alternative installation methods are available in the documentation.

Executables

Standalone executables of mdedup's latest version are available for several platforms and architectures:

Platform  x86_64                  arm64
Linux     mdedup-linux-x64.bin
macOS     mdedup-macos-x64.bin    mdedup-macos-arm64.bin
Windows   mdedup-windows-x64.exe


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mail_deduplicate-7.4.0.tar.gz (24.3 kB)

Uploaded Source

Built Distribution

mail_deduplicate-7.4.0-py3-none-any.whl (30.4 kB)

Uploaded Python 3

File details

Details for the file mail_deduplicate-7.4.0.tar.gz.

File metadata

  • Download URL: mail_deduplicate-7.4.0.tar.gz
  • Upload date:
  • Size: 24.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for mail_deduplicate-7.4.0.tar.gz
Algorithm    Hash digest
SHA256       bf32f8373bb83d0608ef2c061c0954b435cf0ec81615cbd1ae634fce1daa0589
MD5          9a0d7e9ea2c38d1b53bbc0eb6f89e0c1
BLAKE2b-256  7af23685f392327cc5b6e6afad650f1338b793c48ecc3777a8937b0cc10cd5a1
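
A downloaded file can be checked against its published digest before installation. The sketch below uses only the Python standard library. `sha256_of` and `verify` are illustrative names, and the default expected value is the SHA256 digest listed for mail_deduplicate-7.4.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for mail_deduplicate-7.4.0.tar.gz.
EXPECTED_SHA256 = "bf32f8373bb83d0608ef2c061c0954b435cf0ec81615cbd1ae634fce1daa0589"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of(path) == expected.strip().lower()
```

Calling `verify("mail_deduplicate-7.4.0.tar.gz")` on the downloaded archive should return `True`. A `False` result means the file is corrupted or is not the one that was published.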

File details

Details for the file mail_deduplicate-7.4.0-py3-none-any.whl.

File hashes

Hashes for mail_deduplicate-7.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       07b20817dddd8ef95c6f10a4231dd4e021d4361a1ef3f9709ee37f42ac106421
MD5          3c48cc7a69c62cbac38ed1aa7eac1aed
BLAKE2b-256  28a4e8b5aa072ed54ee31a9e089c49043cfe2d51557b83d345e029670e882b99
