a de-duplication command line tool
Project description
- Version: 0.1.5
- Copyright: This document has been placed in the public domain.
Summary
A deduplication command line tool and library. Duplicate files/file objects are found with a relatively efficient algorithm: files of like size are located first, and a full md5 checksum is then performed on them to confirm a match. Files can be deleted upon discovery, and pattern matching can be used to limit search results. Configuration files are also supported, and there is a developing API that lends itself to customization via an ActionsMixin class.
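The size-then-checksum approach described above can be sketched roughly as follows. This is an illustration written for Python 3, not liten's own code: the names md5_of and find_duplicates are hypothetical, and treating the first file seen with a given checksum as the "original" is an assumption.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Yield (original, duplicate) path pairs found under root (hypothetical helper)."""
    # Pass 1: bucket files by size; a file with a unique size has no duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: checksum only the files that share a size with another file.
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        first_seen = {}  # checksum -> first path seen (assumed to be the original)
        for path in paths:
            checksum = md5_of(path)
            if checksum in first_seen:
                yield first_seen[checksum], path
            else:
                first_seen[checksum] = path
```

Checking sizes first is cheap because it only needs file metadata, so the more expensive checksum is skipped for every file whose size is unique.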
Example CLI Usage:
Size:
Search by size using the --size or -s option:
    liten.py -s 1 /mnt/raid         is equal to liten.py -s 1MB /mnt/raid
    liten.py -s 1bytes /mnt/raid
    liten.py -s 1KB /mnt/raid
    liten.py -s 1MB /mnt/raid
    liten.py -s 1GB /mnt/raid
    liten.py c:\in d:\              is equal to liten.py -s 1MB c:\in d:\
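As a rough illustration of how size arguments like these could be turned into a byte count, here is a minimal Python 3 sketch. The function parse_size is hypothetical, and the binary multipliers (1 KB = 1024 bytes) are an assumption; only the unit names and the MB default come from the examples above.

```python
import re

# Assumed multipliers: binary units (1 KB = 1024 bytes).
_UNITS = {"bytes": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text, default_unit="mb"):
    """Convert strings such as '1', '1KB' or '1bytes' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    number, unit = match.groups()
    # A bare number falls back to megabytes, mirroring "-s 1" == "-s 1MB".
    unit = unit.lower() or default_unit
    if unit not in _UNITS:
        raise ValueError("unknown unit in size: %r" % text)
    return int(number) * _UNITS[unit]
```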
Report Location:
Set a custom report path using -r or --report=/tmp/report.txt:
./liten.py --report=/tmp/test.txt /Users/ngift/Documents
By default, a report called LitenDuplicateReport.csv is created in the current working directory.
Config File:
You can use a config file in the following format:
    [Options]
    path=/tmp
    size=1MB
    pattern=*.m4v
    delete=True
You can call the config file anything and place it anywhere.
Here is an example usage:
./liten.py --config=myconfig.ini
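A config file in this format can be read with the standard library's configparser. The sketch below is written for Python 3 and is not taken from liten itself: load_options is a hypothetical helper, and the fallback values (1MB for size, False for delete) are assumptions based on the CLI defaults described above.

```python
import configparser


def load_options(path):
    """Read the [Options] section of a config file into a dict."""
    # interpolation=None keeps '%' characters in paths or patterns literal.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    section = parser["Options"]
    return {
        "path": section.get("path"),
        "size": section.get("size", "1MB"),  # assumed default
        "pattern": section.get("pattern"),
        "delete": section.getboolean("delete", fallback=False),
    }
```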
Verbosity:
All stdout can be suppressed by using --quiet or -q.
Delete:
With --delete, duplicate files are deleted automatically. The API also supports an interactive mode and a dry-run mode, but these have not yet been implemented in the CLI.
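To illustrate what a dry-run mode means in practice, here is a minimal Python 3 sketch. It does not reproduce liten's API: delete_duplicates and its dry_run parameter are hypothetical, and the input is assumed to be a list of (original, duplicate) path pairs.

```python
import os


def delete_duplicates(pairs, dry_run=True):
    """Remove the duplicate from each (original, duplicate) pair.

    With dry_run=True nothing is deleted; the function only returns
    the paths that would have been removed.
    """
    affected = []
    for _original, duplicate in pairs:
        if not dry_run:
            os.remove(duplicate)
        affected.append(duplicate)
    return affected
```

Defaulting to dry_run=True is a deliberate choice in this sketch, so that deletion only happens when it is requested explicitly.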
Example Library/API Usage:
    >>> Liten = Liten(spath='testData')
    >>> dupeFileOne = 'testData/testDocOne.txt'
    >>> checksumOne = Liten.createChecksum(dupeFileOne)
    >>> dupeFileTwo = 'testData/testDocTwo.txt'
    >>> checksumTwo = Liten.createChecksum(dupeFileTwo)
    >>> nonDupeFile = 'testData/testDocThree_wrong_match.txt'
    >>> checksumThree = Liten.createChecksum(nonDupeFile)
    >>> checksumOne == checksumTwo
    True
    >>> checksumOne == checksumThree
    False
There is also the concept of an Action, to be implemented later, which will allow customizable actions to run when a user-defined condition is met while walking a tree of files.
Tests:
Run Doctests: ./liten -t or --test
Run test_liten.py
- Run test_create_file.py, then delete those test files using liten::
    python2.5 liten.py --delete /tmp
Display Options:
Stdout:
stdout will show you duplicate file paths and sizes such as:
    Printing dups over 1 MB using md5 checksum: [SIZE] [ORIG] [DUP]
    7 MB Orig: /Users/ngift/Downloads/bzr-0-2.17.tar Dupe: /Users/ngift/Downloads/bzr-0-4.17.tar
Report:
A report named LitenDuplicateReport.csv will be created in your current working directory:
    Duplicate Version, Path, Size, ModDate
    Original, /Users/ngift/Downloads/bzr-0-2.17.tar, 7 MB, 07/10/2007 01:43:12 AM
    Duplicate, /Users/ngift/Downloads/bzr-0-3.17.tar, 7 MB, 07/10/2007 01:43:27 AM
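A report in this layout can be loaded for further processing with the standard csv module. The following Python 3 sketch assumes the file on disk matches the sample above exactly (a header row, with a space after each comma); read_report is a hypothetical helper, not part of liten.

```python
import csv


def read_report(path):
    """Return the report rows as a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        # skipinitialspace drops the blank that follows each comma.
        reader = csv.DictReader(handle, skipinitialspace=True)
        return list(reader)
```

Each row then exposes the "Duplicate Version", "Path", "Size" and "ModDate" columns by name.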
Debug Mode Environmental Variables:
To enable print statement debugging set LITEN_DEBUG to 1
To enable pdb break point debugging set LITEN_DEBUG to 2
    LITEN_DEBUG_MODE = int(os.environ.get('LITEN_DEBUG', 0))
Note: When debug mode is enabled, a message is printed to standard output.
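One way such a debug level might be acted on is sketched below for Python 3. Only the LITEN_DEBUG variable and the meaning of levels 1 and 2 come from the description above; the helpers debug_level and debug are hypothetical and are not liten's own functions.

```python
import os
import pdb


def debug_level(environ=os.environ):
    """Read the debug level from LITEN_DEBUG, defaulting to 0 (off)."""
    return int(environ.get("LITEN_DEBUG", 0))


def debug(message, level=None):
    """Print the message at level 1, or drop into pdb at level 2."""
    level = debug_level() if level is None else level
    if level == 1:
        print("DEBUG: %s" % message)
    elif level == 2:
        pdb.set_trace()
    return level
```

For example, running a script with LITEN_DEBUG=1 set in the environment would make debug("walking /tmp") print its message, while leaving the variable unset keeps it silent.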
QUESTIONS: noah dot gift at gmail.com