Analyse all files in one or more directories and manage duplicate files (the same file present with different names)
Project description
Introduction
This application helps you clean duplicate files from your filesystem. Here, duplicates are two or more files that have the same content but may have different names.
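To make the definition of "duplicate" concrete, here is a minimal sketch of content-based duplicate detection for a single directory. It is not the code of duplicatefinder.py: the function name `find_duplicates` and the choice to group files by size first are assumptions made for illustration. The use of the standard `filecmp` module follows the 0.3.0 changelog entry, and the default of 128 bytes mirrors the documented `--min-size` default.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(directory, min_size=128):
    """Return groups of paths in `directory` whose contents are identical.

    Hypothetical helper, for illustration only.
    """
    # Files with different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # max(min_size, 1) makes sure empty files are always ignored.
        if os.path.isfile(path) and os.path.getsize(path) >= max(min_size, 1):
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        remaining = paths
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            # shallow=False forces a byte-by-byte comparison of the contents.
            same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                groups.append([first] + same)
            remaining = [p for p in rest if p not in same]
    return groups
```

Each returned group lists files that share the same content regardless of their names; a caller could then print, rename or move every entry after the first one.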
You can use it in this way:
Usage: duplicatefinder.py [options] [directories]

Analyse all files in one or more directories and manage duplicate files (the same file present with different names)

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -a ACTION, --action=ACTION
                        choose an action to do when a duplicate is found. Valid options are print,rename,move,ask; print is the default
  -r, --recursive       also check files in subdirectories recursively
  -p PREFIX, --prefix=PREFIX
                        prefix used for renaming duplicated files when the 'rename' action is chosen. Default is "DUPLICATED"
  -m PATH, --move-path=PATH
                        the directory where duplicate will be moved when the 'move' action is chosen
  -v, --verbose         more verbose output
  -q, --quiet           do not print any messages at all

  Filters:
    Use those options to limit and filter directories and files to check. Options belowe that rely on file or directory name support usage of jolly characters and can also be used multiple times

    -s MIN_SIZE, --min-size=MIN_SIZE
                        indicate the min size in bytes of a file for being checked. Default is 128. Empty file are always ignored
    --include-dir=INCLUDE_DIR
                        only check directories with this name
    --exclude-dir=EXCLUDE_DIR
                        do not check directories with this name
    --include-file=INCLUDE_FILE
                        limit the search inside file with that name
    --exclude-file=EXCLUDE_FILE
                        ignore the search inside file with that name

Report bugs (and suggestions) to <luca@keul.it>.
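The name-based filters can each be given several times and accept "jolly characters". The sketch below assumes this means shell-style wildcards such as `*` and `?`, and shows one way such include/exclude rules could be combined. The helper `is_selected` is hypothetical and is not taken from duplicatefinder.py; how the real script combines the filters may differ.

```python
import fnmatch


def is_selected(name, include=(), exclude=()):
    """Decide whether a file or directory name passes the filters.

    Hypothetical helper: `include` and `exclude` are lists of wildcard
    patterns, one entry per repeated command line option.
    """
    # With at least one include pattern, the name must match one of them.
    if include and not any(fnmatch.fnmatchcase(name, p) for p in include):
        return False
    # A name matching any exclude pattern is always skipped.
    return not any(fnmatch.fnmatchcase(name, p) for p in exclude)


print(is_selected("photo.jpg", include=["*.jpg", "*.png"]))  # True
print(is_selected("notes.txt", include=["*.jpg", "*.png"]))  # False
print(is_selected("photo.jpg", exclude=["photo.*"]))         # False
```

`fnmatch.fnmatchcase` is used here so that matching stays case-sensitive on every platform. In this sketch an exclude pattern wins over an include pattern when both match the same name.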
TODO
More test coverage (maybe some tests can be merged together).
Control the maximum recursion depth.
Internationalization (at least Italian).
A “move to trash” action (a dependency on trash-cli could be a great idea).
Release this as a Debian/Ubuntu/Kubuntu package (I’d really love this).
Credits
Thanks to Lord Epzylon for sending me some code and modifications.
Subversion and other
The SVN repository is hosted at Keul’s Python Libraries.
Changelog
0.3.0
The runnable script name has been changed to duplicatefinder.py.
You can now pass multiple target directories as parameters.
Added an --action=ask option to choose which action to perform for each duplicate (interactive mode).
Added the --include-dir option to limit the search to specific directories.
Added the --exclude-dir option to skip some directories during the search.
Added the --include-file option to match only some files in the search.
Added the --exclude-file option to skip files during the search, based on file name.
Fixed: a wrong directory name was not handled and only caused an abnormal termination.
Friendlier handling of the user’s break (CTRL+C) action.
Added the --verbose option to print more informational messages.
Added the --quiet option to output nothing at all.
Removed the _same_file function. Python already has a filecmp module (hoping this is faster)!
Added an environment for automated tests, and tests too (use --action=tests).
Some fixes to the command line help.
0.2.0
Added the move action.
Added the --recursive option, to walk an entire tree of folders (thanks to Lord Epzylon).
Added the --min-size option, to specify a minimum size of the files to be checked.
0.1.2
Fixed a bad bug in setup.py. The code was OK but the 0.1.1 egg was not installable. Thanks to the ever-present A. Jung.
0.1.1
Fix to the setup.py script.
Added documentation.
First official egg release.
0.1.0 - Unreleased
First (un)release
Download files
Hashes for PyDirDuplicateFinder-0.3.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 8c4cfbeaf247266ba37a0315f7f0a3f94a265767c88e5fc666d0ae8fa34f9618
MD5 | f9953ec624f6bc06d749d11937d95e86
BLAKE2b-256 | a2c6638b2a5d5baddae5e71500015c1f4565ee8cd31800450b414a19001a2ed9
Hashes for PyDirDuplicateFinder-0.3.0-py2.5.egg
Algorithm | Hash digest
---|---
SHA256 | 09d7059a99a6025385035bb0dbdfba8ec08ddc82088e1fba174d8415b467de11
MD5 | 9985f235dd81b4aa67d899db60ac9577
BLAKE2b-256 | e2aca681a3e02c1f50381c5cdd4a71faa8eb6dc5a3d76e6c0b48e0ef4886f9a3