Threaded directory iteration via os.scandir() with progress indicator and resume function.
IterFilesystem
Multiprocess directory iteration via os.scandir():
* “stats” processes:
    * only count all directories and files
    * accumulate the sizes of all files
* “worker” process:
    * walks the filesystem and performs the real action on each directory/file
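The split between a counting "stats" job and a "worker" walk can be sketched as follows. This is a minimal illustration, not IterFilesystem's implementation: the names `scandir_walk`, `collect_stats` and `process_tree` are hypothetical, and by default a `ThreadPoolExecutor` stands in for the separate stats process so the sketch runs anywhere. The worker walk runs in the calling process, which matches the History note that since v1.1.0 the work is done in the main process.

```python
import os
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterator, Optional, Tuple


def scandir_walk(top: str) -> Iterator[os.DirEntry]:
    """Yield every entry below *top* (depth-first), without following symlinks."""
    with os.scandir(top) as it:
        for entry in it:
            yield entry
            if entry.is_dir(follow_symlinks=False):
                yield from scandir_walk(entry.path)


def collect_stats(top: str) -> Tuple[int, int, int]:
    """The "stats" job: count directories and files and sum up the file sizes."""
    dir_count = file_count = total_size = 0
    for entry in scandir_walk(top):
        if entry.is_dir(follow_symlinks=False):
            dir_count += 1
        elif entry.is_file(follow_symlinks=False):
            file_count += 1
            total_size += entry.stat(follow_symlinks=False).st_size
    return dir_count, file_count, total_size


def process_tree(
    top: str,
    action: Callable[[os.DirEntry], None],
    executor: Optional[Executor] = None,
) -> Tuple[int, Tuple[int, int, int]]:
    """Run the stats job in the background while the "worker" walk calls *action*."""
    own_executor = executor is None
    if own_executor:
        executor = ThreadPoolExecutor(max_workers=1)
    try:
        # Start counting in the background; totals could feed a progress bar.
        stats_future = executor.submit(collect_stats, top)
        processed = 0
        for entry in scandir_walk(top):  # the "worker" walk, in the calling process
            action(entry)
            processed += 1
        return processed, stats_future.result()
    finally:
        if own_executor:
            executor.shutdown()
```

Usage: `processed, (dirs, files, size) = process_tree(os.path.expanduser("~"), print)`. To run the stats job in a real separate process, pass `concurrent.futures.ProcessPoolExecutor(max_workers=1)` from inside an `if __name__ == "__main__":` block; `collect_stats` is a top-level function returning plain integers, so it can be pickled.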
Among other things, these packages are used:
* tqdm (progress bar)
Requirements:
* Python 3.6 or newer
* Pipenv (package and virtual environment manager)
Please try, fork and contribute! ;)
Example
Use the example CLI, e.g.:
~$ git clone https://github.com/jedie/IterFilesystem.git
~$ cd IterFilesystem
~/IterFilesystem$ pipenv install
~/IterFilesystem$ pipenv shell
(IterFilesystem) ~/IterFilesystem$ print_fs_stats --help
(IterFilesystem) ~/IterFilesystem$ pip install -e .
...
Successfully installed iterfilesystem
(IterFilesystem) ~/IterFilesystem$ print_fs_stats --help
usage: print_fs_stats.py [-h] [-v] [--debug] [--path PATH]
                         [--skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]]
                         [--skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]]

Scan filesystem and print some information

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --debug               enable DEBUG
  --path PATH           The file path that should be scanned e.g.: "~/foobar/" default is "~"
  --skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]
                        Directory names to exclude from scan.
  --skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]
                        File names to ignore.
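The options in the help text above map onto a standard `argparse` parser. The following is a sketch reconstructed only from that help output, not the project's actual source; `build_parser` and its `version` default are hypothetical. The `[X [X ...]]` notation corresponds to `nargs="*"`.

```python
import argparse
import os


def build_parser(version: str = "1.1.0") -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the print_fs_stats --help output."""
    parser = argparse.ArgumentParser(
        prog="print_fs_stats.py",
        description="Scan filesystem and print some information",
    )
    parser.add_argument(
        "-v", "--version", action="version", version=f"%(prog)s {version}"
    )
    parser.add_argument("--debug", action="store_true", help="enable DEBUG")
    parser.add_argument(
        "--path",
        default="~",
        help='The file path that should be scanned e.g.: "~/foobar/" default is "~"',
    )
    # nargs="*" accepts zero or more patterns after the flag.
    parser.add_argument(
        "--skip_dir_patterns", nargs="*", default=[],
        help="Directory names to exclude from scan.",
    )
    parser.add_argument(
        "--skip_file_patterns", nargs="*", default=[],
        help="File names to ignore.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # The default "~" must be expanded before it can be scanned.
    print(os.path.expanduser(args.path), args.skip_dir_patterns, args.skip_file_patterns)
```

Because `--path` defaults to `"~"`, the value has to go through `os.path.expanduser()` before it is passed to `os.scandir()`.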
Example output looks like this:
(IterFilesystem) ~/IterFilesystem$ print_fs_stats --path ~/IterFilesystem --skip_dir_patterns ".*" "*.egg-info" --skip_file_patterns ".*"
Read/process: '~/IterFilesystem'...
Skip directory patterns:
    * .*
    * *.egg-info
Skip file patterns:
    * .*
Filesystem items..:Read/process: '~/IterFilesystem'...
...
Filesystem items..: 100%|█████████████████████████████████████████|135/135 13737.14entries/s [00:00<00:00, 13737.14entries/s]
File sizes........: 100%|██████████████████████████████████████████████████████████████|843k/843k [00:00<00:00, 88.5MBytes/s]
Average progress..: 100%|███████████████████████████████████████████████████████████████████████████████████████|00:00<00:00
Current File......:, /home/jens/repos/IterFilesystem/Pipfile

Processed 135 filesystem items in 0.02 sec
SHA515 hash calculated over all file content: 10f9475b21977f5aea1d4657a0e09ad153a594ab30abc2383bf107dbc60c430928596e368ebefab3e78ede61dcc101cb638a845348fe908786cb8754393439ef
File count: 109
Total file size: 843.5 KB
6 directories skipped.
6 files skipped.
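The summary lines above (digest over all file content, file count, total size, skipped directories and files) can be reproduced roughly as in the sketch below. It is an illustration under stated assumptions, not the project's code: `scan`, `matches_any` and `ScanResult` are hypothetical names, and files are hashed in a name-sorted order per directory, so the digest will generally differ from the one printed by `print_fs_stats`. The printed digest has 128 hex digits, i.e. 512 bits, so the sketch uses `hashlib.sha512`. Skip patterns are matched with `fnmatch`, which the History lists as the filter mechanism since v1.1.0.

```python
import fnmatch
import hashlib
import os
from typing import Iterable, NamedTuple


class ScanResult(NamedTuple):
    file_count: int
    total_size: int
    dirs_skipped: int
    files_skipped: int
    sha512: str


def matches_any(name: str, patterns: Iterable[str]) -> bool:
    """True if *name* matches one of the shell-style patterns, e.g. ".*"."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def scan(top, skip_dir_patterns=(), skip_file_patterns=(), chunk_size=64 * 1024):
    digest = hashlib.sha512()
    file_count = total_size = dirs_skipped = files_skipped = 0
    stack = [top]
    while stack:
        with os.scandir(stack.pop()) as it:
            # Sort by name so the combined digest does not depend on scandir order.
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if matches_any(entry.name, skip_dir_patterns):
                    dirs_skipped += 1  # the whole subtree is ignored
                else:
                    stack.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                if matches_any(entry.name, skip_file_patterns):
                    files_skipped += 1
                    continue
                file_count += 1
                total_size += entry.stat(follow_symlinks=False).st_size
                # Feed the file content into the running digest chunk by chunk.
                with open(entry.path, "rb") as f:
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        digest.update(chunk)
    return ScanResult(file_count, total_size, dirs_skipped, files_skipped, digest.hexdigest())
```

For example, `scan(os.path.expanduser("~/IterFilesystem"), [".*", "*.egg-info"], [".*"])` applies the same patterns as the command above. Note that, unlike a shell glob, `fnmatch` lets `*` match a leading dot.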
History
dev - compare v1.1.0…master
TBC
12.10.2019 - compare v1.0.0…v1.1.0
don’t create a separate process for the worker: just do the work in the main process
dir/file filters now use fnmatch
12.10.2019 - compare v0.2.0…v1.0.0
refactoring:
don’t use persist-queue
switch from threading to multiprocessing
enhance progress display with multiple tqdm progress bars
15.09.2019 - compare v0.1.0…v0.2.0
store persist queue in temp directory
Don’t catch process_path_item errors; this should be done in the child class
15.09.2019 - compare v0.0.1…v0.1.0
add some project meta files and tests
set up CI
fix tests
15.09.2019 - v0.0.1
first release on PyPI
Built Distributions
Hashes for iterfilesystem-1.1.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 270df8edfe5347a6c0e9f202fa3de0223b6d9af933deaa7a8d6be77b4b40dbdb
MD5 | 8027ffc9a252f311b698fdfd2e5fb538
BLAKE2b-256 | 769bc25ad900f23d8139e178f2ed335c34379584bcb5ef152bafa39d4baa2576
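A downloaded wheel can be checked against the listed digests with the standard library alone. The sketch below is illustrative: `file_digest`, `verify` and the `EXPECTED` mapping are hypothetical names, and BLAKE2b-256 is computed as `hashlib.blake2b` with `digest_size=32` (256 bits).

```python
import hashlib
import hmac

# Digests as listed for iterfilesystem-1.1.0-py2.py3-none-any.whl.
EXPECTED = {
    "sha256": "270df8edfe5347a6c0e9f202fa3de0223b6d9af933deaa7a8d6be77b4b40dbdb",
    "md5": "8027ffc9a252f311b698fdfd2e5fb538",
    "blake2b-256": "769bc25ad900f23d8139e178f2ed335c34379584bcb5ef152bafa39d4baa2576",
}


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of the file at *path* for the given algorithm name."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: dict = EXPECTED) -> bool:
    """True only if every listed digest matches the file's content."""
    return all(
        hmac.compare_digest(file_digest(path, algorithm), digest)
        for algorithm, digest in expected.items()
    )
```

SHA256 alone is sufficient for an integrity check; MD5 should not be relied on for anything security-relevant.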