
Multiprocess directory iteration via os.scandir() with progress indicator via tqdm bars.


IterFilesystem

Multiprocess directory iteration via os.scandir()

Who’s this Lib for?

You want to process a large number of files and/or a few very big files and give feedback to the user on how long it will take.

Features:

  • Progress indicator:

    • Processing starts immediately; progress is indicated via background processes (multiprocessing)

    • Progress bars via tqdm

    • Estimated time based on file count and size

  • Easy to implement an extra progress bar for processing big files.

  • Skip directories and file names via fnmatch patterns.
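Name-based skipping with fnmatch needs only a few lines. A minimal sketch, assuming patterns are matched against the bare entry name (as the "Directory names" / "File names" wording of the CLI help suggests); `is_skipped` and `filter_names` are hypothetical helpers, not IterFilesystem's API:

```python
import fnmatch


def is_skipped(name, patterns):
    """Return True if *name* matches any of the fnmatch *patterns*."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def filter_names(names, patterns):
    """Return only the names that are not excluded by *patterns*."""
    return [name for name in names if not is_skipped(name, patterns)]
```

With os.walk(), assigning the filtered list back via `dirnames[:] = filter_names(dirnames, patterns)` prunes the skipped directories from the walk. Note that fnmatch.fnmatch() applies os.path.normcase(), so matching is case-insensitive on Windows.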

How it works:

The main process starts statistics processes in the background via Python's multiprocessing and immediately begins with the actual work.

Two background statistics processes collect the information for the progress bars:

  • One counts all directories and files.

  • The other accumulates the sizes of all files.
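The layout described above could be sketched like this. It assumes shared `multiprocessing.Value` counters as the channel between the processes (the library may use another mechanism, e.g. queues); `count_worker`, `size_worker`, `run` and the `work` callback are hypothetical names:

```python
import multiprocessing
import os


def count_worker(path, counter):
    """Statistics process 1: count all directories and files (fast)."""
    # os.walk() is built on os.scandir() since Python 3.5.
    for _root, dirnames, filenames in os.walk(path):
        with counter.get_lock():
            counter.value += len(dirnames) + len(filenames)


def size_worker(path, total_size):
    """Statistics process 2: accumulate the sizes of all files (slower)."""
    for root, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or is not accessible
            with total_size.get_lock():
                total_size.value += size


def run(path, work):
    """Start both statistics processes, then begin the real work at once."""
    item_count = multiprocessing.Value("q", 0)  # shared 64-bit counters
    total_size = multiprocessing.Value("q", 0)
    stats = [
        multiprocessing.Process(target=count_worker, args=(path, item_count)),
        multiprocessing.Process(target=size_worker, args=(path, total_size)),
    ]
    for proc in stats:
        proc.start()
    # The main process does not wait: *work* may read item_count.value and
    # total_size.value at any time to update its progress bars.
    work(path, item_count, total_size)
    for proc in stats:
        proc.join()
    return item_count.value, total_size.value
```

Call `run()` from inside an `if __name__ == "__main__":` block: with the spawn start method (the default on Windows and macOS) the worker functions must be importable from the main module.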

Why two processes?

Collecting only the count of all filesystem items via os.scandir() is very fast, so it is the quickest way to predict a processing time.

Using os.DirEntry.stat() to get the file size is significantly slower: on Unix, it always requires an additional system call per entry.
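To illustrate the difference, here is a sketch of both scans with os.scandir() (the function names are hypothetical). Only the second pass calls DirEntry.stat():

```python
import os


def count_entries(path):
    """Count directories and files below *path* without calling stat().

    DirEntry.is_dir() can usually answer from the data the directory
    listing already returned, so no extra system call per entry is needed.
    """
    count = 0
    stack = [path]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                count += 1
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count


def sum_file_sizes(path):
    """Accumulate the sizes of all regular files below *path*.

    DirEntry.stat() needs an extra system call per file on Unix, which is
    why this pass is noticeably slower than count_entries().
    """
    total = 0
    stack = [path]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```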

OK, but why two processes?

Using only the total count of all DirEntry items may result in a poor time estimate. It depends on what the actual work is: when processing the contents of large files, it is important to know the total amount of data to be processed.

That’s why we use both: the DirEntry count to forecast a processing time very quickly, and the accumulated file size to refine the prediction.
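One way the two totals might be combined: the "Average progress" bar in the example output suggests an average of both fractions, but the equal weighting in this sketch is an assumption, not the library's formula. Both helper names are hypothetical. While the background scans are still running, the totals keep growing, so the estimate improves over time:

```python
def estimate_progress(done_items, total_items, done_bytes, total_bytes):
    """Combine item-count and byte-size progress into one fraction (0.0-1.0).

    Assumption: both fractions are weighted equally. A total of 0 means
    "not known yet", so that fraction is left out.
    """
    fractions = []
    if total_items > 0:
        # Clamp: the background count may still lag behind the real work.
        fractions.append(min(done_items / total_items, 1.0))
    if total_bytes > 0:
        fractions.append(min(done_bytes / total_bytes, 1.0))
    if not fractions:
        return 0.0
    return sum(fractions) / len(fractions)


def estimate_remaining_seconds(elapsed_seconds, progress):
    """Linear extrapolation: remaining = elapsed * (1 - p) / p."""
    if progress <= 0.0:
        return None  # nothing processed yet, no estimate possible
    return elapsed_seconds * (1.0 - progress) / progress
```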

requirements:

  • Python 3.6 or newer.

  • tqdm for progress bars

  • psutil for setting process priority

  • For development: Pipenv (package and virtual environment manager)

contribute

Please: try, fork and contribute! ;)

Build Status on travis-ci.org

travis-ci.org/jedie/IterFilesystem

Build Status on appveyor.com

ci.appveyor.com/project/jedie/IterFilesystem

Coverage Status on codecov.io

codecov.io/gh/jedie/IterFilesystem

Coverage Status on coveralls.io

coveralls.io/r/jedie/IterFilesystem

Requirements Status on requires.io

requires.io/github/jedie/IterFilesystem/requirements/

Example

Use the example CLI, e.g.:

~$ git clone https://github.com/jedie/IterFilesystem.git
~$ cd IterFilesystem
~/IterFilesystem$ pipenv install
~/IterFilesystem$ pipenv shell
(IterFilesystem) ~/IterFilesystem$ pip install -e .
...
Successfully installed iterfilesystem

(IterFilesystem) ~/IterFilesystem$ print_fs_stats --help
usage: print_fs_stats.py [-h] [-v] [--debug] [--path PATH]
                         [--skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]]
                         [--skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]]

Scan filesystem and print some information

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --debug               enable DEBUG
  --path PATH           The file path that should be scanned e.g.: "~/foobar/"
                        default is "~"
  --skip_dir_patterns [SKIP_DIR_PATTERNS [SKIP_DIR_PATTERNS ...]]
                        Directory names to exclude from scan.
  --skip_file_patterns [SKIP_FILE_PATTERNS [SKIP_FILE_PATTERNS ...]]
                        File names to ignore.

Example output looks like this:

(IterFilesystem) ~/IterFilesystem$ print_fs_stats --path ~/IterFilesystem --skip_dir_patterns ".*" "*.egg-info" --skip_file_patterns ".*"
Read/process: '~/IterFilesystem'...
Skip directory patterns:
    * .*
    * *.egg-info

Skip file patterns:
    * .*

Filesystem items..:Read/process: '~/IterFilesystem'...

...

Filesystem items..: 100%|█████████████████████████████████████████|135/135 13737.14entries/s [00:00<00:00, 13737.14entries/s]
File sizes........: 100%|██████████████████████████████████████████████████████████████|843k/843k [00:00<00:00, 88.5MBytes/s]
Average progress..: 100%|███████████████████████████████████████████████████████████████████████████████████████|00:00<00:00
Current File......:, /home/jens/repos/IterFilesystem/Pipfile


Processed 135 filesystem items in 0.02 sec
SHA512 hash calculated over all file content: 10f9475b21977f5aea1d4657a0e09ad153a594ab30abc2383bf107dbc60c430928596e368ebefab3e78ede61dcc101cb638a845348fe908786cb8754393439ef
File count: 109
Total file size: 843.5 KB
6 directories skipped.
6 files skipped.
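The work step in this example, one SHA512 hash over all file content, could look roughly like this. Reading in chunks lets a big file drive its own progress bar. `hash_files` and its `on_progress` callback are hypothetical, and the resulting digest depends on the order of the paths:

```python
import hashlib


def hash_files(paths, on_progress=None, chunk_size=64 * 1024):
    """Feed the content of all *paths* into one SHA512 hash.

    *on_progress* (hypothetical hook) is called with the number of bytes
    read after every chunk, e.g. to advance a per-file progress bar.
    """
    hasher = hashlib.sha512()
    for path in paths:
        with open(path, "rb") as f:
            # iter() with a sentinel stops at EOF, when read() returns b"".
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
                if on_progress is not None:
                    on_progress(len(chunk))
    return hasher.hexdigest()
```

With tqdm, passing the `update` method of a `tqdm(total=..., unit="B", unit_scale=True)` bar as `on_progress` advances that bar by the bytes read.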

History

Donating



Download files


Source Distribution

iterfilesystem-1.3.1.tar.gz (19.0 kB)

Uploaded Source

Built Distributions

iterfilesystem-1.3.1-py3.6.egg (19.1 kB)

Uploaded Source

iterfilesystem-1.3.1-py2.py3-none-any.whl (21.3 kB)

Uploaded Python 2 Python 3

File details

Details for the file iterfilesystem-1.3.1.tar.gz.

File metadata

  • Download URL: iterfilesystem-1.3.1.tar.gz
  • Upload date:
  • Size: 19.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.8

File hashes

Hashes for iterfilesystem-1.3.1.tar.gz
Algorithm Hash digest
SHA256 32f458fea0c606adc21c113e4379f7b9845c7f4bb498b63dc27e33363a419421
MD5 8340dc5f3e2a245566978fe15d5b9ff3
BLAKE2b-256 c6c53aa01251e5d1b24087d25b6079d3036d4422dad3927c5b3de2554c80bbfd


File details

Details for the file iterfilesystem-1.3.1-py3.6.egg.

File metadata

  • Download URL: iterfilesystem-1.3.1-py3.6.egg
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.8

File hashes

Hashes for iterfilesystem-1.3.1-py3.6.egg
Algorithm Hash digest
SHA256 88d1b8198e8d03342fc148703df58ed40923cb78a17ab129537b676f67010a78
MD5 c92788f52607c46af5e5bf9b23e26c7e
BLAKE2b-256 7d37f645855ae2c974d0515003a6a97895bb6aba2ce1be0e5cc10286d9568efc


File details

Details for the file iterfilesystem-1.3.1-py2.py3-none-any.whl.

File metadata

  • Download URL: iterfilesystem-1.3.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 21.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.8

File hashes

Hashes for iterfilesystem-1.3.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 bb8d6d7d0db70fc10e68e9387ddf2cd5414847e2edff135eb27b9b396d4bec86
MD5 4a0c7200afdf202ffe2f9f357028185f
BLAKE2b-256 f4d51872bf0353865d3048c6ac81d2ce903d61470e0795654b88d2ba760bbbab
