Walk directory trees with os.scandir(), generating DirEntry objects

scanwalk

scanwalk.walk() walks a directory tree, generating DirEntry objects. It's an alternative to os.walk() modelled on os.scandir().

>>> import scanwalk
>>> for entry in scanwalk.walk('data/demo'):
...     print(entry.path, entry.name, entry.is_dir(), entry.is_file())
...
data/demo demo True False
data/demo/adir adir True False
data/demo/adir/anotherfile anotherfile False True
data/demo/adir/anotherdir anotherdir True False
data/demo/afile afile False True

A rough equivalent with os.walk() would be:

>>> import os
>>> for parent, dirs, files in os.walk('data/demo'):
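...     if parent == 'data/demo':  # print the root once; subdirectories are printed below
...         print(parent, os.path.basename(parent), True, False)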
...     for name in dirs:
...         print(os.path.join(parent, name), name, True, False)
...     for name in files:
...         print(os.path.join(parent, name), name, False, True)
...
data/demo demo True False
data/demo/adir adir True False
data/demo/afile afile False True
data/demo/adir/anotherdir anotherdir True False
data/demo/adir/anotherfile anotherfile False True

Notable features and differences between scanwalk.walk() and os.walk()

Aspect       os.walk()                              scanwalk.walk()
Yields       (dirpath, dirnames, filenames)         DirEntry objects
Consumers    Nested for loops                       for loop, generator expression, or comprehension
Order        Sorted, directories & files separate   Unsorted, directories & files intermingled
Traversal    Depth first or breadth first           Semi depth first, directories traversed on arrival
Exceptions   onerror() callback                     try/except block
Allocations  Builds intermediate lists              Direct from os.scandir()
Performance  1.0x                                   1.1 - 1.2x faster
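
The difference in error handling can be illustrated with the standard library alone. This is a minimal sketch, assuming that scanwalk.walk() lets the OSError raised by os.scandir() propagate to the caller (which is what "try/except block" suggests, but is not confirmed here). The temporary directory and the name no-such-dir are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "no-such-dir")  # hypothetical path, never created

    # os.walk() ignores scandir errors unless an onerror callback is given.
    silent = list(os.walk(missing))

    # With onerror, each OSError is handed to the callback instead of raised.
    errors = []
    reported = list(os.walk(missing, onerror=errors.append))

    # os.scandir() raises directly, so an ordinary try/except block works.
    try:
        os.scandir(missing)
    except OSError as exc:
        caught = exc
    else:
        caught = None
```

Both os.walk() calls yield nothing; only the second records what went wrong.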

Installation

python -m pip install scanwalk

Requirements

  • Python 3.7+

License

MIT

Questions and Answers

What's wrong with os.walk()?

scanwalk.walk() isn't better or worse than os.walk(); each has tradeoffs. os.walk() is fine for most use cases. If you're happy with it, carry on.

Why use scanwalk?

scanwalk.walk() ekes out a little more speed (10-20% in an ad hoc benchmark). It doesn't require nested for loops, so code is easier to read and write. In particular, list comprehensions and generator expressions become simpler.
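
To make the comprehension point concrete, here is a rough sketch that collects every file path in a tree both ways. So that it runs without scanwalk installed, it uses walk_entries(), a hypothetical stand-in built on os.scandir(). It is not scanwalk's implementation, and unlike scanwalk.walk() it does not yield an entry for the top directory itself.

```python
import os
import tempfile


def walk_entries(top):
    """Hypothetical stand-in for scanwalk.walk(); not the library's code."""
    with os.scandir(top) as entries:
        for entry in entries:
            yield entry
            # Descend as soon as a directory arrives.
            if entry.is_dir(follow_symlinks=False):
                yield from walk_entries(entry.path)


with tempfile.TemporaryDirectory() as top:
    # Recreate the layout of the data/demo example.
    os.makedirs(os.path.join(top, "adir", "anotherdir"))
    for relpath in ("afile", os.path.join("adir", "anotherfile")):
        with open(os.path.join(top, relpath), "w"):
            pass

    # os.walk(): a nested comprehension over (dirpath, dirnames, filenames).
    nested = sorted(
        os.path.join(parent, name)
        for parent, dirs, files in os.walk(top)
        for name in files
    )

    # DirEntry stream: one flat comprehension, filtering on the entry itself.
    flat = sorted(e.path for e in walk_entries(top) if e.is_file())
```

With scanwalk installed, the last line would presumably read the same with scanwalk.walk(top) in place of walk_entries(top).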

Why not use scanwalk?

scanwalk is still alpha, mostly untested, and almost entirely undocumented. It only supports newer Pythons, on platforms with a working os.scandir().

scanwalk.walk() lacks features compared to os.walk():

  • entries aren't sorted; they arrive in an undefined order
  • there's no control over traversal order (e.g. depth first, breadth first)
  • there's no way to skip directories
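
For reference, this is roughly what ordering and directory skipping look like on the os.walk() side. With the default topdown=True, the dirnames list may be edited in place, and os.walk() only descends into what remains, in the order given. The directory names below are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "adir", "anotherdir"))
    os.makedirs(os.path.join(top, "skipme", "hidden"))

    visited = []
    for parent, dirs, files in os.walk(top):  # topdown=True is the default
        # Editing dirs in place controls what os.walk() descends into:
        # drop "skipme" and visit the remaining directories in sorted order.
        dirs[:] = sorted(d for d in dirs if d != "skipme")
        visited.append(os.path.relpath(parent, top))
```

Neither skipme nor anything below it is visited.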

Related work

  • scandir - backport of os.scandir() for Python 2.7 and 3.4

TODO

  • Expose directory skip mechanism, probably generator.send()
  • Implement context manager protocol, similar to os.scandir()
  • Documentation
  • Tests
  • Continuous Integration
  • Coverage
  • Code quality checks (MyPy, flake8, etc.)
  • scanwalk.copytree()?
  • scanwalk.DirEntry.depth?
  • Linux io_uring support?
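
The first TODO item mentions generator.send() as a possible skip mechanism. One way that could work is sketched below. This is purely illustrative: walk_with_skip() and its send(True) convention are hypothetical and do not describe scanwalk's current or planned API.

```python
import os
import tempfile


def walk_with_skip(top):
    """Hypothetical sketch: yield DirEntry objects; send(True) skips a directory."""
    with os.scandir(top) as entries:
        for entry in entries:
            skip = yield entry
            if skip:
                # Answer the send() call with None so that the consumer's
                # next() still receives the following real entry.
                yield None
            elif entry.is_dir(follow_symlinks=False):
                # yield from forwards later send() calls to the nested walker.
                yield from walk_with_skip(entry.path)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "keep", "inner"))
    os.makedirs(os.path.join(top, "skipme", "hidden"))

    seen = []
    walker = walk_with_skip(top)
    for entry in walker:
        seen.append(entry.name)
        if entry.name == "skipme":
            walker.send(True)  # do not descend into this directory
    seen.sort()
```

The consumer keeps a plain for loop and only calls send() on the entries it wants to prune.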
