scandir, a better directory iterator and faster os.walk()

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have also been included. So if you’re lucky enough to be using Python 3.5 or later (3.5 was released on September 13, 2015), you get the benefit immediately. Otherwise, install this module from PyPI with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which proposed including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.6+ and Python 3.2+ (and it has been tested on those versions).

Background

Python’s built-in os.walk() is significantly slower than it needs to be, because – in addition to calling listdir() on each directory – it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we’re not talking about micro-optimizations. See more benchmarks in the “Benchmarks” section below.
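To make the 2N-to-N argument concrete, here is a minimal sketch of a top-down walk built on scandir(). The name simple_walk is hypothetical and this is not the module's actual walk() implementation. It leaves out error handling, bottom-up order, and followlinks, and unlike os.walk() it reports symlinks to directories as plain files.

```python
import os

# Use the built-in scandir if possible, otherwise the scandir module version
try:
    from os import scandir
except ImportError:
    from scandir import scandir


def simple_walk(top):
    """Yield (dirpath, dirnames, filenames) tuples, top-down.

    Hypothetical helper for illustration only.
    """
    dirnames = []
    filenames = []
    for entry in scandir(top):
        # is_dir() can usually be answered from the data the directory
        # listing already returned, so no separate stat() call is made.
        if entry.is_dir(follow_symlinks=False):
            dirnames.append(entry.name)
        else:
            filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        # A plain loop rather than "yield from" keeps this usable on Python 2.
        for result in simple_walk(os.path.join(top, name)):
            yield result
```

Each directory costs one listing pass, and the file-versus-directory decision normally comes for free with it.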

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
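As a rough illustration of the memory point, the sketch below reads only the first few entries of a directory without building the full list. The helper name first_names is hypothetical. The close() handling assumes that newer scandir iterators (Python 3.6+) provide a close() method and that older ones may not.

```python
from itertools import islice

# Use the built-in scandir if possible, otherwise the scandir module version
try:
    from os import scandir
except ImportError:
    from scandir import scandir


def first_names(path, limit):
    """Return the names of at most `limit` entries under path."""
    entries = scandir(path)
    try:
        # islice stops pulling entries once `limit` have been produced.
        return [entry.name for entry in islice(entries, limit)]
    finally:
        # Release the directory handle early where close() is available.
        close = getattr(entries, "close", None)
        if close is not None:
            close()
```

With os.listdir() the whole directory would be read into a list before the first name could be looked at.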

So as well as a faster walk(), scandir adds a new scandir() function. They’re pretty easy to use, but see “The API” below for the full docs.

Benchmarks

Below are results showing how many times as fast scandir.walk() is compared with os.walk() on various systems, found by running benchmark.py with no arguments:

System version          Python version    Times as fast
Windows 7 64-bit        2.7.7 64-bit      10.4
Windows 7 64-bit SSD    2.7.7 64-bit      10.3
Windows 7 64-bit NFS    2.7.6 64-bit      36.8
Windows 7 64-bit SSD    3.4.1 64-bit      9.9
Windows 7 64-bit SSD    3.5.0 64-bit      9.5
CentOS 6.2 64-bit       2.6.6 64-bit      3.9
Ubuntu 14.04 64-bit     2.7.6 64-bit      5.8
Mac OS X 10.9.3         2.7.5 64-bit      3.8

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.
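Because the interface matches os.walk(), existing walk-based code should work unchanged. The sketch below is one small example; count_by_extension is a hypothetical helper, and the import deliberately asks for scandir as well as walk so that the fallback triggers on Python versions without os.scandir.

```python
import os

# Importing scandir alongside walk makes the ImportError fire on Python
# versions that lack os.scandir, so the faster module version is used there.
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk


def count_by_extension(top):
    """Return a dict mapping file extension to file count under top."""
    counts = {}
    for dirpath, dirnames, filenames in walk(top):
        # Prune hidden directories in place, just as with os.walk().
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            ext = os.path.splitext(name)[1]
            counts[ext] = counts.get(ext, 0) + 1
    return counts
```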

scandir()

The full docs for scandir() and the DirEntry objects it yields are available in the Python documentation for os.scandir(). A brief summary follows.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system’s directory iteration system calls to get the names of the files in the given path, but it’s different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.

  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry’s filename, relative to the scandir path argument (corresponds to the return values of os.listdir)

  • path: the entry’s full path name (not necessarily an absolute path) – the equivalent of os.path.join(scandir_path, entry.name)

  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; doesn’t follow symbolic links if follow_symlinks is False

  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; doesn’t follow symbolic links if follow_symlinks is False

  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases

  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); doesn’t follow symbolic links (like os.lstat()) if follow_symlinks is False

  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object
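The path, is_dir() and stat() members listed above can be combined as in the following sketch, which totals the size of every file in a tree. It is modelled on the kind of example given in PEP 471; the function name tree_size is an assumption, and error handling (for instance for permission errors) is left out.

```python
# Use the built-in scandir if possible, otherwise the scandir module version
try:
    from os import scandir
except ImportError:
    from scandir import scandir


def tree_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # entry.path is already joined with the scanned directory.
            total += tree_size(entry.path)
        else:
            # The stat result is cached on the entry after the first call.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

Passing follow_symlinks=False means symlinks are counted by their own size and never followed into other directories.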

Here’s a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than os.listdir() and os.path.isdir() on both Windows and POSIX systems, especially on medium-sized or large directories.
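To check that claim on a directory of your own, a rough timing sketch along these lines may help. All three function names are hypothetical, and this is not the project's benchmark.py. Results vary with the platform, the file system, and whether the directory is already in the operating system's cache.

```python
import os
from timeit import default_timer

# Use the built-in scandir if possible, otherwise the scandir module version
try:
    from os import scandir
except ImportError:
    from scandir import scandir


def subdirs_listdir(path):
    """Baseline: os.path.isdir() makes a separate stat call per entry."""
    return [name for name in os.listdir(path)
            if not name.startswith('.')
            and os.path.isdir(os.path.join(path, name))]


def subdirs_scandir(path):
    """Same result, using the type information scandir() already has."""
    return [entry.name for entry in scandir(path)
            if not entry.name.startswith('.') and entry.is_dir()]


def best_time(func, path, repeat=5):
    """Return the fastest of `repeat` timed calls to func(path), in seconds."""
    best = None
    for _ in range(repeat):
        start = default_timer()
        func(path)
        elapsed = default_timer() - start
        if best is None or elapsed < best:
            best = elapsed
    return best
```

Comparing best_time(subdirs_listdir, path) with best_time(subdirs_scandir, path) on a large directory gives a first impression. Small directories may show little difference.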

Further reading

  • The Python docs for scandir

  • PEP 471, the (now-accepted) Python Enhancement Proposal that proposed adding scandir to the standard library – a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library here, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir
