scandir, a better directory iterator and faster os.walk()

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have also been included. If you’re using Python 3.5 or later (3.5 was released September 13, 2015), you get the benefit immediately. Otherwise, install this module from PyPI with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which proposed including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.6+ and Python 3.2+ (and it has been tested on those versions).

Background

Python’s built-in os.walk() is significantly slower than it needs to be, because – in addition to calling listdir() on each directory – it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we’re not talking about micro-optimizations. See more benchmarks in the “Benchmarks” section below.
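The saving described above can be sketched with a stripped-down tree walk built on scandir(). This is only an illustration, not the module's actual walk() implementation: the name simple_walk is hypothetical, error handling and the topdown/onerror/followlinks options are omitted, and symlinks to directories are simply listed as files.

```python
import os

try:
    from os import scandir
except ImportError:
    from scandir import scandir


def simple_walk(top):
    """Yield (dirpath, dirnames, filenames) like a stripped-down os.walk()."""
    dirnames = []
    filenames = []
    for entry in scandir(top):
        # is_dir() normally uses the file type returned by the directory
        # listing itself, so no extra stat() call is needed per entry.
        if entry.is_dir(follow_symlinks=False):
            dirnames.append(entry.name)
        else:
            filenames.append(entry.name)
    yield top, dirnames, filenames
    for name in dirnames:
        # Recurse into real subdirectories only (symlinks were put in
        # filenames above, so they are never followed).
        for result in simple_walk(os.path.join(top, name)):
            yield result
```

Compared with a listdir()-based walk, the loop body never calls os.stat() or os.path.isdir() on a path, which is where the roughly 2N-to-N reduction comes from.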

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
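As a rough illustration of the memory point, the sketch below takes only the first few names from a directory without building a list of every entry. The helper name first_names is made up for this example.

```python
from itertools import islice

try:
    from os import scandir
except ImportError:
    from scandir import scandir


def first_names(path, limit):
    """Return at most `limit` entry names from path (system-dependent order)."""
    entries = scandir(path)
    try:
        # islice() stops pulling entries after `limit`, so the full
        # directory listing is never built in memory.
        return [entry.name for entry in islice(entries, limit)]
    finally:
        # Iterators from os.scandir() have close() on Python 3.6+; where it
        # is missing, cleanup is left to garbage collection.
        close = getattr(entries, "close", None)
        if close is not None:
            close()
```

With os.listdir(), the same task would read every name in the directory before the first one could be used.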

So as well as a faster walk(), scandir adds a new scandir() function. They’re pretty easy to use, but see “The API” below for the full docs.

Benchmarks

Below are results showing how many times faster scandir.walk() is than os.walk() on various systems, found by running benchmark.py with no arguments:

System version          Python version   Times as fast
----------------------  ---------------  -------------
Windows 7 64-bit        2.7.7 64-bit     10.4
Windows 7 64-bit SSD    2.7.7 64-bit     10.3
Windows 7 64-bit NFS    2.7.6 64-bit     36.8
Windows 7 64-bit SSD    3.4.1 64-bit     9.9
Windows 7 64-bit SSD    3.5.0 64-bit     9.5
CentOS 6.2 64-bit       2.6.6 64-bit     3.9
Ubuntu 14.04 64-bit     2.7.6 64-bit     5.8
Mac OS X 10.9.3         2.7.5 64-bit     3.8

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.
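Because the signature matches, existing os.walk() code should only need its import changed. Here is a small sketch under that assumption; find_by_suffix is a hypothetical helper, not part of the module.

```python
import os

try:
    from os import walk
except ImportError:
    from scandir import walk


def find_by_suffix(top, suffix):
    """Yield paths of files under top whose names end with suffix."""
    for dirpath, dirnames, filenames in walk(top):
        # As with os.walk(), pruning dirnames in place skips those subtrees.
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

For example, list(find_by_suffix('.', '.py')) collects the Python files below the current directory while skipping hidden directories.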

scandir()

The full docs for scandir() and the DirEntry objects it yields are available in the Python documentation for os.scandir(). Below is a brief summary as well.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system’s directory iteration system calls to get the names of the files in the given path, but it’s different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.

  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry’s filename, relative to the scandir path argument (corresponds to the return values of os.listdir)

  • path: the entry’s full path name (not necessarily an absolute path) – the equivalent of os.path.join(scandir_path, entry.name)

  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; does not follow symbolic links if follow_symlinks is False

  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases; does not follow symbolic links if follow_symlinks is False

  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn’t require a system call in most cases

  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); does not follow symbolic links (like os.lstat()) if follow_symlinks is False

  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object

Here’s a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

# os.scandir() requires Python 3.5+; on older versions use scandir.scandir()
import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than an equivalent written with os.listdir() and os.path.isdir(), on both Windows and POSIX systems, especially on medium-sized or large directories.
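The cached stat() method can be used in the same way. The following is a minimal sketch, assuming you want the total size of regular files in a tree; tree_size is a hypothetical name, and symbolic links are deliberately skipped rather than followed.

```python
try:
    from os import scandir
except ImportError:
    from scandir import scandir


def tree_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # Recurse using entry.path, the joined directory and name.
            total += tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            # The stat result is cached on the entry; on Windows it
            # usually needs no extra system call.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

On POSIX systems stat() still costs one system call per file here, but is_dir() and is_file() usually do not, so the directory test itself stays cheap.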

Further reading

  • The Python docs for scandir

  • PEP 471, the (now-accepted) Python Enhancement Proposal that proposed adding scandir to the standard library – a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library here, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir
