
Super simple file reader.


vsifile


Documentation: https://vincentsarago.github.io/vsifile/

Source Code: https://github.com/vincentsarago/vsifile


Description

An experiment using the Rasterio/GDAL Python file opener VSI plugin: https://github.com/rasterio/rasterio/pull/2898/files

Future versions of rasterio will accept a custom dataset opener:

opener : callable, optional
        A custom dataset opener which can serve GDAL's virtual
        filesystem machinery via Python file-like objects. The
        underlying file-like object is obtained by calling *opener* with
        (*fp*, *mode*) or (*fp*, *mode* + "b") depending on the format
        driver's native mode. *opener* must return a Python file-like
        object that provides read, seek, tell, and close methods.

ref: https://github.com/rasterio/rasterio/blob/d966440c06f3324aca1fa761d490cc780a9f619c/rasterio/__init__.py#L185-L191
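To make the opener contract concrete, here is a minimal sketch of an opener that serves files from in-memory bytes. `make_memory_opener` is a hypothetical helper, not part of vsifile or rasterio. It only shows the required call shape: it is called with (fp, mode) and returns a file-like object with read, seek, tell and close. Decoding as UTF-8 for text mode is an assumption.

```python
import io
from typing import IO, Callable, Mapping, Union


def make_memory_opener(files: Mapping[str, bytes]) -> Callable[[str, str], Union[IO[bytes], IO[str]]]:
    """Return a hypothetical opener that serves files from in-memory bytes."""

    def opener(fp: str, mode: str = "rb") -> Union[IO[bytes], IO[str]]:
        data = files[fp]  # KeyError if the path is unknown
        if mode == "rb":
            # io.BytesIO provides read, seek, tell and close, as the contract requires.
            return io.BytesIO(data)
        if mode == "r":
            # Text-mode drivers get a str-based file object (UTF-8 is an assumption).
            return io.StringIO(data.decode("utf-8"))
        raise ValueError(f"unsupported mode: {mode!r}")

    return opener


opener = make_memory_opener({"example.txt": b"hello"})
with opener("example.txt", "rb") as f:
    f.seek(1)
    print(f.read(3), f.tell())  # b'ell' 4
```

In a rasterio version that supports the parameter, such a callable would be passed as `opener=` to `rasterio.open`.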

Install

You can install vsifile using pip:

python -m pip install -U pip
python -m pip install -U vsifile

or install from source:

git clone https://github.com/vincentsarago/vsifile.git
cd vsifile
python -m pip install -U pip
python -m pip install -e .

Usage

from vsifile import VSIFile, FileReader

src_path = "tests/fixtures/cog.tif"

with VSIFile(src_path, "rb") as f:
    assert isinstance(f, FileReader)
    assert hash(f)
    assert "FileReader" in str(f)

    assert not f.closed
    assert f.header_cache
    assert len(f.header) == 32768
    assert f.tell() == 0
    assert f.seekable

    b = f.read(100)
    assert len(b) == 100
    assert f.header[0:100] == b
    assert f.tell() == 100

    _ = f.seek(0)
    assert f.tell() == 0

    _ = f.seek(40000)
    assert f.tell() == 40000

    b = f.read(100)
    assert f.tell() == 40100

    # fetch the same block (should be from LRU cache)
    _ = f.seek(40000)
    b_cache = f.read(100)
    assert f.tell() == 40100
    assert b_cache == b

    b = f.get_byte_ranges([100, 200], [10, 20])
    assert len(b) == 2
    assert len(b[0]) == 10
    assert len(b[1]) == 20
    assert f.tell() == 220
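The `get_byte_ranges` call above takes a list of offsets and a list of lengths, and the final `tell()` of 220 is the end of the last range (200 + 20). A minimal sketch of those semantics on any seekable binary file object follows. `read_byte_ranges` is a hypothetical stand-in; vsifile's own method may fetch or cache the ranges differently.

```python
import io
from typing import BinaryIO, List, Sequence


def read_byte_ranges(f: BinaryIO, offsets: Sequence[int], lengths: Sequence[int]) -> List[bytes]:
    """Hypothetical helper: read each (offset, length) pair in order."""
    if len(offsets) != len(lengths):
        raise ValueError("offsets and lengths must have the same length")
    chunks = []
    for offset, length in zip(offsets, lengths):
        f.seek(offset)
        chunks.append(f.read(length))
    # The position is left at the end of the last range (200 + 20 = 220 below).
    return chunks


buf = io.BytesIO(bytes(range(256)))
chunks = read_byte_ranges(buf, [100, 200], [10, 20])
print([len(c) for c in chunks], buf.tell())  # [10, 20] 220
```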

With Rasterio

import rasterio
from vsifile.rasterio import opener

with rasterio.open("tests/fixtures/cog.tif", opener=opener) as src:
    ...

Caches Configuration

Header Cache

vsifile uses DiskCache to create a persistent file header cache with a TTL (time to live). By default, the cache is cleaned up when the file handle is closed; you can change this behaviour by setting the VSIFILE_CACHE_DIRECTORY="{your temp directory}" environment variable.

Settings:

  • VSIFILE_CACHE_DIRECTORY: DiskCache directory (defaults to None)
  • VSIFILE_CACHE_HEADERS_TTL: Time to Live of each object in the cache, in seconds (defaults to 300)
  • VSIFILE_CACHE_HEADERS_MAXSIZE: Maximum size of the cache, in bytes (defaults to 5120000000)
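As a sketch of how these three settings and their defaults fit together, the hypothetical `HeaderCacheSettings` class below parses them from a mapping of environment variables. It is not vsifile's settings object; it only mirrors the variable names and defaults documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HeaderCacheSettings:
    """Hypothetical container for the header cache settings listed above."""

    directory: Optional[str] = None   # VSIFILE_CACHE_DIRECTORY
    ttl: int = 300                    # VSIFILE_CACHE_HEADERS_TTL, seconds
    maxsize: int = 5_120_000_000      # VSIFILE_CACHE_HEADERS_MAXSIZE, bytes

    @property
    def persistent(self) -> bool:
        # Without an explicit directory, the cache is cleaned up on close.
        return self.directory is not None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "HeaderCacheSettings":
        env = os.environ if env is None else env
        return cls(
            directory=env.get("VSIFILE_CACHE_DIRECTORY") or None,
            ttl=int(env.get("VSIFILE_CACHE_HEADERS_TTL", "300")),
            maxsize=int(env.get("VSIFILE_CACHE_HEADERS_MAXSIZE", "5120000000")),
        )


print(HeaderCacheSettings.from_env({"VSIFILE_CACHE_HEADERS_TTL": "60"}))
# HeaderCacheSettings(directory=None, ttl=60, maxsize=5120000000)
```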

Block Cache

vsifile has a second cache layer, based on cachetools, for blocks (reads outside the header).

Settings:

  • VSIFILE_CACHE_BLOCKS_TTL: Time to Live of each object in the cache, in seconds (defaults to 300)
  • VSIFILE_CACHE_BLOCKS_MAXSIZE: Maximum size of the cache, in number of items (defaults to 512)
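cachetools provides TTL-aware caches; the stdlib-only sketch below reimplements the idea so it runs without extra dependencies. `TTLBlockCache` is hypothetical and is not the cache vsifile builds. Keying blocks by (offset, length) is an assumption; `maxsize` counts items and `ttl` is in seconds, matching the settings above.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class TTLBlockCache:
    """Hypothetical block cache: LRU eviction plus per-entry expiry."""

    def __init__(self, maxsize: int = 512, ttl: float = 300, timer: Callable[[], float] = time.monotonic):
        self.maxsize = maxsize  # maximum number of items, not bytes
        self.ttl = ttl          # seconds
        self.timer = timer
        self._data: "OrderedDict[Hashable, Tuple[float, bytes]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.timer() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: bytes) -> None:
        self._data[key] = (self.timer() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used item


# A fake clock makes expiry easy to observe.
now = [0.0]
cache = TTLBlockCache(maxsize=2, ttl=300, timer=lambda: now[0])
cache.set((40000, 100), b"block")
print(cache.get((40000, 100)))  # b'block'
now[0] = 301.0
print(cache.get((40000, 100)))  # None
```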

Note: you can disable caching by setting VSIFILE_CACHE_DISABLE=TRUE.

Other Configurations

  • VSIFILE_INGESTED_BYTES_AT_OPEN: Number of bytes (the header) read when opening a file (defaults to 32768)
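The effect of VSIFILE_INGESTED_BYTES_AT_OPEN can be sketched as a reader that reads the first N bytes once and serves any read falling entirely inside them from memory (compare `len(f.header) == 32768` and `f.header[0:100] == b` in the usage example). `HeaderedReader` is hypothetical: it omits vsifile's caches and supports only absolute seeks.

```python
import io
from typing import BinaryIO


class HeaderedReader:
    """Hypothetical reader that ingests the first `header_size` bytes at open."""

    def __init__(self, raw: BinaryIO, header_size: int = 32768):
        self._raw = raw
        self._raw.seek(0)
        self.header = self._raw.read(header_size)  # one read at open time
        self._pos = 0
        self.raw_reads = 0  # reads that had to go back to the underlying file

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int) -> int:
        # Only absolute positioning (SEEK_SET) is supported in this sketch.
        self._pos = offset
        return self._pos

    def read(self, size: int = -1) -> bytes:
        end = self._pos + size
        if size >= 0 and end <= len(self.header):
            data = self.header[self._pos:end]  # served from the in-memory header
        else:
            self._raw.seek(self._pos)
            data = self._raw.read(size)
            self.raw_reads += 1
        self._pos += len(data)
        return data


payload = bytes(range(256)) * 200  # 51,200 bytes of sample data
reader = HeaderedReader(io.BytesIO(payload))
reader.read(100)   # inside the header: no extra read
reader.seek(40000)
reader.read(100)   # past the header: goes to the file
print(len(reader.header), reader.tell(), reader.raw_reads)  # 32768 40100 1
```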

Contribution & Development

See CONTRIBUTING.md

Changes

See CHANGES.md.

License

See LICENSE



Download files


Source Distribution

vsifile-0.1.0.tar.gz (9.1 kB)

File details

Details for the file vsifile-0.1.0.tar.gz.

File metadata

  • Download URL: vsifile-0.1.0.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for vsifile-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        cc59a206d1c355e8c5d9055bfb9fcc66aa4e4f92467eb206bee398fa17c48af0
MD5           df22bd133b23340c4a15c30410c7e3b1
BLAKE2b-256   b5971ee057b72b260f65c3b687985da94ac646b852a0e5bebd6588b42eb75d96

