Fast sampling from large images

Project description

Read the docs

https://ndsampler.readthedocs.io

Gitlab (main)

https://gitlab.kitware.com/computer-vision/ndsampler

Github (mirror)

https://github.com/Kitware/ndsampler

Pypi

https://pypi-hypernode.com/project/ndsampler

The main webpage for this project is: https://gitlab.kitware.com/computer-vision/ndsampler

Fast random access to small regions in large images.

Random access is amortized by converting images into an efficient backend format (current backends include cloud-optimized GeoTIFFs (COG) and numpy array files (npy)). If images are already in COG format, no conversion is needed.

The ndsampler module was built with detection, segmentation, and classification tasks in mind, but it is not limited to these use cases.

The basic idea is to ensure your data is in MS-COCO format; the CocoSampler class will then let you sample positive and negative regions.

For classification tasks, the MS-COCO data can be as simple as giving every image a single annotation that covers the entire image.
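One minimal way to express this is a plain Python dict in the MS-COCO convention, where each image carries one annotation whose bbox spans the whole image (a sketch; the file names and categories here are placeholders, not real data):

```python
# Sketch: a COCO-style classification dataset where each image gets one
# full-image annotation carrying its category label.
images = [
    {'id': 1, 'file_name': 'cat1.png', 'width': 224, 'height': 224},
    {'id': 2, 'file_name': 'dog1.png', 'width': 224, 'height': 224},
]
labels = {1: 1, 2: 2}  # image id -> category id

dataset = {
    'images': images,
    'categories': [{'id': 1, 'name': 'cat'}, {'id': 2, 'name': 'dog'}],
    'annotations': [
        {
            'id': idx,
            'image_id': img['id'],
            'category_id': labels[img['id']],
            # bbox in [x, y, width, height] format covering the whole image
            'bbox': [0, 0, img['width'], img['height']],
        }
        for idx, img in enumerate(images, start=1)
    ],
}
```

A dict in this shape can be passed to kwcoco.CocoDataset the same way the JSON form is.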

Installation

The ndsampler package can be installed via pip:

pip install ndsampler

Note that ndsampler depends on kwimage, where there is a known compatibility issue between opencv-python and opencv-python-headless. Please ensure that one or the other (but not both) is also installed:

pip install opencv-python-headless

# OR

pip install opencv-python

Lastly, to fully leverage ndsampler’s features, GDAL must be installed (although much of ndsampler works without it). Kitware maintains a pypi index that hosts GDAL wheels for Linux systems; other systems will need another way of installing GDAL (conda is a safe choice).

pip install --find-links https://girder.github.io/large_image_wheels GDAL

Features

  • CocoDataset for managing and manipulating annotated image datasets

  • Amortized O(1) sampling of N-dimensional space-time data (with respect to a constant window size), e.g. images and video.

  • Hierarchical or mutually exclusive category management.

  • Random negative window sampling.

  • Coverage-based positive sampling.

  • Dynamic toydata generator.

Also installs the kwcoco package and CLI tool.
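The amortized O(1) claim can be illustrated with the idea behind the "npy" backend: after a one-time conversion to a .npy file, a small window can be read through a memory map so that read cost tracks the window size rather than the full image size. This is a sketch using plain numpy, not ndsampler's internal API:

```python
import os
import tempfile

import numpy as np

# One-time (amortized) cost: write the full array to disk as .npy.
fpath = os.path.join(tempfile.mkdtemp(), 'big_image.npy')
big = np.arange(1024 * 1024 * 3, dtype=np.uint8).reshape(1024, 1024, 3)
np.save(fpath, big)

# Later reads memory-map the file: only the pages backing the requested
# window are touched, so a small crop is cheap regardless of image size.
mm = np.load(fpath, mmap_mode='r')
window = np.asarray(mm[200:300, 200:300, :])  # 100x100 crop
print(window.shape)  # (100, 100, 3)
```

COG achieves a similar effect for compressed data by storing the image in independently decodable tiles.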

Usage

The main pattern of usage is:

  1. Use kwcoco to load a json-based COCO dataset (or create a kwcoco.CocoDataset programmatically).

  2. Pass that dataset to an ndsampler.CocoSampler object, which wraps the json structure holding your images and annotations and lets you sample patches from those images efficiently.

  3. You can either manually specify an image + region, or specify an annotation id, in which case the region corresponding to that annotation is loaded.

Example

This example shows how you can efficiently load subregions from images.

>>> # Imagine you have some images
>>> import kwimage
>>> image_paths = [
>>>     kwimage.grab_test_image_fpath('astro'),
>>>     kwimage.grab_test_image_fpath('carl'),
>>>     kwimage.grab_test_image_fpath('airport'),
>>> ]  # xdoc: +IGNORE_WANT
['~/.cache/kwimage/demodata/KXhKM72.png',
 '~/.cache/kwimage/demodata/flTHWFD.png',
 '~/.cache/kwimage/demodata/Airport.jpg']
>>> # And you want to randomly load subregions of them in O(1) time
>>> import ndsampler
>>> # First make a COCO dataset that refers to your images (and possibly annotations)
>>> dataset = {
>>>     'images': [{'id': i, 'file_name': fpath} for i, fpath in enumerate(image_paths)],
>>>     'annotations': [],
>>>     'categories': [],
>>> }
>>> coco_dset = ndsampler.CocoDataset(dataset)
>>> print(coco_dset)
<CocoDataset(tag=None, n_anns=0, n_imgs=3, n_cats=0)>
>>> # Now pass the dataset to a sampler and tell it where it can store temporary files
>>> import ubelt as ub
>>> workdir = ub.ensure_app_cache_dir('ndsampler/demo')
>>> sampler = ndsampler.CocoSampler(coco_dset, workdir=workdir)
>>> # Now you can load arbitrary samples by specifying a target dictionary
>>> # with an image id (gid), center location (cx, cy), and width, height.
>>> target = {'gid': 0, 'cx': 200, 'cy': 200, 'width': 100, 'height': 100}
>>> sample = sampler.load_sample(target)
>>> # The sample contains the image data, any visible annotations, a reference
>>> # to the original target, and params of the transform used to sample this
>>> # patch
>>> print(sorted(sample.keys()))
['annots', 'im', 'params', 'tr']
>>> im = sample['im']
>>> print(im.shape)
(100, 100, 3)
>>> # The load sample function is at the core of what ndsampler does
>>> # There are other helper functions like load_positive / load_negative
>>> # which deal with annotations. See those for more details.
>>> # For random negative sampling see coco_regions.

A Note On COGs

COGs (cloud optimized geotiffs) are the backbone of efficient sampling in the ndsampler library.

To perform deep learning efficiently you need to be able to randomly sample cropped regions from images effectively, so when ndsampler.Sampler (more accurately, the FramesSampler belonging to the base Sampler object) is in “cog” mode, it caches all images larger than 512x512 in COG format.

I’ve noticed significant speedups even for “small” 1024x1024 images. I haven’t made effective use of the overviews feature yet, but in the future I plan to, as I want to allow ndsampler to sample in scale as well as in space.

It’s possible to obtain this speedup with the “npy” backend, which supports true random sampling, but it is an uncompressed format and can require a large amount of disk space. With the “None” backend, loading a small windowed region requires loading the entire image first (which can be acceptable for some applications).
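The disk cost of the uncompressed "npy" backend is easy to estimate: bytes = height x width x channels x bytes-per-sample. A back-of-envelope sketch (the helper name is ours, not part of ndsampler):

```python
# Back-of-envelope disk cost of an uncompressed .npy image cache.
def npy_size_mb(height, width, channels=3, itemsize=1):
    """Size in megabytes of an uncompressed array (uint8 RGB by default)."""
    return height * width * channels * itemsize / 1e6

print(npy_size_mb(1024, 1024))    # ~3.1 MB for a small RGB uint8 image
print(npy_size_mb(20000, 20000))  # 1200.0 MB for a large aerial image
```

Compressed formats like COG typically shrink this considerably, at the cost of decode time per tile.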

Using COGs requires that GDAL is installed. Installing GDAL is a pain though.

https://gist.github.com/cspanring/5680334

Using conda is relatively simple:

conda install gdal

# Test that this works
python -c "from osgeo import gdal; print(gdal)"

It is also possible to use system packages:

# References:
# https://gis.stackexchange.com/questions/28966/python-gdal-package-missing-header-file-when-installing-via-pip
# https://gist.github.com/cspanring/5680334


# Install GDAL system libs
sudo apt install libgdal-dev

GDAL_VERSION=`gdal-config --version`
echo "GDAL_VERSION = $GDAL_VERSION"
pip install --global-option=build_ext --global-option="-I/usr/include/gdal" GDAL==$GDAL_VERSION


# Test that this works
python -c "from osgeo import gdal; print(gdal)"

Kitware also has a pypi index that hosts GDAL wheels for linux systems:

pip install --find-links https://girder.github.io/large_image_wheels GDAL

TODO

  • [ ] Currently only supports image-based detection tasks, but not much work is needed to extend to video. The code was originally based on sampling code for video, so n-dimensionality is built in to most places in the code. However, there are currently no test cases demonstrating that this library works with video. We should (a) port the video toydata code from irharn to test n-dimensional cases and (b) fix the code to work for both still images and video where things break.

  • [ ] Currently we are good at loading many small objects in 2D images. However, we are bad at loading images with one single large object that needs to be downsampled (e.g. loading an entire 1024x1024 image and downsampling it to 224x224). We should find a way to mitigate this using pyramid overviews in the backend COG files.
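If overviews were used, picking which pyramid level to decode could look roughly like this. This is a sketch under the common convention that overview level k stores the image downsampled by 2**k; the function name and max_level cap are ours, not ndsampler's API:

```python
import math

def pick_overview_level(full_size, target_size, max_level=8):
    # Choose the deepest power-of-two overview that is still at least as
    # large as the requested output, so we only ever downsample slightly.
    level = int(math.floor(math.log2(full_size / target_size)))
    return max(0, min(level, max_level))

# Sampling a 1024-wide region at 224 wide: log2(1024/224) ~= 2.19 -> level 2,
# i.e. decode the 256-wide overview instead of the full-resolution tiles.
print(pick_overview_level(1024, 224))  # 2
```

Decoding from the right overview level would make the "one large object, small output" case cost roughly the same as any other windowed read.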
