
Fast sampling from large images


Fast random access to small regions in large images.

The cost of random access is amortized by converting images into an efficient backend format. Current backends are cloud-optimized GeoTIFFs (COG) and NumPy array files (npy). If images are already in COG format, no conversion is needed.
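The idea behind the npy backend can be illustrated with plain NumPy. The following is a minimal sketch, not ndsampler's actual implementation: the helper names `cache_as_npy` and `load_region` are hypothetical, and the random array stands in for a decoded image. It pays the conversion cost once, then reads small windows through a memory map without loading the whole file.

```python
import os
import tempfile

import numpy as np


def cache_as_npy(image, cache_dir):
    """One-time conversion: write the decoded image to an .npy file."""
    fpath = os.path.join(cache_dir, 'image.npy')
    np.save(fpath, image)
    return fpath


def load_region(fpath, y, x, height, width):
    """Read a small window through a memory map of the cached file."""
    mmapped = np.load(fpath, mmap_mode='r')
    # Copy the window so the result does not depend on the memory map.
    return np.array(mmapped[y:y + height, x:x + width])


# Stand-in for a large decoded image (hypothetical data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

cache_dir = tempfile.mkdtemp()
fpath = cache_as_npy(image, cache_dir)
patch = load_region(fpath, y=100, x=200, height=64, width=64)
print(patch.shape)
```

After the one-time write, each read touches only the bytes of the requested window, which is what makes repeated sampling cheap.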

The ndsampler module was built with detection, segmentation, and classification tasks in mind, but it is not limited to these use cases.

The basic idea is to ensure your data is in MS-COCO format; the CocoSampler class then lets you sample positive and negative regions.

For classification tasks the MS-COCO data could just be that every image has an annotation that takes up the entire image.
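As a rough sketch of that classification layout, the function below builds an MS-COCO style dictionary in which each image gets one annotation whose box spans the full image. The function name `classification_to_coco` and the input tuple layout are assumptions made for illustration; only the `images` / `annotations` / `categories` keys and the `[x, y, width, height]` box convention come from the COCO format itself.

```python
def classification_to_coco(items):
    """Build a COCO-style dict from (file_name, width, height, label) tuples.

    The tuple layout is a hypothetical input format chosen for this sketch.
    """
    names = sorted({label for _, _, _, label in items})
    name_to_id = {name: cid for cid, name in enumerate(names, start=1)}
    dataset = {
        'images': [],
        'annotations': [],
        'categories': [{'id': cid, 'name': name}
                       for name, cid in name_to_id.items()],
    }
    for gid, (file_name, width, height, label) in enumerate(items, start=1):
        dataset['images'].append({
            'id': gid, 'file_name': file_name,
            'width': width, 'height': height,
        })
        # One annotation per image; its box covers the entire image.
        dataset['annotations'].append({
            'id': gid, 'image_id': gid,
            'category_id': name_to_id[label],
            'bbox': [0, 0, width, height],
        })
    return dataset


dataset = classification_to_coco([
    ('cat_001.png', 640, 480, 'cat'),
    ('dog_001.png', 800, 600, 'dog'),
])
print(dataset['annotations'][1]['bbox'])
```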

Features

  • CocoDataset for managing and manipulating annotated image datasets

  • Amortized O(1) sampling of N-dimensional space-time data such as images and video (with respect to a constant window size).

  • Hierarchical or mutually exclusive category management.

  • Random negative window sampling.

  • Coverage-based positive sampling.

  • Dynamic toydata generator.
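To give a feel for what random negative window sampling involves, here is a minimal rejection-sampling sketch in pure Python. It is an illustration only and is not how ndsampler implements the feature; the names `boxes_overlap` and `sample_negative_window` are hypothetical, and boxes are assumed to be `(x, y, width, height)` tuples.

```python
import random


def boxes_overlap(a, b):
    """Return True if two (x, y, w, h) boxes share positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def sample_negative_window(img_w, img_h, win_w, win_h, annot_boxes,
                           rng, max_tries=1000):
    """Draw random windows until one overlaps no annotation box.

    Returns None if no such window is found within max_tries draws.
    """
    for _ in range(max_tries):
        x = rng.randint(0, img_w - win_w)
        y = rng.randint(0, img_h - win_h)
        window = (x, y, win_w, win_h)
        if not any(boxes_overlap(window, box) for box in annot_boxes):
            return window
    return None


rng = random.Random(0)
annot_boxes = [(100, 100, 200, 200)]
window = sample_negative_window(512, 512, 64, 64, annot_boxes, rng)
print(window)
```

Rejection sampling is simple but slows down when annotations cover most of the image, which is one reason a real implementation might prefer a spatial index.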

Installing ndsampler also installs the kwcoco package and its CLI tool.

Example

This example shows how you can efficiently load subregions from images.

>>> # Imagine you have some images
>>> import kwimage
>>> image_paths = [
>>>     kwimage.grab_test_image_fpath('astro'),
>>>     kwimage.grab_test_image_fpath('carl'),
>>>     kwimage.grab_test_image_fpath('airport'),
>>> ]  # xdoc: +IGNORE_WANT
['~/.cache/kwimage/demodata/KXhKM72.png',
 '~/.cache/kwimage/demodata/flTHWFD.png',
 '~/.cache/kwimage/demodata/Airport.jpg']
>>> # And you want to randomly load subregions of them in O(1) time
>>> import ndsampler
>>> # First make a COCO dataset that refers to your images (and possibly annotations)
>>> dataset = {
>>>     'images': [{'id': i, 'file_name': fpath} for i, fpath in enumerate(image_paths)],
>>>     'annotations': [],
>>>     'categories': [],
>>> }
>>> coco_dset = ndsampler.CocoDataset(dataset)
>>> print(coco_dset)
<CocoDataset(tag=None, n_anns=0, n_imgs=3, n_cats=0)>
>>> # Now pass the dataset to a sampler and tell it where it can store temporary files
>>> import ubelt as ub
>>> workdir = ub.ensure_app_cache_dir('ndsampler/demo')
>>> sampler = ndsampler.CocoSampler(coco_dset, workdir=workdir)
>>> # Now you can load arbitrary samples by specifying a target dictionary
>>> # with an image id (gid), a center location (cx, cy), and a width and height.
>>> target = {'gid': 0, 'cx': 200, 'cy': 200, 'width': 100, 'height': 100}
>>> sample = sampler.load_sample(target)
>>> # The sample contains the image data, any visible annotations, a reference
>>> # to the original target, and params of the transform used to sample this
>>> # patch
>>> print(sorted(sample.keys()))
['annots', 'im', 'params', 'tr']
>>> im = sample['im']
>>> print(im.shape)
(100, 100, 3)
>>> # The load sample function is at the core of what ndsampler does
>>> # There are other helper functions like load_positive / load_negative
>>> # which deal with annotations. See those for more details.
>>> # For random negative sampling see coco_regions.
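To make the target dictionary in the example above more concrete, the sketch below converts a center and size into the row and column slices of the requested window. The helper `target_to_slices` is hypothetical, and the rounding convention is an assumption; ndsampler's own handling of odd sizes, sub-pixel centers, and windows that extend past the image border may differ.

```python
def target_to_slices(target):
    """Convert a center/size target into (row_slice, col_slice).

    Assumes the window is centered on (cx, cy) and rounds its top-left
    corner to the nearest integer pixel.
    """
    cx, cy = target['cx'], target['cy']
    width, height = target['width'], target['height']
    x1 = int(round(cx - width / 2))
    y1 = int(round(cy - height / 2))
    return slice(y1, y1 + height), slice(x1, x1 + width)


target = {'gid': 0, 'cx': 200, 'cy': 200, 'width': 100, 'height': 100}
y_sl, x_sl = target_to_slices(target)
print(y_sl, x_sl)
```

For this target the window spans rows and columns 150 through 249, which matches the (100, 100, 3) patch shape printed in the example.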

TODO

  • [ ] Currently only image-based detection tasks are supported, but not much work is needed to extend this to video. The code was originally based on sampling code for video, so n-dimensional support is built into most places in the code. However, there are currently no test cases demonstrating that this library works with video. We should (a) port the video toydata code from irharn to test n-dimensional cases and (b) fix the code wherever it breaks so that it works for both still images and video.

  • [ ] Currently we are good at loading many small objects in 2d images. However, we are bad at loading images with one single large object that needs to be downsampled (e.g. loading an entire 1024x1024 image and downsampling it to 224x224). We should find a way to mitigate this using pyramid overviews in the backend COG files.
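The arithmetic behind the pyramid-overview idea in the second item can be sketched as follows. This is only an illustration of how an overview level might be chosen, assuming overviews that halve the resolution at each level (level 0 is full resolution); the function `choose_overview_level` is hypothetical and nothing like it is claimed to exist in ndsampler.

```python
import math


def choose_overview_level(src_size, dst_size, num_overviews):
    """Pick the coarsest power-of-two overview still at least dst_size.

    src_size and dst_size are (width, height) tuples. Level k is assumed
    to downsample the source by a factor of 2 ** k.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    scale = min(src_w / dst_w, src_h / dst_h)
    if scale <= 1:
        return 0  # no downsampling needed; read full resolution
    level = int(math.floor(math.log2(scale)))
    return min(level, num_overviews)


level = choose_overview_level((1024, 1024), (224, 224), num_overviews=4)
print(level, 1024 // 2 ** level)
```

For the 1024x1024 to 224x224 case the scale factor is about 4.57, so level 2 (a 256x256 overview) would be read and only a small final resize to 224x224 would remain.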

NOTES

There is a GDAL backend for FramesSampler.

Installing GDAL can be a pain, though; see:

https://gist.github.com/cspanring/5680334

Using conda is relatively simple:

conda install gdal

# Test that this works
python -c "from osgeo import gdal; print(gdal)"

It is also possible to use system packages:

# References:
# https://gis.stackexchange.com/questions/28966/python-gdal-package-missing-header-file-when-installing-via-pip
# https://gist.github.com/cspanring/5680334


# Install GDAL system libs
sudo apt install libgdal-dev

GDAL_VERSION=$(gdal-config --version)
echo "GDAL_VERSION = $GDAL_VERSION"
pip install --global-option=build_ext --global-option="-I/usr/include/gdal" "GDAL==$GDAL_VERSION"


# Test that this works
python -c "from osgeo import gdal; print(gdal)"
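The same check can be done from Python without raising an error when GDAL is missing. The sketch below uses only the standard library; the helper name `gdal_available` is hypothetical, and the resulting string is purely illustrative (it mirrors the backend names mentioned earlier and is not a documented ndsampler argument).

```python
import importlib.util


def gdal_available():
    """Return True if the osgeo package (GDAL bindings) can be located."""
    return importlib.util.find_spec('osgeo') is not None


# Illustrative only: prefer the COG backend when GDAL can be found.
backend = 'cog' if gdal_available() else 'npy'
print(backend)
```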
