
Windowed multiprocessing wrapper for rasterio


Parallel processing wrapper for rasterio


Install

From PyPI:

pip install rio-mucho --pre

From GitHub (usually to install a branch or development version):

pip install git+ssh://git@github.com/mapbox/rio-mucho.git@<branch>

Development:

git clone git@github.com:mapbox/rio-mucho.git
cd rio-mucho
pip install -e .

Usage

with riomucho.RioMucho([{inputs}], {output}, {run function},
    windows={windows},
    global_args={global arguments},
    kwargs={kwargs to write}) as rios:

    rios.run({processes})

Arguments

inputs

A list of file paths to open and read.

output

The path of the file to write to.

run_function

A function applied to each window chunk. It should accept the following arguments:

  1. A data input, which can be one of:

     • A list of numpy arrays of shape ({band count}, {window rows}, {window cols}), one for each file in the input list: mode="simple_read" [default]

     • A single numpy array of shape ({n input files x n band count}, {window rows}, {window cols}): mode="array_read"

     • A list of open sources for reading: mode="manual_read"

  2. A rasterio window tuple

  3. A rasterio window index (ij)

  4. The global arguments object (global_args), for passing in any other values the function needs

It should return an output array of shape ({count}, {window rows}, {window cols}), with the data type expected by the output file.
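Because the returned array must match both the window size and the output data type, a quick check on a run function's return value can catch mistakes before a long parallel run. The sketch below assumes the older rasterio window tuple layout ((row_start, row_stop), (col_start, col_stop)); `check_output` is a hypothetical helper, not part of rio-mucho.

```python
import numpy as np


def check_output(out, window, count, dtype):
    """Hypothetical helper: verify a run function's return value.

    Assumes `window` is a ((row_start, row_stop), (col_start, col_stop)) tuple.
    """
    (row_start, row_stop), (col_start, col_stop) = window
    expected_shape = (count, row_stop - row_start, col_stop - col_start)
    if out.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {out.shape}")
    if out.dtype != np.dtype(dtype):
        raise TypeError(f"expected dtype {np.dtype(dtype)}, got {out.dtype}")
    return True


window = ((0, 256), (256, 512))
good = np.zeros((2, 256, 256), dtype="uint8")
check_output(good, window, count=2, dtype="uint8")  # passes

bad = good.astype("float64")  # e.g. the result of an unguarded division
try:
    check_output(bad, window, count=2, dtype="uint8")
except TypeError as err:
    print(err)  # expected dtype uint8, got float64
```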

def basic_run({data}, {window}, {ij}, {global args}):
    ## do something
    return {out}
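As a concrete instance of this signature, here is a run function that averages the first band of every input (a made-up operation chosen only to illustrate the shape of the inputs and output). It assumes the default simple_read mode, so `data` is a list of (bands, rows, cols) arrays; it ignores `window` and `ij`, and the `"dtype"` key in the global arguments is a hypothetical convention rather than something rio-mucho defines.

```python
import numpy as np


def mean_first_bands(data, window, ij, g_args):
    # data: list of (bands, rows, cols) arrays, one per input file (simple_read mode)
    first_bands = np.stack([d[0] for d in data])        # (n_files, rows, cols)
    mean = first_bands.mean(axis=0, dtype="float64")     # (rows, cols)
    # add a leading band axis and cast to the output dtype before returning
    return mean[np.newaxis].astype(g_args["dtype"])      # (1, rows, cols)


# try it on two fake 3-band windows of 4 x 5 pixels
fake = [np.full((3, 4, 5), 10, dtype="uint8"), np.full((3, 4, 5), 20, dtype="uint8")]
out = mean_first_bands(fake, ((0, 4), (0, 5)), (0, 0), {"dtype": "uint8"})
print(out.shape, out.dtype, out[0, 0, 0])  # (1, 4, 5) uint8 15
```

Calling the function directly on synthetic arrays like this is a cheap way to confirm the output shape and dtype before handing it to the parallel wrapper.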

Keyword arguments

windows={windows}

A list of rasterio (window, ij) tuples to operate on. [Default = the block windows of the first input, srcs[0].block_windows()]
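Each entry pairs a window with its (i, j) block index, in the order (window, ij); note that rasterio's `block_windows()` yields the opposite order, (ij, window). The sketch below builds such a list for a hypothetical raster without opening a file, assuming fixed-size blocks and the ((row_start, row_stop), (col_start, col_stop)) window tuples rasterio used at the time; `tile_windows` is a made-up helper, not a rio-mucho or rasterio function.

```python
def tile_windows(height, width, block_rows, block_cols):
    """Hypothetical helper: (window, ij) pairs covering a height x width raster.

    Edge blocks are truncated so no window extends past the raster.
    """
    windows = []
    for i, row_start in enumerate(range(0, height, block_rows)):
        for j, col_start in enumerate(range(0, width, block_cols)):
            window = (
                (row_start, min(row_start + block_rows, height)),
                (col_start, min(col_start + block_cols, width)),
            )
            windows.append((window, (i, j)))
    return windows


wins = tile_windows(height=600, width=512, block_rows=256, block_cols=256)
print(len(wins))  # 6: three block rows (the last only 88 rows tall) x two block columns
print(wins[-1])   # (((512, 600), (256, 512)), (2, 1))
```

A hand-built list like this (or a subset of the default one) is presumably what you would pass as `windows=` to process only part of an image.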

global_args={global arguments}

Because the run function is executed in parallel worker processes, pass any other objects or values it needs through this dictionary. [Default = {}]

global_args = {
    'divide_value': 2
}
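One common way to give a per-window function access to shared values is to bind them once with `functools.partial`, so the caller only supplies the per-window arguments. The sketch below illustrates that general pattern; it is not necessarily how rio-mucho passes `global_args` internally, `halve` is a hypothetical run function, and a plain loop stands in for a process pool.

```python
from functools import partial

import numpy as np


def halve(data, window, ij, g_args):
    # read the divisor from the global arguments rather than a module-level variable
    return np.stack([d[0] // g_args["divide_value"] for d in data])


global_args = {"divide_value": 2}

# bind the shared arguments once; each job then only needs per-window values
run = partial(halve, g_args=global_args)

jobs = [
    ([np.full((1, 2, 2), 8, dtype="uint8")], ((0, 2), (0, 2)), (0, 0)),
    ([np.full((1, 2, 2), 6, dtype="uint8")], ((0, 2), (2, 4)), (0, 1)),
]
results = [run(data, window, ij) for data, window, ij in jobs]
print([int(r[0, 0, 0]) for r in results])  # [4, 3]
```

With a real pool, `multiprocessing.Pool.starmap(run, jobs)` could take the same callable and argument tuples, since a `partial` over a module-level function can be pickled and sent to worker processes.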

kwargs={keyword args}

The keyword arguments used when opening the output file for writing. [Default = srcs[0].kwargs]
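A typical pattern, used in the example below, is to start from the first input's metadata and override only what the output changes. The sketch uses a plain dictionary with made-up values in place of a real dataset's metadata; copying before updating keeps the input's dictionary untouched, and setting `dtype` explicitly is worth considering whenever the run function changes the data type.

```python
# made-up metadata resembling what a rasterio dataset's `meta` might contain
src_meta = {
    "driver": "GTiff",
    "dtype": "uint8",
    "count": 3,
    "width": 512,
    "height": 512,
    "nodata": None,
}

# copy, then override only what the output changes (here: 2 bands, float32 values)
out_kwargs = dict(src_meta)
out_kwargs.update(count=2, dtype="float32")

print(src_meta["count"], out_kwargs["count"])  # 3 2
```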

Example

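import numpy as np
import rasterio
import riomucho

def basic_run(data, window, ij, g_args):
    # data: one (bands, rows, cols) array per input file (default "simple_read" mode)
    # divide the first band of each input, keeping the input data type
    out = np.array(
        [(d[0] / g_args['divide']).astype(d.dtype) for d in data]
        )
    return out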

# get windows from the first input
with rasterio.open('input1.tif') as src:
    ## grabbing the windows as an example. Default behavior is identical.
    windows = [[window, ij] for ij, window in src.block_windows()]
    kwargs = src.meta
    # since we are only writing to 2 bands
    kwargs.update(count=2)

global_args = {
    'divide': 2
}

processes = 4

# run it
with riomucho.RioMucho(['input1.tif','input2.tif'], 'output.tif', basic_run,
    windows=windows,
    global_args=global_args,
    kwargs=kwargs) as rm:

    rm.run(processes)
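Conceptually, a windowed run reads each window from every input, calls the run function, and writes the result into the same window of the output, with the windows presumably spread across `processes` workers. The single-process sketch below mimics that loop using in-memory NumPy arrays in place of files; it is an illustration under those assumptions, not rio-mucho's code, and `run_serially` is a hypothetical name.

```python
import numpy as np


def basic_run(data, window, ij, g_args):
    # divide the first band of each input, keeping the input data type
    return np.array([(d[0] / g_args["divide"]).astype(d.dtype) for d in data])


def run_serially(inputs, run_function, windows, global_args, count, dtype):
    """Hypothetical single-process stand-in for a windowed run.

    `inputs` are (bands, rows, cols) arrays playing the role of input files;
    windows are ((row_start, row_stop), (col_start, col_stop)) tuples.
    """
    _, height, width = inputs[0].shape
    output = np.zeros((count, height, width), dtype=dtype)
    for window, ij in windows:
        (r0, r1), (c0, c1) = window
        data = [src[:, r0:r1, c0:c1] for src in inputs]  # "simple_read" style input
        output[:, r0:r1, c0:c1] = run_function(data, window, ij, global_args)
    return output


inputs = [np.full((1, 4, 4), 10, dtype="uint8"), np.full((1, 4, 4), 30, dtype="uint8")]
windows = [(((r, r + 2), (c, c + 2)), (r // 2, c // 2)) for r in (0, 2) for c in (0, 2)]
result = run_serially(inputs, basic_run, windows, {"divide": 2}, count=2, dtype="uint8")
print(result.shape, result[0, 0, 0], result[1, 3, 3])  # (2, 4, 4) 5 15
```

Running a run function this way on a few small windows can make it easier to debug than inside worker processes.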

Utility functions

riomucho.utils.array_stack([array, array, array,…])

Given a list of ({depth}, {rows}, {cols}) numpy arrays, stack them into a single ({total depth of all arrays}, {rows}, {cols}) array. This lets the same code handle RGB input whether it comes from one three-band file or from a separate file per band.

One RGB file

files = ['rgb.tif']
open_files = [rasterio.open(f) for f in files]
rgb = riomucho.utils.array_stack([src.read() for src in open_files])

Separate RGB files

files = ['r.tif', 'g.tif', 'b.tif']
open_files = [rasterio.open(f) for f in files]
rgb = riomucho.utils.array_stack([src.read() for src in open_files])
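In both layouts the stacked result has shape (3, {rows}, {cols}). The sketch below reproduces that with `np.concatenate` along the band axis, using synthetic arrays in place of `src.read()` results; it assumes `array_stack` stacks along the first axis as described above and is not rio-mucho's actual implementation.

```python
import numpy as np

rows, cols = 4, 6

# one 3-band RGB file: a single (3, rows, cols) array
single_file = [np.zeros((3, rows, cols), dtype="uint8")]

# three single-band files: three (1, rows, cols) arrays
separate_files = [np.zeros((1, rows, cols), dtype="uint8") for _ in range(3)]

# stacking along axis 0 gives the same shape in both cases
rgb_a = np.concatenate(single_file, axis=0)
rgb_b = np.concatenate(separate_files, axis=0)
print(rgb_a.shape, rgb_b.shape)  # (3, 4, 6) (3, 4, 6)
```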



Download files


Source Distribution

rio-mucho-0.0.1.tar.gz (4.3 kB)

File details

Details for the file rio-mucho-0.0.1.tar.gz.

File metadata

  • Download URL: rio-mucho-0.0.1.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for rio-mucho-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        9efca675f1b1cb2b898dae836eede3fca106f8802b6681ceae9b0b6bcd322774
MD5           d48b340330cb4cc6e203ccb8c5262701
BLAKE2b-256   1313951b200616c832ec884b9e64e629913e3c7c114a4c9e8cdfe100a9c87485

