Download data from URLs quickly, with integrity
getm: Fast reads with integrity for data URLs
getm provides fast binary reads for HTTP URLs using multiprocessing and shared memory.
Data is downloaded in background processes and made available as references to shared memory. There are no buffer copies, but the caller must release these memory references, which makes working with getm a bit different from typical Python IO streams. It is still easy, and fast.
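To make the release requirement concrete, here is a minimal sketch using only the standard library's multiprocessing.shared_memory. It is not getm's implementation: fill_block is a hypothetical stand-in for a background download, and everything runs in one process for brevity. What it illustrates is that a memoryview into shared memory reads the bytes in place, and that the block cannot be closed until every such view has been released.

```python
from multiprocessing import shared_memory


def fill_block(shm: shared_memory.SharedMemory, payload: bytes) -> memoryview:
    """Hypothetical stand-in for a background download: write into shared
    memory, then hand back a zero-copy view of exactly the bytes written."""
    shm.buf[: len(payload)] = payload
    return shm.buf[: len(payload)]


shm = shared_memory.SharedMemory(create=True, size=64)
try:
    view = fill_block(shm, b"hello, getm")
    try:
        total = sum(view)       # reads straight from shared memory, no copy
        head = bytes(view[:5])  # copy out only what must outlive the view
    finally:
        view.release()          # the caller's job, as with getm's references
finally:
    shm.close()   # raises BufferError while any view into the block is unreleased
    shm.unlink()  # free the block once no process needs it
```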
Python API methods accept a parameter, concurrency, which controls getm's mode of operation:
- concurrency == 1 (default): Data is downloaded in a single background process, using a single HTTP request that is kept alive for the duration of the download.
- concurrency > 1: Up to concurrency HTTP range requests are made concurrently, each in a separate background process.
- concurrency == None: Data is read on the main process. In this mode, getm is a wrapper around the requests library.
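The concurrency > 1 mode can be pictured with the sketch below. It models only the idea of splitting a download into HTTP byte ranges and keeping at most concurrency of them in flight; it is not getm's scheduler. For brevity it uses a thread pool rather than background processes, and fetch is a hypothetical callable standing in for an HTTP GET that sends the given headers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def byte_ranges(total_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into (start, end) pairs; HTTP byte ranges include both ends."""
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build a request header such as {'Range': 'bytes=0-1023'}."""
    return {"Range": f"bytes={start}-{end}"}


def parallel_read(fetch: Callable[[Dict[str, str]], bytes], total_size: int,
                  part_size: int, concurrency: int) -> bytes:
    """Issue one ranged request per part, at most `concurrency` at a time,
    and reassemble the parts in their original order."""
    headers = [range_header(s, e) for s, e in byte_ranges(total_size, part_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order, whatever order they finish in.
        return b"".join(pool.map(fetch, headers))


# Usage with an in-memory fake server instead of a real URL:
DATA = bytes(range(256)) * 40  # 10,240 bytes standing in for a remote object


def fake_fetch(headers: Dict[str, str]) -> bytes:
    start, end = map(int, headers["Range"][len("bytes="):].split("-"))
    return DATA[start : end + 1]


assert parallel_read(fake_fetch, len(DATA), 4096, concurrency=3) == DATA
```

Ordering is the main design point: parts may finish out of order, so they are reassembled by position rather than by completion time.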
Python API
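```python
import getm

url = "https://my-cool-url"  # placeholder: replace with a real HTTP(S) URL
size = 1024 * 1024           # number of bytes to read

def my_chunk_processor(part):  # hypothetical stand-in for your own processing
    print(len(part))

# Readable stream:
with getm.urlopen(url) as fh:
    data = fh.read(size)
    data.release()

# Process data in parts:
for part in getm.iter_content(url, chunk_size=1024 * 1024):
    my_chunk_processor(part)
    part.release()
```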
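Because every part must be released even when processing fails partway, it can help to put the release in a finally clause. The helper below is a hypothetical convenience, not part of getm; it assumes only that each part has a release() method, as the example above uses. Standard memoryview objects qualify.

```python
from typing import Iterable, Iterator


def iter_released(parts: Iterable[memoryview]) -> Iterator[memoryview]:
    """Yield each part, then release it as soon as the consumer moves on.

    If the consumer breaks out of the loop or raises, the release runs when
    the generator is closed; wrap the call in contextlib.closing() to make
    that moment explicit rather than relying on garbage collection."""
    for part in parts:
        try:
            yield part
        finally:
            part.release()


# Usage with getm (requires network access):
# for part in iter_released(getm.iter_content(url, chunk_size=1024 * 1024)):
#     my_chunk_processor(part)
```

If the objects getm returns are standard memoryviews (an assumption; the release() method suggests it), they are also context managers, so `with fh.read(size) as data:` releases data when the block exits.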
CLI
getm https://my-cool-url my-local-file
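The CLI command above saves a URL to a local file. A rough Python counterpart writes each part to disk and releases it. The sketch below also computes a SHA-256 digest of what was written, so the result can be compared against a digest published by the data provider. It is a hypothetical sketch, not how the getm CLI is implemented, and the hashing is an addition of this example rather than a description of getm's own integrity checks.

```python
import hashlib
from typing import Iterable


def save_parts(parts: Iterable[memoryview], path: str) -> str:
    """Write each part to `path`, release it, and return the file's SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "wb") as out:
        for part in parts:
            try:
                out.write(part)      # file objects accept any bytes-like object
                digest.update(part)  # so does hashlib, without converting to bytes
            finally:
                part.release()
    return digest.hexdigest()


# Usage with getm (requires network access):
# import getm
# sha256 = save_parts(getm.iter_content("https://my-cool-url", chunk_size=1024 * 1024),
#                     "my-local-file")
```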
Testing
During tests, signed URLs are generated that point to data in S3 and Google Cloud Storage (GS) buckets. The data is repopulated during each test.
Credentials
To run the tests you must be properly credentialed. For S3, you must have access to the test bucket and sufficient privileges to upload data and generate signed URLs. For GS, you must have service account credentials with similar access to the GS bucket. The service account credentials are made available by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the key file:
export GOOGLE_APPLICATION_CREDENTIALS=my-creds.json
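Before running the GS tests, it can save time to confirm that GOOGLE_APPLICATION_CREDENTIALS points at a readable service account key. The check below is a hypothetical helper, not part of getm's test suite; it relies only on the fact that service account key files are JSON documents whose "type" field is "service_account" and which carry a "client_email" field.

```python
import json
import os


def check_gs_credentials() -> str:
    """Return the service account email named by GOOGLE_APPLICATION_CREDENTIALS,
    or raise RuntimeError describing what is wrong."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"credentials file not found: {path}")
    with open(path) as f:
        info = json.load(f)
    if info.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key")
    return info.get("client_email", "<unknown>")


# Usage: run once before the test suite.
# print("Using service account", check_gs_credentials())
```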
Installation
pip install getm
Links
- Project home page: GitHub
- Package distribution: PyPI
Bugs
Please report bugs, issues, feature requests, etc. on GitHub.