dataflows-aws
Dataflows's processors to work with AWS
Features
- dump_to_s3 processor
- change_acl_on_s3 processor
Getting Started
Installation
The package uses semantic versioning, which means that major versions could include breaking changes. It's recommended to specify a package version range in your setup/requirements file, e.g. package>=1.0,<2.0.
$ pip install dataflows-aws
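For example, a requirements.txt line pinned to a compatible range (the version numbers here are illustrative) could look like:

```
dataflows-aws>=0.2,<0.3
```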
Examples
These processors have to be used as a part of a data flow. For example:

import os

from dataflows import Flow, load
from dataflows_aws import dump_to_s3

flow = Flow(
    load('data/data.csv'),
    dump_to_s3(
        bucket='my-bucket',
        acl='private',
        path='my/datapackage',
        endpoint_url=os.environ['S3_ENDPOINT_URL'],
    ),
)
flow.process()
Documentation
dump_to_s3
Saves the DataPackage to AWS S3.
Parameters
- bucket: Name of the bucket where the DataPackage will be stored (the bucket should already be created!)
- acl: ACL to apply to the uploaded files. Default is 'public-read' (see the boto3 docs for more info).
- path: Path (key/prefix) to the DataPackage. May contain format strings available for datapackage.json, e.g. my/example/path/{owner}/{name}/{version}
- content_type: Content type to use when storing files in S3. Defaults to text/plain (the usual S3 default is binary/octet-stream, but we prefer text/plain).
- endpoint_url: API endpoint to allow using S3-compatible services (e.g. 'https://ams3.digitaloceanspaces.com')
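As a quick illustration of how a path template expands, the sketch below substitutes hypothetical values (the actual substitution is performed internally by the dump_to_s3 processor, and the names owner/name/version come from the datapackage descriptor):

```python
# Hypothetical illustration of path-template expansion; dump_to_s3
# performs the real substitution from the datapackage descriptor.
path_template = 'my/example/path/{owner}/{name}/{version}'
path = path_template.format(owner='jane', name='demo-package', version='1')
print(path)  # my/example/path/jane/demo-package/1
```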
change_acl_on_s3
Changes the ACL of objects in the given bucket under the given path (i.e. key prefix).
Parameters
- bucket: Name of the bucket where the objects are stored
- acl: ACL to set. Available options: 'private' | 'public-read' | 'public-read-write' | 'authenticated-read' | 'aws-exec-read' | 'bucket-owner-read' | 'bucket-owner-full-control'
- path: Path (key/prefix) to the DataPackage.
- endpoint_url: API endpoint to allow using S3-compatible services (e.g. 'https://ams3.digitaloceanspaces.com')
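Since an unsupported ACL string would only fail once the request reaches S3, it can be handy to validate the value up front. A minimal sketch, assuming the set of canned ACLs listed above (validate_acl is a hypothetical helper, not part of this library):

```python
# Canned ACLs supported by S3, as listed in the documentation above.
CANNED_ACLS = {
    'private', 'public-read', 'public-read-write', 'authenticated-read',
    'aws-exec-read', 'bucket-owner-read', 'bucket-owner-full-control',
}

def validate_acl(acl):
    """Return the ACL unchanged if it is a known canned ACL, else raise."""
    if acl not in CANNED_ACLS:
        raise ValueError(f'Unsupported canned ACL: {acl!r}')
    return acl

print(validate_acl('public-read'))  # public-read
```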
Contributing
The project follows the Open Knowledge International coding standards.
The recommended way to get started is to create and activate a project virtual environment. To install package and development dependencies into your active environment:
$ make install
To run tests with linting and coverage:
$ make test
For linting, pylama (configured in pylama.ini) is used. At this stage it's already installed into your environment and can be used separately, with more fine-grained control, as described in its documentation: https://pylama.readthedocs.io/en/latest/.
For example, to sort results by error type:
$ pylama --sort <path>
For testing, tox (configured in tox.ini) is used. It's already installed into your environment and can be used separately, with more fine-grained control, as described in its documentation: https://testrun.org/tox/latest/.
For example, to run a subset of tests against the Python 3.7 environment with increased verbosity (all positional arguments and options after -- are passed to py.test):
$ tox -e py37 -- -v tests/<path>
Under the hood, tox uses the pytest (configured in pytest.ini), coverage, and mock packages. These packages are available only in tox environments.
Changelog
Only breaking and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted commit history.
v0.x
- an initial processors implementation