Project description
Exporters provide a flexible way to export data from multiple sources to multiple destinations, with optional filtering and transformation of the data along the way.
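The reader → filter → transform → writer flow described above can be illustrated with plain Python generators. This is a minimal sketch under stated assumptions, not the exporters implementation: the functions `read_records`, `key_value_regex_filter`, `transform` and `write_records` are hypothetical, records are modeled as flat dicts, and the filter uses `re.search` (whether exporters anchors its regex is not assumed).

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# A record is modeled here as a flat dict of strings (an assumption of this sketch).
Record = Dict[str, str]


def read_records(source: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical reader: yields records from any in-memory iterable."""
    for record in source:
        yield record


def key_value_regex_filter(records: Iterable[Record],
                           keys: List[Dict[str, str]]) -> Iterator[Record]:
    """Hypothetical filter: keep records whose every configured field matches its regex.

    Uses re.search, i.e. the pattern may match anywhere in the field value.
    """
    patterns = [(key["name"], re.compile(key["value"])) for key in keys]
    for record in records:
        if all(name in record and pattern.search(record[name])
               for name, pattern in patterns):
            yield record


def transform(records: Iterable[Record],
              func: Callable[[Record], Record]) -> Iterator[Record]:
    """Hypothetical transform step: applies func to each record."""
    for record in records:
        yield func(record)


def write_records(records: Iterable[Record], sink: List[Record]) -> int:
    """Hypothetical writer: appends records to a list and returns how many were written."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


# Usage: keep only United States records and upper-case the city name.
data = [
    {"country": "United States", "city": "Boston"},
    {"country": "Spain", "city": "Madrid"},
]
out: List[Record] = []
written = write_records(
    transform(
        key_value_regex_filter(read_records(data),
                               [{"name": "country", "value": "United States"}]),
        lambda r: {**r, "city": r["city"].upper()},
    ),
    out,
)
print(written, out)  # 1 [{'country': 'United States', 'city': 'BOSTON'}]
```

Because every stage is a generator, records stream through one at a time rather than being loaded into memory all at once; this is a design choice of the sketch.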
This GitHub repository is used as the project's central repository.
Getting Started
Install exporters
First, we recommend creating a virtualenv:
virtualenv exporters
source exporters/bin/activate
Exporters can be cloned from its GitHub repository:
git clone git@github.com:scrapinghub/exporters.git
Then, we install the requirements:
cd exporters
pip install -r requirements.txt
Creating a configuration
Next, we can create our first configuration object and store it in a file called config.json.
This configuration reads records from an S3 bucket and writes them to our local filesystem, exporting only the records that have United States in the country field:
{
    "reader": {
        "name": "exporters.readers.s3_reader.S3Reader",
        "options": {
            "bucket": "YOUR_BUCKET",
            "aws_access_key_id": "YOUR_ACCESS_KEY",
            "aws_secret_access_key": "YOUR_SECRET_KEY",
            "prefix": "exporters-tutorial/sample-dataset"
        }
    },
    "filter": {
        "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
        "options": {
            "keys": [
                {"name": "country", "value": "United States"}
            ]
        }
    },
    "writer": {
        "name": "exporters.writers.fs_writer.FSWriter",
        "options": {
            "filebase": "/tmp/output/"
        }
    }
}
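Instead of editing config.json by hand, the same configuration can be generated from Python with the standard json module. This is a sketch, assuming you want to keep credentials out of source code: `build_config` and `save_config` are hypothetical helpers, and reading the keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables is this sketch's own convention, not something exporters requires.

```python
import json
import os


def build_config(bucket: str, prefix: str, filebase: str) -> dict:
    """Return the tutorial configuration (S3 -> filter -> filesystem) as a dict.

    Credentials come from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    environment variables when set (an assumption of this sketch), falling
    back to the placeholders used in the tutorial.
    """
    return {
        "reader": {
            "name": "exporters.readers.s3_reader.S3Reader",
            "options": {
                "bucket": bucket,
                "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID", "YOUR_ACCESS_KEY"),
                "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY", "YOUR_SECRET_KEY"),
                "prefix": prefix,
            },
        },
        "filter": {
            "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
            "options": {"keys": [{"name": "country", "value": "United States"}]},
        },
        "writer": {
            "name": "exporters.writers.fs_writer.FSWriter",
            "options": {"filebase": filebase},
        },
    }


def save_config(config: dict, path: str) -> None:
    """Write the configuration to path as indented JSON."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    save_config(
        build_config("YOUR_BUCKET", "exporters-tutorial/sample-dataset", "/tmp/output/"),
        "config.json",
    )
```

Note that the generated config.json still contains the credentials in plain text, so it is worth keeping that file out of version control.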
Export with script
We can use the provided script to run this export:
python bin/export.py --config config.json
Use it as a library
The export can be run using exporters as a library:
from exporters.export_managers.basic_exporter import BasicExporter

# Build an exporter from the JSON configuration file, then run the export
exporter = BasicExporter.from_file_configuration('config.json')
exporter.export()
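If the configuration is built in memory (for example as a dict), one way to run it is to write it to a temporary file and pass that path to `BasicExporter.from_file_configuration`. This is a sketch that relies only on the two calls shown above; the helpers `write_temp_config` and `export_from_dict` are hypothetical, and whether BasicExporter can also be constructed directly from a dict is not assumed here.

```python
import json
import os
import tempfile


def write_temp_config(config: dict) -> str:
    """Serialize a configuration dict to a temporary JSON file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(config, f)
    return path


def export_from_dict(config: dict) -> None:
    """Run an export from an in-memory configuration via a temporary file."""
    # Imported lazily so this module can be loaded without exporters installed.
    from exporters.export_managers.basic_exporter import BasicExporter

    path = write_temp_config(config)
    try:
        exporter = BasicExporter.from_file_configuration(path)
        exporter.export()
    finally:
        # Remove the temporary file whether or not the export succeeded.
        os.remove(path)
```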
Resuming an export job
Suppose we have a pickle file from a previously failed export job. To resume it, run the export script with the --resume option:
python bin/export.py --resume pickle://pickle-file.pickle
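The general idea behind resuming from a pickle file is checkpointing: progress is saved as the job runs, and a restarted job reads it back and skips what was already done. The sketch below illustrates that idea only; it is an assumption-based example and does not reproduce the format of the pickle files that exporters writes or reads. `load_state`, `save_state`, `run_job` and the `next_index` key are all hypothetical.

```python
import os
import pickle
from typing import Dict, List, Optional


def load_state(path: str) -> Dict[str, int]:
    """Return saved progress, or a fresh state when no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"next_index": 0}


def save_state(path: str, state: Dict[str, int]) -> None:
    """Persist progress so a later run can pick up where this one stopped."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def run_job(records: List[dict], sink: List[dict], path: str,
            stop_after: Optional[int] = None) -> List[dict]:
    """Copy records into sink, checkpointing after each record.

    stop_after simulates a crash after that many records in this run.
    """
    state = load_state(path)
    done_this_run = 0
    for i in range(state["next_index"], len(records)):
        if stop_after is not None and done_this_run == stop_after:
            raise RuntimeError("simulated failure")
        sink.append(records[i])
        done_this_run += 1
        state["next_index"] = i + 1
        save_state(path, state)
    # The job finished, so the checkpoint is no longer needed.
    if os.path.exists(path):
        os.remove(path)
    return sink
```

Calling `run_job` with `stop_after=3` on five records raises after three of them and leaves `{"next_index": 3}` in the pickle; calling it again without `stop_after` copies only the remaining two. Because the checkpoint is saved after each record is written, a crash between those two steps would replay that record on resume, so this scheme gives at-least-once rather than exactly-once delivery.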
Hashes for exporters-0.4.12-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 82ac7e9ece1a46843e179f723bfac8832859b85bbc12f84bf9f013840a6f86fc
MD5 | b24e3bfa46429610c2133c991f69747f
BLAKE2b-256 | 10c6b7712e44b7473439b034ef61a767bcc9085c3d57c18a82ac52ce4128fee8