Local ScraperWiki Python Library
====
This library aims to be a drop-in replacement for the Python `scraperwiki` library
for use locally. That is, functions will work the same way, and data will go
into a local SQLite database; a targeted bombing of ScraperWiki's servers
will not stop this local library from working, unless you happen to be running
it on one of ScraperWiki's servers.
## Installing
This will soon be on PyPI, but for now you can install it from the git repository.
## Documentation
Read the standard ScraperWiki Python library's [documentation](https://scraperwiki.com/docs/python/python_help_documentation/),
then look below for some quirks about the local version.
## Quirks
The local library aims to be a drop-in replacement.
In reality, the local version sometimes works better,
though not all of the features have been implemented.
## Differences
### Datastore differences
The local `scraperwiki.sqlite` is powered by
[DumpTruck](http://dumptruck.io), so some things
work a bit differently.
Data is stored in a local SQLite database named `scraperwiki.sqlite`.
Bizarre table and column names are supported.
`scraperwiki.sqlite.execute` returns an empty list of keys when a select
statement produces no results.
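For illustration, here is a minimal sketch of both behaviours; `swdata` is the library's default table name, and the row data is made up:

```python
import scraperwiki

# Saving creates ./scraperwiki.sqlite if it does not already exist.
# The default table is 'swdata'; pass table_name= for another name,
# including bizarre ones like 'my table!'.
scraperwiki.sqlite.save(unique_keys=['id'], data={'id': 1, 'name': 'example'})

# execute returns a dict with 'data' and 'keys'; when a select matches
# nothing, the local library gives back an empty list of keys.
result = scraperwiki.sqlite.execute('select * from swdata where id = 2')
print(result['keys'])  # []
```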
`scraperwiki.sqlite.attach` downloads the whole datastore from ScraperWiki the first time it runs; it then uses the cached database.
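A sketch of `attach`; the scraper name here is hypothetical, and `swdata` is the hosted default table name:

```python
import scraperwiki

# The first run downloads the named scraper's datastore from
# scraperwiki.com and caches it; later runs reuse the cached copy.
scraperwiki.sqlite.attach('some_remote_scraper')  # hypothetical scraper name

# Query the attached database's default table.
rows = scraperwiki.sqlite.select('* from some_remote_scraper.swdata limit 5')
```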
### Other Differences
## Status of implementation
In general, features that have not been implemented raise a `NotImplementedError`.
### Datastore
`scraperwiki.sqlite` is missing the following features.
* All of the `verbose` keyword arguments (these control what is printed in the ScraperWiki code editor); see the sketch below.
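Assuming the `verbose` arguments follow the `NotImplementedError` convention described above (an assumption, not documented behaviour), using one should fail loudly:

```python
import scraperwiki

try:
    scraperwiki.sqlite.save(['id'], {'id': 1}, verbose=2)
except NotImplementedError:
    # The local library signals unimplemented features this way.
    print('verbose output is not implemented locally')
```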
### Geo
The UK geocoding helpers (`scraperwiki.geo`) documented on scraperwiki.com have been implemented. They partially depend on scraperwiki.com being available.
<!-- They have also been released as a separate library (ukgeo). Not true yet. -->
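As a sketch, using the `gb_postcode_to_latlng` helper from the hosted documentation (the postcode is arbitrary, and the call may go over the network):

```python
import scraperwiki

# Geocode a UK postcode; this may contact scraperwiki.com.
latlng = scraperwiki.geo.gb_postcode_to_latlng('EC2A 4JE')
print(latlng)  # e.g. [51.52..., -0.08...]
```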
### Utils
`scraperwiki.utils` is implemented, as are the following functions; a short usage sketch follows the list.
* `scraperwiki.log`
* `scraperwiki.scrape`
* `scraperwiki.pdftoxml`
* `scraperwiki.swimport`
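A minimal sketch of two of these; example.com stands in for a real target, and `pdftoxml` shells out to `pdftohtml`, which must be installed:

```python
import scraperwiki

# scrape() fetches a URL and returns the response body as a string.
html = scraperwiki.scrape('http://example.com/')
print(html[:60])

# pdftoxml() converts raw PDF bytes to an XML string.
# xml = scraperwiki.pdftoxml(open('report.pdf', 'rb').read())  # hypothetical file
```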
### Deprecated
These submodules are deprecated and thus will not be implemented.
* `scraperwiki.apiwrapper`
* `scraperwiki.datastore`
* `scraperwiki.jsqlite`
* `scraperwiki.metadata`
* `scraperwiki.newsql`
### Development
Run tests with `./runtests`; this small wrapper cleans up after itself.
### Specs
Here are some ScraperWiki scrapers that demonstrate the non-local library's quirks.
* https://scraperwiki.com/scrapers/scraperwiki_local/
* https://scraperwiki.com/scrapers/cast/
* https://scraperwiki.com/scrapers/things_happen_when_you_do_not_commit/
* https://scraperwiki.com/scrapers/what_does_show_tables_return/
* https://scraperwiki.com/scrapers/on_conflict/
* https://scraperwiki.com/scrapers/spaces_in_table_names/
* https://scraperwiki.com/scrapers/spaces_in_table_names_1/