Web Scraping Framework
## IOWeb Framework
![pytest status](https://github.com/lorien/ioweb/workflows/pytest/badge.svg) ![pytype status](https://github.com/lorien/ioweb/workflows/pytype/badge.svg)
A Python framework for building web crawlers.
Good things:
- designed to run a large number of network threads (for example, 100 or 500) on a single CPU core
- items can be collected into chunks and then processed chunk by chunk (for example, with a MongoDB bulk write)
- asynchronous network operations are powered by gevent
- network requests are handled with urllib3
- HTML is parsed with lxml
- CSS/XPath queries can be run against the DOM tree of a downloaded HTML document
- certificate details can be extracted
- a particular domain can be resolved to a custom IP
- stat module to count events
- statistics can be logged to InfluxDB
- retrying on network errors
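
Since the project has no documentation yet, the following is only a rough sketch of the chunking and event-counting ideas from the list above, written with the standard library alone. It does not use the ioweb API: `ChunkBuffer`, `bulk_write`, and the counter keys are hypothetical names chosen for illustration, and `bulk_write` is a stand-in for a real bulk operation such as a MongoDB bulk write.

```python
from collections import Counter
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChunkBuffer(Generic[T]):
    """Collect items and hand them to `flush_fn` in fixed-size chunks.

    Hypothetical helper for illustration; it is not part of ioweb.
    """

    def __init__(self, size: int, flush_fn: Callable[[List[T]], None]) -> None:
        if size < 1:
            raise ValueError("size must be >= 1")
        self.size = size
        self.flush_fn = flush_fn
        self.items: List[T] = []

    def add(self, item: T) -> None:
        self.items.append(item)
        # Flush as soon as a full chunk has been collected.
        if len(self.items) >= self.size:
            self.flush()

    def flush(self) -> None:
        # Pass on whatever is buffered, including a final partial chunk.
        if self.items:
            chunk, self.items = self.items, []
            self.flush_fn(chunk)


# Simple event counters, in the spirit of a "stat module".
stats: Counter = Counter()
written: List[List[dict]] = []


def bulk_write(chunk: List[dict]) -> None:
    # Stand-in for a real bulk operation (assumption, not ioweb code).
    written.append(chunk)
    stats["bulk-write"] += 1
    stats["doc-saved"] += len(chunk)


buffer = ChunkBuffer(size=3, flush_fn=bulk_write)
for num in range(7):
    buffer.add({"_id": num})
buffer.flush()  # write the leftover partial chunk

print(dict(stats))
```

With seven items and a chunk size of three, the sketch performs three bulk writes: two full chunks and one final chunk holding the single leftover item.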
Bad things:
- not fully covered with tests
- no documentation
## Feedback
- [t.me/grablab](https://t.me/grablab) - English chat about web scraping
- [t.me/grablab_ru](https://t.me/grablab_ru) - Russian chat about web scraping
Source Distribution
File details
Details for the file ioweb-0.0.23.tar.gz.
File metadata
- Download URL: ioweb-0.0.23.tar.gz
- Upload date:
- Size: 26.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4d2076bea52a3e9f0998c364a1a1a4edf14f240e3297bcfd43a6c4a6990d67c4
MD5 | 74e12335ac8eaebd5e707a15823d3cb9
BLAKE2b-256 | 18ece45f082e24c739740e9b9a2f2c7075595acd1a3da41bf85b63900deb1c75