Web Scraping Framework
## IOWeb Framework
Python framework to build web crawlers.
What we have at the moment:
- designed to run a large number of network threads (e.g. 100 or 500) on one CPU core
- built-in feature to collect items into chunks and then process each chunk at once (e.g. a MongoDB bulk write)
- asynchronous operations are powered by gevent
- network requests are handled with urllib3
- urllib3 is monkey-patched to extract certificate details
- urllib3 is monkey-patched to skip domain resolution when the domain's IP address has been provided
- built-in stat module to count events, with built-in logging to InfluxDB
- retrying on errors
- no tests
- no documentation
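
The chunking feature in the list above can be sketched roughly as follows. This is not ioweb's actual API: `ChunkBuffer`, `flush_size`, and `on_flush` are hypothetical names used only for illustration, and the callback here just records chunks in a list where a real crawler would perform something like a MongoDB bulk write.

```python
from typing import Any, Callable, List


class ChunkBuffer:
    """Collect items and hand them to a callback in fixed-size chunks.

    Hypothetical helper for illustration; not part of ioweb's API.
    """

    def __init__(self, flush_size: int, on_flush: Callable[[List[Any]], None]) -> None:
        if flush_size < 1:
            raise ValueError("flush_size must be at least 1")
        self.flush_size = flush_size
        self.on_flush = on_flush
        self._items: List[Any] = []

    def add(self, item: Any) -> None:
        """Store one item and flush as soon as a full chunk is available."""
        self._items.append(item)
        if len(self._items) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        """Pass the buffered items to the callback; do nothing if empty."""
        if not self._items:
            return
        chunk, self._items = self._items, []
        self.on_flush(chunk)


# Stand-in for a bulk write: each chunk is appended to a list.
written: List[List[dict]] = []
buffer = ChunkBuffer(flush_size=3, on_flush=written.append)
for page_id in range(7):
    buffer.add({"_id": page_id})
buffer.flush()  # write the final partial chunk
```

With a real database the callback would be a bulk insert (for example pymongo's `insert_many`), so the database sees one request per chunk instead of one per scraped page.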
I use ioweb for bulk web scraping, such as crawling 500M pages in a few days.
## Places to talk
- [t.me/grablab](https://t.me/grablab) - English chat about web scraping
- [t.me/grablab_ru](https://t.me/grablab_ru) - Russian chat about web scraping
Source distribution: ioweb-0.0.10.tar.gz
- Size: 19.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/2.7
File hashes

Algorithm | Hash digest
---|---
SHA256 | 62afdd1bd702d72e49d0ecc21dce8e48a418e72d72f62b6fec2ed01f82936d11
MD5 | 6c8293d28a1e6012c47020620a2d583f
BLAKE2b-256 | bd32ba551027258d284dbacbd5f836b5c7d47af1e0b0bd41631620ca0333c8bd