ioweb

Web Scraping Framework

These details have not been verified by PyPI

## IOWeb Framework

Python framework to build web crawlers.

Good things:

- designed to run a large number of network threads (for example, 100 or 500) on a single CPU core
- can collect items into chunks and then process each chunk in one operation (for example, a MongoDB bulk write)
- asynchronous network operations are powered by gevent
- network requests are handled with urllib3
- HTML is parsed with lxml
- CSS/XPath queries against the DOM tree of a downloaded HTML document
- extraction of certificate details
- resolving a particular domain to a custom IP
- stat module to count events
- logging statistics to InfluxDB
- retrying on network errors
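
Since there is no documentation, here is a minimal sketch of the chunking idea from the list above, written with the standard library only. It is not ioweb's actual API: `ChunkBuffer`, `add`, `flush`, and the `flush_fn` callback are hypothetical names chosen for illustration. In a real crawler the callback might be something like a MongoDB bulk write.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChunkBuffer(Generic[T]):
    """Hypothetical helper (not part of ioweb): collects items and
    hands them to a callback in chunks of a fixed size."""

    def __init__(self, size: int, flush_fn: Callable[[List[T]], None]) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self.flush_fn = flush_fn
        self.items: List[T] = []

    def add(self, item: T) -> None:
        # Store the item and flush as soon as the chunk is full.
        self.items.append(item)
        if len(self.items) >= self.size:
            self.flush()

    def flush(self) -> None:
        # Pass the pending items to the callback and start a new chunk.
        if self.items:
            chunk, self.items = self.items, []
            self.flush_fn(chunk)


# Usage: the callback just records each chunk; a real one could do a bulk write.
written: List[List[int]] = []
buf = ChunkBuffer(3, written.append)
for n in range(7):
    buf.add(n)
buf.flush()  # push the final, partially filled chunk
```

The explicit `flush()` at the end matters: without it, the last partial chunk would stay in the buffer.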

Bad things:

- not fully covered with tests
- no documentation

## Feedback

[t.me/grablab](https://t.me/grablab) - English chat about web scraping

[t.me/grablab_ru](https://t.me/grablab_ru) - Russian chat about web scraping

Release history

| Version | Release date |
| --- | --- |
| 0.0.29 | Nov 19, 2020 |
| 0.0.28 | Nov 13, 2020 |
| 0.0.24 | May 7, 2020 |
| 0.0.23 | May 3, 2020 |
| 0.0.22 | Mar 31, 2020 |
| 0.0.21 | Jan 4, 2020 |
| 0.0.20 | Jan 4, 2020 |
| 0.0.19 (this version) | Dec 23, 2019 |
| 0.0.18 | Dec 23, 2019 |
| 0.0.17 | Dec 12, 2019 |
| 0.0.16 | Dec 7, 2019 |
| 0.0.15 | Dec 6, 2019 |
| 0.0.14 | Dec 6, 2019 |
| 0.0.13 | Dec 2, 2019 |
| 0.0.12 | Nov 17, 2019 |
| 0.0.11 | Nov 6, 2019 |
| 0.0.10 | Nov 5, 2019 |
| 0.0.9 | Nov 5, 2019 |
| 0.0.8 | Nov 5, 2019 |
| 0.0.7 | Oct 24, 2019 |
| 0.0.6 | Jul 15, 2019 |
| 0.0.5 | Apr 12, 2019 |
| 0.0.4 | Apr 10, 2019 |
| 0.0.3 | Mar 28, 2019 |
| 0.0.2 | Mar 26, 2019 |
| 0.0.1 | Mar 19, 2019 |

Download files

Source Distribution

ioweb-0.0.19.tar.gz (24.3 kB)

Uploaded Dec 23, 2019 Source

File details

Details for the file ioweb-0.0.19.tar.gz.

File metadata

Download URL: ioweb-0.0.19.tar.gz
Upload date: Dec 23, 2019
Size: 24.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: Python-urllib/2.7

File hashes

Hashes for ioweb-0.0.19.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `ef6b3fa4f2fec4ec6931a9523d89442fe1e384e9ef9bf96c507d22d157deda37` |
| MD5 | `bba083cd47d87b3b5f3680d05f3d9db9` |
| BLAKE2b-256 | `1a77940ed3040e1c345e956dd8cd1c883249b4038c42d6c2d20113bdbdbc69fb` |
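
The digests above can be used to check a downloaded archive. A minimal sketch using only the standard library `hashlib` module follows. The helper names `sha256_of_file` and `matches_expected` are illustrative, and the file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published above for ioweb-0.0.19.tar.gz
EXPECTED_SHA256 = "ef6b3fa4f2fec4ec6931a9523d89442fe1e384e9ef9bf96c507d22d157deda37"


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in blocks
    so that large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected one (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("ioweb-0.0.19.tar.gz")` should return `True` only if the local file is byte-for-byte identical to the published archive.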