ioweb

Web Scraping Framework

## IOWeb Framework

Python framework to build web crawlers.

What we have at the moment:

- a system designed to run a large number of network threads (like 100 or 500) on one CPU core
- a built-in feature to combine items into chunks and then process each chunk in one operation (like a MongoDB bulk write)
- asynchronous operations are powered by gevent
- network requests are handled with urllib3
- urllib3 is monkey-patched to extract certificate details
- urllib3 is monkey-patched to skip domain name resolution if the domain's IP address has been provided
- a built-in stat module to count events, with built-in logging to InfluxDB
- retrying on errors
- no tests
- no documentation
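The chunking idea from the list above can be sketched in plain Python. This is not ioweb's API (the project has no documentation to confirm one): `ChunkBuffer`, its `size` parameter, and the `flush` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChunkBuffer(Generic[T]):
    """Hypothetical helper: collect items and hand them over in chunks."""

    def __init__(self, size: int, flush: Callable[[List[T]], None]) -> None:
        if size < 1:
            raise ValueError("size must be >= 1")
        self.size = size
        self.flush_callback = flush
        self.items: List[T] = []

    def add(self, item: T) -> None:
        """Store one item and flush automatically once the chunk is full."""
        self.items.append(item)
        if len(self.items) >= self.size:
            self.flush()

    def flush(self) -> None:
        """Pass the pending items to the callback and start a new chunk."""
        if self.items:
            chunk, self.items = self.items, []
            self.flush_callback(chunk)


# Demo: the callback only records each chunk instead of writing to a database.
written: List[List[dict]] = []
buffer = ChunkBuffer(size=3, flush=written.append)
for page_id in range(7):
    buffer.add({"_id": page_id})
buffer.flush()  # write the final, partially filled chunk
```

In a real crawler the callback would perform the bulk operation, for example a call to pymongo's `Collection.insert_many` with the chunk as its argument.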

I am using ioweb for bulk web scraping, like crawling 500M pages in a few days.
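At that scale, transient network failures are routine, which is why retrying on errors appears in the feature list. A minimal sketch of the general technique, assuming a hypothetical `retry` helper with a linear back-off; ioweb's own retry logic is undocumented and may work differently:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.0,
    exceptions: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call func until it succeeds or the attempts are used up (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay * attempt)  # linear back-off between attempts
    raise AssertionError("unreachable")


# Demo: a fake fetch that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = retry(flaky_fetch, attempts=5)
```

Only exceptions listed in `exceptions` trigger a retry; anything else propagates immediately, so programming errors are not hidden behind repeated attempts.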

## Places to talk

[t.me/grablab](https://t.me/grablab) - English chat about web scraping

[t.me/grablab_ru](https://t.me/grablab_ru) - Russian chat about web scraping

## Release history

| Version | Release date |
| --- | --- |
| 0.0.29 | Nov 19, 2020 |
| 0.0.28 | Nov 13, 2020 |
| 0.0.24 | May 7, 2020 |
| 0.0.23 | May 3, 2020 |
| 0.0.22 | Mar 31, 2020 |
| 0.0.21 | Jan 4, 2020 |
| 0.0.20 | Jan 4, 2020 |
| 0.0.19 | Dec 23, 2019 |
| 0.0.18 | Dec 23, 2019 |
| 0.0.17 | Dec 12, 2019 |
| 0.0.16 | Dec 7, 2019 |
| 0.0.15 | Dec 6, 2019 |
| 0.0.14 | Dec 6, 2019 |
| 0.0.13 | Dec 2, 2019 |
| 0.0.12 | Nov 17, 2019 |
| 0.0.11 | Nov 6, 2019 |
| 0.0.10 | Nov 5, 2019 |
| 0.0.9 (this version) | Nov 5, 2019 |
| 0.0.8 | Nov 5, 2019 |
| 0.0.7 | Oct 24, 2019 |
| 0.0.6 | Jul 15, 2019 |
| 0.0.5 | Apr 12, 2019 |
| 0.0.4 | Apr 10, 2019 |
| 0.0.3 | Mar 28, 2019 |
| 0.0.2 | Mar 26, 2019 |
| 0.0.1 | Mar 19, 2019 |

## Download files

Source distribution: ioweb-0.0.9.tar.gz (18.6 kB), uploaded Nov 5, 2019.

File metadata

- Download URL: ioweb-0.0.9.tar.gz
- Upload date: Nov 5, 2019
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing: No
- Uploaded via: Python-urllib/2.7

File hashes

Hashes for ioweb-0.0.9.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `db6729c877bd2f33b1235f06d6fb1fb5ae4222285f33381f1e67a38f757e9b0d` |
| MD5 | `a6bf412ac0e6a43f92a0232d2968a4c2` |
| BLAKE2b-256 | `e878382d9952a4b81c7a6c25829b1304d827817c7821735b9838869758880f4e` |