
Web Scraping Framework


## IOWeb Framework

Python framework to build web crawlers.

What we have at the moment:

  • system designed to run a large number of network threads (like 100 or 500) on one CPU core

  • built-in feature to combine things into chunks and then do something with each chunk (like a MongoDB bulk write)

  • asynchronous things are powered by gevent

  • network requests are handled with urllib3

  • urllib3 monkey-patched to extract cert details

  • urllib3 monkey-patched to skip DNS resolution if the domain's IP has been provided

  • built-in stat module to count events, with built-in logging to InfluxDB

  • retrying on errors

  • no tests

  • no documentation
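The chunking feature listed above can be pictured with a small standard-library sketch. This is not ioweb's actual API: `iter_chunks`, `process_in_chunks`, and the `handler` callback are hypothetical names chosen for illustration, and the handler stands in for something like a MongoDB bulk write.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(
    items: Iterable[T], size: int, handler: Callable[[List[T]], None]
) -> int:
    """Pass each chunk to `handler` once and return the number of chunks."""
    count = 0
    for chunk in iter_chunks(items, size):
        # In a crawler this call could be a single bulk database write.
        handler(chunk)
        count += 1
    return count
```

Calling `process_in_chunks(range(7), 3, written.append)` with an empty list `written` would hand the handler three chunks, the last one holding the single leftover item. Grouping work this way trades a little latency for far fewer round trips to the storage backend.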

I am using ioweb for bulk web scraping, such as crawling 500M pages in a few days.
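At that scale, retrying on errors and counting events go hand in hand. The following is a minimal sketch of the idea using only the standard library; it is not how ioweb implements either feature. The function `call_with_retries`, its parameters, and the counter keys (`ok`, `error`, `retry`, `gave-up`) are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    stats: Optional[Counter] = None,
) -> T:
    """Call `func` up to `max_attempts` times, counting outcomes in `stats`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # An empty Counter is falsy, so compare against None explicitly.
    stats = stats if stats is not None else Counter()
    for attempt in range(1, max_attempts + 1):
        try:
            result = func()
        except retry_on:
            stats["error"] += 1
            if attempt == max_attempts:
                # Out of attempts: record it and let the error propagate.
                stats["gave-up"] += 1
                raise
            stats["retry"] += 1
        else:
            stats["ok"] += 1
            return result
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Sharing one `Counter` across all requests gives a running tally of successes and failures, which is the kind of data a stat module could periodically ship to a time-series store.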

## Places to talk

Download files


Source Distribution

ioweb-0.0.9.tar.gz (18.6 kB)

File details

Details for the file ioweb-0.0.9.tar.gz.

File metadata

  • Download URL: ioweb-0.0.9.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Python-urllib/2.7

File hashes

Hashes for ioweb-0.0.9.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | db6729c877bd2f33b1235f06d6fb1fb5ae4222285f33381f1e67a38f757e9b0d |
| MD5 | a6bf412ac0e6a43f92a0232d2968a4c2 |
| BLAKE2b-256 | e878382d9952a4b81c7a6c25829b1304d827817c7821735b9838869758880f4e |

