Web Scraping Framework based on py3 asyncio
Project description
Web scraping framework built on the Python 3 asyncio and aiohttp libraries.
Usage Example
import re
from itertools import islice

from crawler import Crawler, Request

RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)


class TestCrawler(Crawler):
    def task_generator(self):
        # Queue a request for each of the first 100 domains in the list
        for host in islice(open('var/domains.txt'), 100):
            host = host.strip()
            if host:
                yield Request('http://%s/' % host, tag='page')

    def handler_page(self, req, res):
        # Called for each completed request tagged 'page'
        print('Result of request to {}'.format(req.url))
        try:
            title = RE_TITLE.search(res.body).group(1)
        except AttributeError:
            title = 'N/A'
        print('Title: {}'.format(title))


bot = TestCrawler(concurrency=10)
bot.run()
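In the example, `handler_page` is invoked because the request was created with `tag='page'`. A minimal sketch of how such tag-based dispatch can work is below; this illustrates the pattern only, and the `MiniDispatcher` class and its method names are hypothetical, not crawler's actual internals:

```python
class MiniDispatcher:
    """Illustration of routing a response to a handler named after its tag."""

    def dispatch(self, req, res):
        # Look up a method named handler_<tag>; fall back to a default.
        handler = getattr(self, 'handler_%s' % req.tag, self.handler_default)
        handler(req, res)

    def handler_default(self, req, res):
        print('No handler defined for tag {!r}'.format(req.tag))
```

A subclass defining `handler_page` would then receive every response whose request carried `tag='page'`, which is the convention the usage example relies on.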
Installation
pip install crawler
Dependencies
Python>=3.4
aiohttp
Project details
Download files
Source Distribution
crawler-0.0.2.tar.gz (6.0 kB)
File details
Details for the file crawler-0.0.2.tar.gz
File metadata
- Download URL: crawler-0.0.2.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2
MD5 | 272f2a88e1376ac09f2d310405ff2bb8
BLAKE2b-256 | 8d422b042beebf63f6d490d38b698f06ee4fdd16a1d32fa2373a6b662a37a33d
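To confirm that a downloaded sdist matches the digests above, the file can be hashed locally. A small sketch using Python's standard hashlib (the filename is the archive listed above; `sha256_of` is a helper defined here, not a library function):

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Compare against the published SHA256 digest before installing:
# sha256_of('crawler-0.0.2.tar.gz') == 'b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2'
```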