
Web Scraping Framework based on py3 asyncio

Project description


Web scraping framework built on the Python 3 asyncio and aiohttp libraries.

Usage Example

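import re
from itertools import islice

from crawler import Crawler, Request

# Capture the text of the first <title> element, ignoring case and newlines.
RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)

class TestCrawler(Crawler):
    def task_generator(self):
        # Yield one request per non-empty line, for the first 100 lines only.
        # The with-block closes the file once the generator is exhausted.
        with open('var/domains.txt') as inp:
            for host in islice(inp, 100):
                host = host.strip()
                if host:
                    yield Request('http://%s/' % host, tag='page')

    def handler_page(self, req, res):
        # The method name matches the tag='page' set on the requests above.
        print('Result of request to {}'.format(req.url))
        match = RE_TITLE.search(res.body)
        # Fall back to 'N/A' when the page has no <title> element.
        title = match.group(1) if match else 'N/A'
        print('Title: {}'.format(title))

bot = TestCrawler(concurrency=10)
bot.run()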
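
The title extraction in handler_page can be tried on its own, without running the crawler or touching the network. The sketch below is only an illustration: `extract_title` is a hypothetical helper, not part of the crawler package, and it assumes the response body is available as a `str`.

```python
import re

# Same pattern as in the usage example above.
RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)


def extract_title(body, default='N/A'):
    """Return the text of the first <title> element, or `default` if none."""
    # Hypothetical helper for illustration; assumes `body` is a str.
    match = RE_TITLE.search(body)
    if match is None:
        return default
    return match.group(1).strip()


print(extract_title('<html><head><TITLE> Example Domain </TITLE></head></html>'))
print(extract_title('<html><body>no title here</body></html>'))
```

The first call prints `Example Domain` and the second prints `N/A`. Checking the match object explicitly avoids relying on an AttributeError to detect a missing title.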

Installation

pip install crawler

Dependencies

  • Python>=3.4

  • aiohttp

Download files


Source Distribution

crawler-0.0.2.tar.gz (6.0 kB view details)

Uploaded Source

File details

Details for the file crawler-0.0.2.tar.gz.

File metadata

  • Download URL: crawler-0.0.2.tar.gz
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for crawler-0.0.2.tar.gz
Algorithm Hash digest
SHA256 b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2
MD5 272f2a88e1376ac09f2d310405ff2bb8
BLAKE2b-256 8d422b042beebf63f6d490d38b698f06ee4fdd16a1d32fa2373a6b662a37a33d
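
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the helper names `sha256_of` and `matches_expected` are made up for illustration, and it assumes the archive has already been saved locally.

```python
import hashlib

# SHA256 digest listed above for crawler-0.0.2.tar.gz.
EXPECTED_SHA256 = 'b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2'


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected('crawler-0.0.2.tar.gz')` should return True for an unmodified copy of the archive saved under that name in the current directory.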

