18 projects
scrapy-zyte-api
Client library to process URLs through Zyte API
zyte-common-items
Item definitions for Zyte API schema
formasaurus
Formasaurus tells you the types of HTML forms and their fields using machine learning
duplicate-url-discarder-rules
The rules for duplicate-url-discarder.
zyte-parsers
Parsing of data from web pages.
scrapy-spider-metadata
Utilities to extend Scrapy spiders with usable metadata.
zyte-spider-templates
Spider templates for automatic crawlers.
scrapy-settings-log
An extension that allows a user to display all or some of their Scrapy spider settings at runtime.
duplicate-url-discarder
Discarding duplicate URLs based on rules.
sklearn-crfsuite
CRFsuite (python-crfsuite) wrapper that provides an interface similar to scikit-learn
zyte-api
Python interface to Zyte API
html-text
Extract text from HTML
clear-html
Clean and normalize HTML.
url-matcher
URL matching rules library to connect URLs with resources
onefile
Merge multiple files into one!
scrapy-time-machine
A downloader middleware that stores the current request chain to be crawled at another time.
zyte-autoextract
Python interface to Zyte Automatic Extraction API