A modern Python library for writing maintainable web scrapers.

Project description

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel, or lets you write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
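
The page-oriented idea in the list above can be illustrated without spatula itself. What follows is a minimal sketch using only the standard library: each kind of page gets its own class, and each class knows how to turn one raw row into a typed record. The names ListPage, EmployeeList, and Employee are hypothetical and are not spatula's actual API; see the documentation linked above for the real classes.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class Employee:
    # Hypothetical data model; any dataclass-like class could play this role.
    first: str
    last: str
    position: str


class ListPage:
    """Hypothetical base class: one class per page, one record per row."""

    def __init__(self, text: str) -> None:
        # Raw CSV body of the page; fetching it is out of scope for this sketch.
        self.text = text

    def process_item(self, row: Dict[str, str]) -> object:
        raise NotImplementedError

    def items(self) -> Iterator[object]:
        # Parse the body as CSV and hand each row to the subclass.
        for row in csv.DictReader(io.StringIO(self.text)):
            yield self.process_item(row)


class EmployeeList(ListPage):
    """All knowledge about the employee listing lives in this one class."""

    def process_item(self, row: Dict[str, str]) -> Employee:
        return Employee(
            first=row["first"].strip(),
            last=row["last"].strip(),
            position=row["position"].strip(),
        )
```

Keeping the parsing rules for a page inside a single class is what makes this style easy to test: a class can be instantiated with a saved copy of the page and its output compared against known records.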

Download files

Source Distribution

spatula-0.9.0.tar.gz (15.5 kB)

Uploaded Source

Built Distribution

spatula-0.9.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file spatula-0.9.0.tar.gz.

File metadata

  • Download URL: spatula-0.9.0.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.10 Darwin/20.6.0

File hashes

Hashes for spatula-0.9.0.tar.gz
Algorithm    Hash digest
SHA256       9e4f71c40a4e7cd0b7c5979de6538ff955340015dde87cc89f76b88dd7e7589c
MD5          654c50e0edc07526c4adf5cf7f5badc2
BLAKE2b-256  aa1c6424da7f5c2fad47c41714d4441f84e97f8a78b70fa07cb53734929fe25f
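
The digests listed above can be used to check a downloaded file before installing it. What follows is a minimal sketch using only Python's standard hashlib module; the helper names file_digests and matches_published are hypothetical, and the file is assumed to have already been saved locally.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: str) -> Dict[str, str]:
    """Compute the three digests listed for each file (hypothetical helper)."""
    data = Path(path).read_bytes()
    return {
        "SHA256": hashlib.sha256(data).hexdigest(),
        "MD5": hashlib.md5(data).hexdigest(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def matches_published(path: str, expected_sha256: str) -> bool:
    """Compare a local file's SHA256 digest with a published value."""
    return file_digests(path)["SHA256"] == expected_sha256.strip().lower()
```

Calling matches_published("spatula-0.9.0.tar.gz", "9e4f71c40a4e7cd0b7c5979de6538ff955340015dde87cc89f76b88dd7e7589c") is expected to return True for an intact copy of the source distribution, assuming the file sits in the current directory. SHA256 is the digest to rely on; MD5 is computed here only for comparison with the listing.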

File details

Details for the file spatula-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: spatula-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.10 Darwin/20.6.0

File hashes

Hashes for spatula-0.9.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       de3eba7b40a11b0759406e5e383dd6511f46a30ad984ca476ee81fb247c59560
MD5          bd6eb95c435aa4bacb3048039ad9c73b
BLAKE2b-256  9411d1d2d3cbf191eb8b8f32673a452d3ee0f4aa998d47a818bc665f8b547dd3
