A modern Python library for writing maintainable web scrapers.

Project description

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats, including CSV, JSON, XML, PDF, and Excel, and supports writing your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
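To make the page-oriented idea concrete, here is a minimal, stdlib-only sketch of the pattern: each kind of page is a class, a format-specific handler parses the raw content, and a separate extraction step turns the parsed data into instances of a plain dataclass. The names `BasePage`, `JsonBasePage`, `CsvBasePage`, `extract`, and `scrape` are hypothetical and chosen for illustration only; they are not spatula's API (see the documentation for the real page classes). The sketch parses in-memory strings instead of fetching URLs, and it sticks to JSON and CSV so it runs without lxml, which spatula itself uses for HTML.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Employee:
    # Plain dataclass acting as the data model for one scraped record.
    name: str
    position: str


class BasePage:
    """Hypothetical base class: each subclass describes one kind of page."""

    def __init__(self, raw: str) -> None:
        # A real scraper would fetch this text over HTTP; here it is passed in.
        self.raw = raw

    def parse(self, raw: str) -> Any:
        # Default handler: pass the raw text through unchanged.
        return raw

    def extract(self, data: Any) -> Any:
        # Page-specific extraction logic lives in subclasses.
        raise NotImplementedError

    def scrape(self) -> Any:
        # Format handling and extraction are kept as separate steps.
        return self.extract(self.parse(self.raw))


class JsonBasePage(BasePage):
    def parse(self, raw: str) -> Any:
        return json.loads(raw)


class CsvBasePage(BasePage):
    def parse(self, raw: str) -> Any:
        # One dict per CSV row, keyed by the header line.
        return list(csv.DictReader(io.StringIO(raw)))


class StaffJson(JsonBasePage):
    def extract(self, data: Any) -> list[Employee]:
        return [Employee(e["name"], e["position"]) for e in data["staff"]]


class StaffCsv(CsvBasePage):
    def extract(self, data: Any) -> list[Employee]:
        return [Employee(row["name"], row["position"]) for row in data]


json_text = '{"staff": [{"name": "Ada", "position": "Engineer"}]}'
csv_text = "name,position\nAda,Engineer\n"
print(StaffJson(json_text).scrape())  # [Employee(name='Ada', position='Engineer')]
print(StaffCsv(csv_text).scrape())    # [Employee(name='Ada', position='Engineer')]
```

Because each subclass overrides only the step that differs, the same data can arrive in two formats while the data model and the rest of the scraper stay unchanged.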

Download files

Source Distribution

spatula-0.8.6.tar.gz (14.8 kB)

Built Distribution

spatula-0.8.6-py3-none-any.whl (15.8 kB)
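The `py3-none-any` part of the wheel's filename is its compatibility tag: Python tag `py3` (any Python 3 interpreter), ABI tag `none` (no ABI dependency), and platform tag `any` (any operating system), which is why the wheel is listed as a pure Python 3 build. The sketch below splits a wheel filename into those fields, assuming the naming convention from PEP 427 (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`). `parse_wheel_filename` and `WheelName` here are hypothetical helpers for illustration; for real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` is the maintained option.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five fields without the optional build tag, six with it.
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


name = parse_wheel_filename("spatula-0.8.6-py3-none-any.whl")
print(name.python_tag, name.abi_tag, name.platform_tag)  # py3 none any
```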

File details

Details for the file spatula-0.8.6.tar.gz.

File metadata

  • Download URL: spatula-0.8.6.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.1 Darwin/20.6.0

File hashes

Hashes for spatula-0.8.6.tar.gz
Algorithm    Hash digest
SHA256       962404a5499af01814544be8371145d0f6ce54e30f379c2fdbff945bde3d1496
MD5          9a3c8b15961b7c775a51f04818b01abe
BLAKE2b-256  9b6a4ed5618bda26bfa6ad4c32ba68e8a1273bc78be61b08b4460c9f869954fd
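To check a downloaded archive against the published SHA256 digest, hash the file locally and compare the two hex strings. This is a minimal sketch using the standard library's `hashlib`; it assumes `spatula-0.8.6.tar.gz` has already been downloaded into the current directory, and `sha256_of` and `verify` are illustrative helper names. pip can also enforce digests automatically in its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for spatula-0.8.6.tar.gz.
EXPECTED_SHA256 = "962404a5499af01814544be8371145d0f6ce54e30f379c2fdbff945bde3d1496"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalise the expected value too.
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("spatula-0.8.6.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SHA256) else "HASH MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```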

File details

Details for the file spatula-0.8.6-py3-none-any.whl.

File metadata

  • Download URL: spatula-0.8.6-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.1 Darwin/20.6.0

File hashes

Hashes for spatula-0.8.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       d23f179d0519733132d65e6ca1792b6689ee828a37a14c9c1415fec7a457297b
MD5          e64ad7f208c25fccd4899288abc416ab
BLAKE2b-256  774550db2142d7a876e991467dbd005ae29130ba94eb73817d706921d9f601d4
