A modern Python library for writing maintainable web scrapers.

Project description

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that can help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
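
As a rough illustration of the data model point above, here is a minimal sketch of a dataclass that stores and validates one scraped record. It uses only the standard library and does not call spatula itself; the Employee class, its fields, and the row_to_employee helper are hypothetical names chosen for this example, not part of the library.

```python
from dataclasses import dataclass, asdict


@dataclass
class Employee:
    """Hypothetical record type for one scraped table row."""

    first: str
    last: str
    position: str

    def __post_init__(self):
        # Normalize whitespace and reject empty or non-string values,
        # so bad rows fail loudly at construction time.
        for name in ("first", "last", "position"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
            setattr(self, name, value.strip())


def row_to_employee(cells):
    """Build an Employee from three text cells (hypothetical helper)."""
    first, last, position = cells
    return Employee(first=first, last=last, position=position)


if __name__ == "__main__":
    record = row_to_employee([" John ", "Barnett", "Scheduling"])
    print(asdict(record))
```

Keeping validation inside the data class means the scraping code only has to extract text; attrs or pydantic models could play the same role with their own validation hooks.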

Download files

Source Distribution

spatula-0.8.9.tar.gz (14.9 kB view details)

Uploaded Source

Built Distribution

spatula-0.8.9-py3-none-any.whl (15.9 kB view details)

Uploaded Python 3

File details

Details for the file spatula-0.8.9.tar.gz.

File metadata

  • Download URL: spatula-0.8.9.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.7 Darwin/20.6.0

File hashes

Hashes for spatula-0.8.9.tar.gz
Algorithm Hash digest
SHA256 065b83c4bc1464e49e069c0f722cd675c63da0f6f4de58d6b969dd42896350aa
MD5 955d31095ca025b2956b5887db685fe3
BLAKE2b-256 b43d15776c569ca33f95e368839bec110a3cbd8f65b8b6837d33afc50df92167
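
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names and the local file path are assumptions made for this example, and the expected digest is copied from the table for spatula-0.8.9.tar.gz.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest for spatula-0.8.9.tar.gz, copied from the hash table above.
EXPECTED_SHA256 = "065b83c4bc1464e49e069c0f722cd675c63da0f6f4de58d6b969dd42896350aa"


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("spatula-0.8.9.tar.gz")
    if archive.exists():
        print(matches_published_digest(archive, EXPECTED_SHA256))
    else:
        print("archive not found; download it first")
```

The same approach applies to the wheel file with its own digest. pip can also enforce hashes at install time through its hash-checking mode.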

File details

Details for the file spatula-0.8.9-py3-none-any.whl.

File metadata

  • Download URL: spatula-0.8.9-py3-none-any.whl
  • Upload date:
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.7 Darwin/20.6.0

File hashes

Hashes for spatula-0.8.9-py3-none-any.whl
Algorithm Hash digest
SHA256 a6e941e540d5d6525fae976983faf616f5e3824efe484465f590bc9380363637
MD5 7f5f1fb76227fad507ea35703e42871b
BLAKE2b-256 2bc7fee45a473976be9d63bf37851cad7fc7feb6a06d9ce20ed6a7fa4cec6a4f
