A modern Python library for writing maintainable web scrapers.

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel, or lets you write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, and pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
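
As a rough illustration of the kind of data model class the list above mentions, here is a minimal sketch using only the standard library's dataclasses module. The `Person` class and its fields are hypothetical and are not part of spatula. The sketch does not show spatula's own API, only a record type that stores and lightly validates scraped values.

```python
from dataclasses import asdict, dataclass


@dataclass
class Person:  # hypothetical record type, not provided by spatula
    name: str
    title: str
    url: str = ""

    def __post_init__(self):
        # Scraped text often carries stray whitespace; normalize it here.
        self.name = " ".join(self.name.split())
        self.title = self.title.strip()
        # Reject records that are clearly unusable as early as possible.
        if not self.name:
            raise ValueError("name must not be empty")


record = Person(name="  Ada   Lovelace ", title="Analyst ")
print(asdict(record))
# {'name': 'Ada Lovelace', 'title': 'Analyst', 'url': ''}
```

Keeping validation inside the record type means a malformed page surfaces as an error at the point of extraction rather than later in the pipeline.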

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spatula-0.8.2.tar.gz (13.6 kB view details)

Uploaded Source

Built Distribution

spatula-0.8.2-py3-none-any.whl (14.7 kB view details)

Uploaded Python 3

File details

Details for the file spatula-0.8.2.tar.gz.

File metadata

  • Download URL: spatula-0.8.2.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.9.1 Darwin/20.3.0

File hashes

Hashes for spatula-0.8.2.tar.gz
Algorithm Hash digest
SHA256 b50e944c99524cbc43a172d4bf11faaf65cd4e4be6f96f1d3a1e5bf6748027a8
MD5 d95c28f77104df60261631e7aa833e31
BLAKE2b-256 7554cd1dfa85a5ca98e83626c46c328fdf3ea3e2fc8f140b8c8b366de4597794
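
A published digest such as the SHA256 value above can be compared against a local download. The following is a minimal sketch using Python's standard hashlib module. The helper names `sha256_of_file` and `matches_expected` are made up for illustration, and the sketch assumes the archive was saved into the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the table above for spatula-0.8.2.tar.gz.
EXPECTED_SHA256 = "b50e944c99524cbc43a172d4bf11faaf65cd4e4be6f96f1d3a1e5bf6748027a8"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the computed digest with the published one, ignoring case."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the sdist sits in the current working directory.
    archive = Path("spatula-0.8.2.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

pip can perform a similar check itself when a requirements file pins hashes and `--require-hashes` is used.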

File details

Details for the file spatula-0.8.2-py3-none-any.whl.

File metadata

  • Download URL: spatula-0.8.2-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.9.1 Darwin/20.3.0

File hashes

Hashes for spatula-0.8.2-py3-none-any.whl
Algorithm Hash digest
SHA256 44390bbe54ad2178001abe76ed8067a48be0333ed2ee7900375a3697a7187db7
MD5 203c28faef4f79362e9d5c79a48e626a
BLAKE2b-256 a93983e35ccd3e80c538bec7ba8c888734b98e8072e81fe72c9d11c2cffbb304
