A modern Python library for writing maintainable web scrapers.

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, and pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
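
To make the data model point above concrete, here is a minimal sketch of the kind of class a scraper might populate, written with only the standard-library dataclasses module. The class name `ScrapedPerson`, its fields, and its validation rules are hypothetical and chosen for illustration; nothing in this block calls spatula's own API.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ScrapedPerson:
    """Hypothetical record type for one scraped item (not part of spatula)."""

    name: str
    url: str
    district: Optional[int] = None

    def __post_init__(self) -> None:
        # Collapse runs of whitespace, which scraped HTML text often contains.
        self.name = " ".join(self.name.split())
        # Reject values that are clearly not absolute web URLs.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute URL: {self.url!r}")


person = ScrapedPerson(name="  Ada   Lovelace ", url="https://example.com/ada")
record = asdict(person)  # plain dict, ready to serialize as JSON or CSV
```

Doing the cleanup and checks in the model class keeps the scraping code focused on locating data, which is one plausible reading of the "storing & validating" split described above.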

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spatula-0.8.10.tar.gz (14.9 kB)

Uploaded Source

Built Distribution

spatula-0.8.10-py3-none-any.whl (16.0 kB)

Uploaded Python 3

File details

Details for the file spatula-0.8.10.tar.gz.

File metadata

  • Download URL: spatula-0.8.10.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.10 Darwin/20.6.0

File hashes

Hashes for spatula-0.8.10.tar.gz
Algorithm    Hash digest
SHA256       4d55bdae9f9eae5df692656b4beb837ac426529605462d472bd11e068211f680
MD5          d69b9b804a9807b2e640b76ab1e16ea9
BLAKE2b-256  5b13b9d56036d864a20f7cc448cf0c878398821fbf379c4bc98409053176717f

File details

Details for the file spatula-0.8.10-py3-none-any.whl.

File metadata

  • Download URL: spatula-0.8.10-py3-none-any.whl
  • Upload date:
  • Size: 16.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.9.10 Darwin/20.6.0

File hashes

Hashes for spatula-0.8.10-py3-none-any.whl
Algorithm    Hash digest
SHA256       13d962f4c54851114ad2d6edd71e59badb890912f493f0050ddd5b186692a0a0
MD5          e1ca19a6a19bcee1fc2a0e44e1533e05
BLAKE2b-256  c8732f7f33c05bc8635fb1b8ead721be39d281c1679afa43af109107f731ad9a
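
The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using only Python's standard hashlib module; the helper names `file_digest` and `matches_published_hash` are made up for this example, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, read in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path, expected_hex, algorithm="sha256"):
    """Compare a local file's digest with a published hex digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches_published_hash("spatula-0.8.10.tar.gz", "4d55bdae9f9eae5df692656b4beb837ac426529605462d472bd11e068211f680")` should return True for an unmodified copy of the source distribution. Passing `algorithm="md5"` checks the MD5 row instead; the BLAKE2b-256 row needs a 32-byte digest size, which this sketch does not handle.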
