A kitchen for (beautiful) soup

visaplan.kitchen

This package tackles “soup”, i.e. trees created by the well-known beautifulsoup4 package from parsed HTML or XML sources. The same might be achievable by using lxml directly, but that would probably have been more difficult, and thus it is left to another package.

Features

  • spoons module, for tackling “soup”, e.g.

    • has_any_class (a filter function to check for one of the given classes)

  • forks module (named mainly for historical reasons; for poking around in the soup), e.g. extract_linktext, convert_dimension_styles

  • ids module, for creation of new ids for HTML elements

    • id_factory:

      new_id = id_factory(...)
      id = new_id(prefix)
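
The usage above suggests a factory that returns an id-generating function. The following is a minimal sketch of that pattern only, not the package's implementation: make_id_factory, its existing_ids parameter, and the "prefix-number" id format are all assumptions made for illustration.

```python
import itertools


def make_id_factory(existing_ids=()):
    """Return a function that creates ids which are not in use yet.

    Hypothetical stand-in for illustration; the arguments of the real
    id_factory are not shown in this description.
    """
    used = set(existing_ids)
    counters = {}  # one independent counter per prefix

    def new_id(prefix="id"):
        counter = counters.setdefault(prefix, itertools.count(1))
        for number in counter:
            candidate = "%s-%d" % (prefix, number)
            if candidate not in used:
                used.add(candidate)
                return candidate

    return new_id


# Assumed usage: ids already present in the document are never reissued.
new_id = make_id_factory(existing_ids=["fig-1", "fig-2"])
first = new_id("fig")    # skips fig-1 and fig-2, which are taken
second = new_id("fig")
other = new_id("table")
```

Keeping the set of used ids inside the closure means callers only ever pass a prefix, which matches the two-step call shape shown above.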

Tests remark

The modules are documented and tested by doctests. However, they currently don’t fully work because of import problems; see the issue tracker.

Help is appreciated.

Documentation

For now, the functions are documented by doctests.
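
To illustrate this documentation style, here is a sketch of a doctest-documented class filter in the spirit of has_any_class. It is an assumption-laden example, not the package's code: make_class_filter is a hypothetical name, and a plain dict stands in for a parsed tag, since both offer a get() method.

```python
import doctest


def make_class_filter(*wanted):
    """Return a filter that is true for elements having any wanted class.

    Hypothetical helper, not the package's has_any_class. The element
    only needs a dict-like get(); a plain dict stands in for a tag here.

    >>> is_note = make_class_filter("note", "hint")
    >>> is_note({"class": ["box", "hint"]})
    True
    >>> is_note({"class": ["box"]})
    False
    >>> is_note({})
    False
    """
    wanted_set = frozenset(wanted)

    def has_wanted_class(element):
        classes = element.get("class") or []
        if isinstance(classes, str):
            # some parsers deliver the class attribute as one string
            classes = classes.split()
        return not wanted_set.isdisjoint(classes)

    return has_wanted_class


# Run the examples from the docstring; nothing is printed if they pass.
doctest.run_docstring_examples(
    make_class_filter, {"make_class_filter": make_class_filter})
```

A function like the returned one can be handed to BeautifulSoup's find_all, which accepts a callable that receives each tag and returns a truth value.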

Installation

Install visaplan.kitchen by adding it to your buildout:

[buildout]

...

eggs =
    visaplan.kitchen

and then running bin/buildout

Support

If you are having issues, please let us know via the issue tracker mentioned above.

License

The project is licensed under the GPLv2.

To Do

  • .extract module:

    • implement head(words=N) constraint

    • Create generic wordcount facility? (after the wc program; count words, characters, and probably lines as well)
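
As a rough idea of what such a wordcount facility could return, here is a sketch modelled on the wc program. It is not part of the package; count_text and TextCounts are hypothetical names, and the input is assumed to be plain text that has already been extracted from the markup.

```python
from collections import namedtuple

# Hypothetical result type: the three numbers wc reports.
TextCounts = namedtuple("TextCounts", "lines words chars")


def count_text(text):
    """Count lines, words and characters of a string, roughly like wc."""
    return TextCounts(
        lines=text.count("\n"),   # like wc, count newline characters
        words=len(text.split()),  # whitespace-separated words
        chars=len(text),
    )


counts = count_text("A kitchen\nfor soup\n")
```

Returning all three numbers at once would let a words=N constraint and the existing chars restriction share one counting pass.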

Changelog

1.1.0 (unreleased)

Improvements:

  • Improved Python 3 compatibility

New Features:

  • .spoons.swap_classes, supporting both add and remove options, and by default removing an empty class attribute

Miscellaneous:

  • Removed the Zope/Plone entry point and configure.zcml, which didn’t do anything interesting

[tobiasherp]

1.0.5 (2024-04-09)

New Features:

  • .extract.head supports the verbose option to aid processing of multiple fields; code example included.

Improvements:

  • Added a doctest for .extract.head: yes, we accept text/plain as well.

Miscellaneous:

  • .extract._head_kwargs: when injecting the fuzz default value, we now ignore a words restriction that may be given in addition; only the chars restriction is needed.

[tobiasherp]

1.0.4 (2023-12-21)

Bugfixes:

  • .spoons.stripped_soup raised an IndexError when called with empty content.

[tobiasherp]

1.0.3 (2022-09-20)

New Features:

  • New function .spoons.generate_image_infos

[tobiasherp]

1.0.2 (2021-10-27)

Improvements:

  • Imports sorted by isort

New Features:

  • New extract module to create extracts of HTML text (e.g. a head, containing the first NN visible characters)

[tobiasherp]

1.0.1 (2020-02-25)

  • Python 3 compatibility (python-modernize) [tobiasherp]

1.0 (2018-09-17)

  • Initial release. [tobiasherp]
