Sequitur algorithm for inferring hierarchies

Project description

SciKit Sequitur is an Apache2 licensed Python module for inferring compositional hierarchies from sequences.

Sequitur detects repetition and factors it out by forming rules in a grammar. Rules can themselves contain non-terminals (references to other rules), giving rise to a hierarchy. It is useful for recognizing lexical structure in strings and excels at very long sequences. The Sequitur algorithm was originally developed by Craig Nevill-Manning and Ian Witten.
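Sequitur keeps its grammar under two constraints described by Nevill-Manning and Witten: digram uniqueness (no pair of adjacent symbols occurs more than once in the grammar) and rule utility (every rule other than the top-level rule is used at least twice). The following is a minimal sketch that checks both constraints on a grammar written as a plain Python dict. The dict layout (ints for non-terminals, strings for terminals, rule 0 as the top-level rule) and the helper names repeated_digrams and underused_rules are hypothetical; they are not scikit-sequitur's internal representation or API.

```python
from collections import Counter


def repeated_digrams(grammar):
    """Return digrams that occur more than once across all rule bodies.

    Overlapping pairs inside a run such as a a a are not counted twice.
    """
    first_seen = {}  # digram -> (rule, index) of its first occurrence
    repeated = set()
    for rule, body in grammar.items():
        for i in range(len(body) - 1):
            digram = (body[i], body[i + 1])
            if digram not in first_seen:
                first_seen[digram] = (rule, i)
            elif first_seen[digram] != (rule, i - 1):
                # A second, non-overlapping occurrence breaks digram uniqueness.
                repeated.add(digram)
    return repeated


def underused_rules(grammar, start=0):
    """Return rules other than the start rule that are referenced fewer than twice."""
    uses = Counter(
        symbol
        for body in grammar.values()
        for symbol in body
        if isinstance(symbol, int)  # ints are non-terminals in this sketch
    )
    return {rule for rule in grammar if rule != start and uses[rule] < 2}


# Hand-written mirror of the grammar printed for parse('hello hello').
hello = {0: [1, " ", 1], 1: list("hello")}
print(repeated_digrams(hello), underused_rules(hello))  # set() set()

# The raw input kept as a single rule violates digram uniqueness.
flat = {0: list("abcabc")}
print(sorted(repeated_digrams(flat)))  # [('a', 'b'), ('b', 'c')]
```

The overlap check reflects Sequitur's treatment of runs such as a a a, where the two overlapping a a pairs are not treated as a repetition.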

>>> from sksequitur import parse
>>> grammar = parse('hello hello')
>>> print(grammar)
0 -> 1 _ 1
1 -> h e l l o                                    hello
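Each printed line gives a rule number and the rule's body; for rule 1 the right-hand column shows the text the rule expands to, and the space in the input is displayed as _. As a minimal sketch under an assumed representation (a plain dict with int non-terminals and string terminals, not the library's Grammar object), expanding rule 0 recursively recovers the original input.

```python
def expand(grammar, rule):
    """Recursively replace non-terminals (ints) with their bodies and join the terminals."""
    parts = []
    for symbol in grammar[rule]:
        if isinstance(symbol, int):
            parts.append(expand(grammar, symbol))  # non-terminal: recurse into its rule
        else:
            parts.append(symbol)  # terminal: keep as-is
    return "".join(parts)


# Hand-written mirror of the grammar printed for parse('hello hello').
grammar = {0: [1, " ", 1], 1: list("hello")}
print(expand(grammar, 0))  # hello hello
print(expand(grammar, 1))  # hello
```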

SciKit Sequitur works on strings, lines, or any sequence of Python objects.
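Because finding repetition only depends on whether adjacent symbols are equal, the choice of token decides which structure can be found. The sketch below splits the same text into characters, words, and lines and reports the adjacent pairs that repeat at each granularity. It assumes tokens are hashable and uses a plain digram count (the hypothetical helper repeated_pairs), not Sequitur itself; with scikit-sequitur, a token list like words or lines would take the place of a string.

```python
from collections import Counter


def repeated_pairs(tokens):
    """Map each adjacent pair that occurs more than once to its count.

    This is a plain digram count, not Sequitur; overlapping pairs are not excluded.
    """
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n > 1}


text = "to be\nor not\nto be\nor not\n"

chars = list(text)         # one token per character
words = text.split()       # one token per word
lines = text.splitlines()  # one token per line

print(repeated_pairs(words))  # {('to', 'be'): 2, ('be', 'or'): 2, ('or', 'not'): 2}
print(repeated_pairs(lines))  # {('to be', 'or not'): 2}
print(("t", "o") in repeated_pairs(chars))  # True
```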

Features

  • Pure-Python

  • Developed on Python 3.10

  • Tested on CPython 3.6, 3.7, 3.8, 3.9, 3.10

  • Tested using GitHub Actions on Linux, Mac, and Windows

Quickstart

Installing scikit-sequitur is simple with pip:

$ pip install scikit-sequitur

You can access documentation in the interpreter with Python’s built-in help function:

>>> import sksequitur
>>> help(sksequitur)                    # doctest: +SKIP

Tutorial

The scikit-sequitur module provides utilities for parsing sequences and inspecting the resulting grammars.

>>> from sksequitur import parse
>>> print(parse('abcabc'))
0 -> 1 1
1 -> a b c                                        abc
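A crude way to see the factoring is to count symbols: the input abcabc has six, while the two rule bodies above hold five in total. The helper grammar_size below is a hypothetical illustration over a plain dict that mirrors the printed grammars (ints as non-terminals); it is not part of scikit-sequitur and is not a real compression measure, since it ignores the cost of encoding the rules themselves.

```python
def grammar_size(grammar):
    """Total number of symbols across all rule bodies (a crude size measure)."""
    return sum(len(body) for body in grammar.values())


# Hand-written mirrors of the grammars printed above; ints are non-terminals.
abcabc = {0: [1, 1], 1: list("abc")}
hello = {0: [1, " ", 1], 1: list("hello")}

print(grammar_size(abcabc), len("abcabc"))  # 5 6
print(grammar_size(hello), len("hello hello"))  # 8 11
```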

The parse function is a shortcut for the Parser and Grammar objects, which can also be used directly, as in the steps below.

>>> from sksequitur import Parser
>>> parser = Parser()

The feed method works incrementally, so a sequence can be supplied in pieces.

>>> parser.feed('ab')
>>> parser.feed('cab')
>>> parser.feed('c')
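Feeding in pieces works for an online algorithm as long as it remembers where the previous chunk ended. The class below is a hypothetical illustration (IncrementalDigrams is not scikit-sequitur's Parser and builds no grammar); it only counts adjacent pairs, but it shows that a pair straddling a chunk boundary, such as b c between 'ab' and 'cab', is counted exactly as if the input had arrived in one piece.

```python
from collections import Counter

_NOTHING = object()  # sentinel: no symbol seen yet


class IncrementalDigrams:
    """Hypothetical illustration of online input, not scikit-sequitur's Parser.

    It remembers the last symbol of the previous chunk, so a pair that
    straddles a chunk boundary is counted as if the input were contiguous.
    """

    def __init__(self):
        self.counts = Counter()
        self.last = _NOTHING

    def feed(self, chunk):
        for symbol in chunk:
            if self.last is not _NOTHING:
                self.counts[(self.last, symbol)] += 1
            self.last = symbol


chunked = IncrementalDigrams()
for piece in ["ab", "cab", "c"]:  # the same pieces fed to the parser above
    chunked.feed(piece)

whole = IncrementalDigrams()
whole.feed("abcabc")

print(chunked.counts == whole.counts)  # True
print(chunked.counts[("b", "c")])  # 2
```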

A parser's tree can be converted to a Grammar.

>>> from sksequitur import Grammar
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 1
1 -> a b c                                        abc

Grammars are indexed by Production objects; indexing returns the body of that rule.

>>> from sksequitur import Production
>>> grammar[Production(0)]
[Production(1), Production(1)]
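A dedicated key type keeps rule references distinct from terminals even when the input itself consists of integers. The sketch below uses a hypothetical frozen dataclass named ToyProduction (an illustration only, not scikit-sequitur's Production class) as both the dict key and the non-terminal symbol, and expands a grammar whose terminals are the ints 1 and 2 without confusing them with rule 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProduction:
    """Hypothetical stand-in for a rule reference; frozen so it is hashable."""

    number: int


def expand(grammar, rule):
    """Yield the terminals of rule in order, recursing into rule references."""
    for symbol in grammar[rule]:
        if isinstance(symbol, ToyProduction):
            yield from expand(grammar, symbol)
        else:
            yield symbol


# Terminals are the ints 1 and 2; rule references are ToyProduction objects,
# so the terminal 1 and ToyProduction(1) cannot be confused.
grammar = {
    ToyProduction(0): [ToyProduction(1), ToyProduction(1)],
    ToyProduction(1): [1, 2],
}
print(grammar[ToyProduction(0)])  # [ToyProduction(number=1), ToyProduction(number=1)]
print(list(expand(grammar, ToyProduction(0))))  # [1, 2, 1, 2]
```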

Mark symbols can be used to store metadata about a sequence. The mark symbol is printed as a pipe character “|”.

>>> from sksequitur import Mark
>>> mark = Mark()
>>> mark
Mark()
>>> print(mark)
|

Attributes can be added to mark symbols using keyword arguments.

>>> mark = Mark(kind='start', name='foo.py')
>>> mark
Mark(kind='start', name='foo.py')
>>> mark.kind
'start'
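The behavior shown above (keyword arguments become attributes, the repr lists them, and str() gives a pipe) can be re-created in a few lines. The class below is a hypothetical stand-in written to match those outputs, not the source of sksequitur.Mark. Leaving equality at Python's default, so that each instance equals only itself, is an assumption of this sketch.

```python
class Mark:
    """Hypothetical stand-in mimicking the Mark outputs shown above (not sksequitur.Mark)."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)  # keyword arguments become attributes

    def __repr__(self):
        args = ", ".join(f"{key}={value!r}" for key, value in self.__dict__.items())
        return f"{type(self).__name__}({args})"

    def __str__(self):
        return "|"

    # No __eq__ or __hash__ override: each instance equals only itself (an assumption).


mark = Mark(kind="start", name="foo.py")
print(repr(mark))  # Mark(kind='start', name='foo.py')
print(mark, mark.kind)  # | start
print(Mark() == Mark())  # False
```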

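Mark symbols cannot be made part of a rule. In the example below, the marks keep c from joining the repeated rule, so only a b is factored out (compare the abcabc grammar above).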

>>> parser = Parser()
>>> parser.feed('ab')
>>> parser.feed([Mark()])
>>> parser.feed('cab')
>>> parser.feed([Mark()])
>>> parser.feed('c')
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 | c 1 | c
1 -> a b                                          ab
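One way to see why the marks change the result: if every mark is a distinct object that equals only itself (an assumption of this sketch, not a documented detail of scikit-sequitur), no pair containing a mark can repeat, so a b is the only repeated pair left, which matches rule 1 above. The sketch uses plain object() sentinels as marks and a simple digram count, not Sequitur itself.

```python
from collections import Counter


def repeated_pairs(tokens):
    """Sorted list of adjacent pairs that occur more than once."""
    counts = Counter(zip(tokens, tokens[1:]))
    return sorted(pair for pair, n in counts.items() if n > 1)


# object() instances compare equal only to themselves, so each acts as a unique mark.
marked = list("ab") + [object()] + list("cab") + [object()] + list("c")
print(repeated_pairs(marked))  # [('a', 'b')]
print(repeated_pairs(list("abcabc")))  # [('a', 'b'), ('b', 'c')]
```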

Reference

License

Copyright 2021 Grant Jenks

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Download files

Download the file for your platform.

Source Distribution

  • scikit-sequitur-0.4.0.tar.gz (6.6 kB)

Built Distributions

  • scikit_sequitur-0.4.0-cp310-cp310-win_amd64.whl (42.7 kB): CPython 3.10, Windows x86-64
  • scikit_sequitur-0.4.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (245.0 kB): CPython 3.10, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • scikit_sequitur-0.4.0-cp310-cp310-macosx_10_14_x86_64.whl (45.1 kB): CPython 3.10, macOS 10.14+ x86-64
  • scikit_sequitur-0.4.0-cp39-cp39-win_amd64.whl (42.6 kB): CPython 3.9, Windows x86-64
  • scikit_sequitur-0.4.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (243.2 kB): CPython 3.9, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • scikit_sequitur-0.4.0-cp39-cp39-macosx_10_14_x86_64.whl (45.0 kB): CPython 3.9, macOS 10.14+ x86-64
  • scikit_sequitur-0.4.0-cp38-cp38-win_amd64.whl (42.6 kB): CPython 3.8, Windows x86-64
  • scikit_sequitur-0.4.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (241.0 kB): CPython 3.8, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • scikit_sequitur-0.4.0-cp38-cp38-macosx_10_14_x86_64.whl (45.8 kB): CPython 3.8, macOS 10.14+ x86-64
  • scikit_sequitur-0.4.0-cp37-cp37m-win_amd64.whl (41.6 kB): CPython 3.7m, Windows x86-64
  • scikit_sequitur-0.4.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (206.2 kB): CPython 3.7m, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • scikit_sequitur-0.4.0-cp37-cp37m-macosx_10_14_x86_64.whl (44.7 kB): CPython 3.7m, macOS 10.14+ x86-64
  • scikit_sequitur-0.4.0-cp36-cp36m-win_amd64.whl (41.5 kB): CPython 3.6m, Windows x86-64
  • scikit_sequitur-0.4.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (205.8 kB): CPython 3.6m, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
  • scikit_sequitur-0.4.0-cp36-cp36m-macosx_10_14_x86_64.whl (47.7 kB): CPython 3.6m, macOS 10.14+ x86-64

File details

Details for the file scikit-sequitur-0.4.0.tar.gz.

File metadata

  • Download URL: scikit-sequitur-0.4.0.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit-sequitur-0.4.0.tar.gz
Algorithm Hash digest
SHA256 992150b759d818cd6612942b5904fa72c4c0216f0046f01e0e113cea6e39c023
MD5 02bb24830db9fc5a9d1d2d1bfe37ee04
BLAKE2b-256 983231c28ce7441b16b6cf2f4a552ecd07a0b6741c8f628001ea853663efeb12

File details

Details for the file scikit_sequitur-0.4.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 42.7 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 26477d51f3a920a8ced57f4f7f75ea788c6b03ff3e0454674b9246278349a9e2
MD5 a4879851a2b93ec13e0191e52674e21a
BLAKE2b-256 cf4837b18fbdbebf40581c04a62998cf72e8d69c9e05907f73584d69d425a520

File details

Details for the file scikit_sequitur-0.4.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.4.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 778873ee484592a7c1c4e26b06f83dcc59bb1dd85ffd0531bb2467903ba06bd3
MD5 b4ec1094ece9e8e341103e88c251ef2c
BLAKE2b-256 7e69468ba31813db10f900042067b18fec693bb1b12352ad77b9309888986cf2

File details

Details for the file scikit_sequitur-0.4.0-cp310-cp310-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp310-cp310-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 45.1 kB
  • Tags: CPython 3.10, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 7a77b3c1e09ba1d4a1740099236c4bb37e8e8648211c3e69fa106e4117e603b9
MD5 e304ec62cac49cabe3c65b3785ca2f17
BLAKE2b-256 d52d7d14accfce5bdf94333a29c3defc0f71c9d6124e3aa360ecbd3f66a331f1

File details

Details for the file scikit_sequitur-0.4.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 42.6 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 669a86e6d9af093ad01a017521227cd86211cbcafdd9175e077baa1d3a6a2158
MD5 5e49c6ebd2c07f7da3b79c9804e95dc8
BLAKE2b-256 89d71c66e7638f4acc1c49bf2dc8090576391bc2df60e48324ad09cdee18af2e

File details

Details for the file scikit_sequitur-0.4.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.4.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 86933e9728ed8c323e41a61e78f5de3daa3bda04fc7be02bd239f967c87d67ed
MD5 8843065228cd075ea8dc57c5e0381e69
BLAKE2b-256 446d7c025a9fe6d236d51a5af6fa289007fc6c3b536d5ac51188fc027a1b4763

File details

Details for the file scikit_sequitur-0.4.0-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp39-cp39-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 45.0 kB
  • Tags: CPython 3.9, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 ee82585484dc86f5308adcdfa0cf893cd5ef2c5043dc3f16e45f18d0248603a1
MD5 31c62cd5f1787e77ff38e6c83b925488
BLAKE2b-256 d78ad76cc320f8c7876847d1291fb2e69a4597b730e9e672705385b9a2ac59ee

File details

Details for the file scikit_sequitur-0.4.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 42.6 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 6d6fc4030bc92f9b9b8f97ae1f250502d4db50402cb652d8cf140750742176c7
MD5 77dc07447b96649a58cdd52d4da5c91d
BLAKE2b-256 3575548a5a032dccc1d14d61919fa4463c6f3028a79a06af10c0c36fb0101f7c

File details

Details for the file scikit_sequitur-0.4.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.4.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8641bc51eb2f029270045a4b4eab1062176f2f738b7c49394894743d254f28ca
MD5 dc8519871405d773e194a65bcf96e500
BLAKE2b-256 86783c61cab03c6c22a2f4d6f86b4fbfc97d1c47276272995216c421567a64ef

File details

Details for the file scikit_sequitur-0.4.0-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp38-cp38-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 45.8 kB
  • Tags: CPython 3.8, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 be0e2708140ee28233ee815e17566affbd22b9c47cbb767534716bb51be0f975
MD5 3950000b9dc644d822bf1511ceba433f
BLAKE2b-256 79209c020f825a764cf1412d656bc3c9c4ea8bccc48416b3c3159a5a87059b2b

File details

Details for the file scikit_sequitur-0.4.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 41.6 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 f8c0bd7683d9410697b5581d328c0d1aa2c04849738d99fdd4ca6873303afcd3
MD5 c9dcd6612bfab2d6a3b103c1c449e073
BLAKE2b-256 e040e716991f5fba17696cf01a3e270946a928fb7c61fe6eda551b10b33a4611

File details

Details for the file scikit_sequitur-0.4.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.4.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5a8b2066007ce20bc714334c60e3e053672506a970b430121de639f06de18067
MD5 cc98691c0d5684783658e85403440416
BLAKE2b-256 e256aaa5d9baf4042eaec62fbdb23f2a9d2a146c862a6469b45bdf3ae7aae9b3

File details

Details for the file scikit_sequitur-0.4.0-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 44.7 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 83f1dd0a73130bc569ce899612059a4ce8afe32adf04ee7dc0d8d87fd3f891aa
MD5 685f28852488b360d283cdf72d5d54ce
BLAKE2b-256 98893ac057619c3cd66bd896c5e4a636ef3369918b748509571c045cd5d77445

File details

Details for the file scikit_sequitur-0.4.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 41.5 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 15e0a45264cd008ca9f2f72f8fed4fa62e706ee74f786388f84b6f8a4df7dde8
MD5 54566d88605e1f45d748eaa0da2c0494
BLAKE2b-256 d3748ba24ed1b6fa8953b3715f5bde3b7789424bf80ed8a820b0204595297fef

File details

Details for the file scikit_sequitur-0.4.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.4.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7b933d237b40a0c240300f1f03c0f8a452b27eb4e4976850e1bba102da38d989
MD5 2651f6ac54091c5c06ada70aac7aafcc
BLAKE2b-256 cff77778352b83cfbfcfdf2caa32373bf29f52852a36474e548913c4b330a3a8

File details

Details for the file scikit_sequitur-0.4.0-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.4.0-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 47.7 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for scikit_sequitur-0.4.0-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 09ec24f330a1930d0b8d8d8647f9e824402cd8baf004183128a3b3ea3ee08150
MD5 95a313394e53430a330fed150e5a2183
BLAKE2b-256 eade2ee240c65808cf4743883f9ec992ae15b938c7bf5c55b3b02a51bb001027
