Sequitur algorithm for inferring hierarchies

Project description

SciKit Sequitur is an Apache 2.0 licensed Python module for inferring compositional hierarchies from sequences.

Sequitur detects repetition and factors it out by forming rules in a grammar. The rules can be composed of non-terminals, giving rise to a hierarchy. It is useful for recognizing lexical structure in strings, and excels at very long sequences. The Sequitur algorithm was originally developed by Craig Nevill-Manning and Ian Witten.

>>> from sksequitur import parse
>>> grammar = parse('hello hello')
>>> print(grammar)
0 -> 1 _ 1
1 -> h e l l o                                    hello

SciKit Sequitur works on strings, lines, or any sequence of Python objects.
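
To make the idea of "repetition" concrete, the sketch below counts adjacent symbol pairs (digrams) that occur more than once in a sequence. It is only an illustration in plain Python and does not use or reproduce the sksequitur implementation; the helper name repeated_digrams is hypothetical. It also counts overlapping pairs naively, which a real Sequitur implementation handles with more care.

```python
from collections import Counter


def repeated_digrams(sequence):
    """Return adjacent symbol pairs that occur more than once.

    Hypothetical helper for illustration only; not part of sksequitur.
    Overlapping occurrences (as in 'aaa') are counted naively.
    """
    symbols = list(sequence)
    counts = Counter(zip(symbols, symbols[1:]))
    return {pair: count for pair, count in counts.items() if count > 1}


# Works on a string of characters...
print(repeated_digrams('abcabc'))

# ...and equally on a list of lines or any other hashable objects.
print(repeated_digrams(['x', 'y', 'x', 'y']))
```

Pairs such as a b in 'abcabc' are the kind of repetition that a grammar rule can replace.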

Features

  • Pure-Python

  • Developed on Python 3.8

  • Tested on CPython 3.6, 3.7, 3.8

  • Tested using GitHub Actions on Linux, Mac, and Windows

Quickstart

Installing scikit-sequitur is simple with pip:

$ pip install scikit-sequitur

You can access documentation in the interpreter with Python’s built-in help function:

>>> import sksequitur
>>> help(sksequitur)                    # doctest: +SKIP

Tutorial

The scikit-sequitur module provides utilities for parsing sequences and understanding grammars.

>>> from sksequitur import parse
>>> print(parse('abcabc'))
0 -> 1 1
1 -> a b c                                        abc

The parse function is a shortcut that wraps the Parser and Grammar objects.

>>> from sksequitur import Parser
>>> parser = Parser()

The feed method works incrementally, so a sequence can be supplied in pieces.

>>> parser.feed('ab')
>>> parser.feed('cab')
>>> parser.feed('c')

A parser's tree can be converted to a Grammar.

>>> from sksequitur import Grammar
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 1
1 -> a b c                                        abc

Grammars are keyed by Productions.

>>> from sksequitur import Production
>>> grammar[Production(0)]
[Production(1), Production(1)]
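
Because each rule body is a list of terminals and references to other rules, a grammar can be expanded back into the original sequence by following those references recursively. The sketch below shows the idea on a plain dict that mirrors the 'abcabc' grammar above. The Rule class and the expand function are hypothetical stand-ins written for this example; they are not part of sksequitur, and whether the library offers its own expansion helper is not assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a non-terminal key such as Production(1)."""
    number: int


def expand(grammar, symbol):
    """Yield the terminals that a symbol stands for, left to right."""
    if isinstance(symbol, Rule):
        # Non-terminal: recurse into each symbol of the rule body.
        for child in grammar[symbol]:
            yield from expand(grammar, child)
    else:
        # Terminal: emit it unchanged.
        yield symbol


# Plain-dict mirror of the grammar printed above:
# 0 -> 1 1
# 1 -> a b c
grammar = {
    Rule(0): [Rule(1), Rule(1)],
    Rule(1): ['a', 'b', 'c'],
}

restored = ''.join(expand(grammar, Rule(0)))
print(restored)  # abcabc
```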

Mark symbols can be used to store metadata about a sequence. The mark symbol is printed as a pipe character “|”.

>>> from sksequitur import Mark
>>> mark = Mark()
>>> mark
Mark()
>>> print(mark)
|

Attributes can be added to mark symbols using keyword arguments.

>>> mark = Mark(kind='start', name='foo.py')
>>> mark
Mark(kind='start', name='foo.py')
>>> mark.kind
'start'

Mark symbols cannot be made part of a rule. In the example below, the repeated pair of a mark followed by c stays in the top-level sequence.

>>> parser = Parser()
>>> parser.feed('ab')
>>> parser.feed([Mark()])
>>> parser.feed('cab')
>>> parser.feed([Mark()])
>>> parser.feed('c')
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 | c 1 | c
1 -> a b                                          ab
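
Since marks always stay in the top-level sequence, they can serve as boundaries between the pieces that were fed in. The sketch below mirrors the grammar printed above with a plain dict, expands it, and splits the result at each mark. Rule, Boundary, expand, and split_at_marks are hypothetical names used only for this illustration; none of them is part of sksequitur.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a non-terminal such as Production(1)."""
    number: int


class Boundary:
    """Hypothetical stand-in for a Mark symbol."""


def expand(grammar, symbol):
    """Yield the terminals and marks that a symbol stands for."""
    if isinstance(symbol, Rule):
        for child in grammar[symbol]:
            yield from expand(grammar, child)
    else:
        yield symbol


def split_at_marks(symbols):
    """Group symbols into segments, starting a new one at each mark."""
    segments = [[]]
    for symbol in symbols:
        if isinstance(symbol, Boundary):
            segments.append([])
        else:
            segments[-1].append(symbol)
    return segments


mark = Boundary()

# Plain-dict mirror of the grammar printed above:
# 0 -> 1 | c 1 | c
# 1 -> a b
grammar = {
    Rule(0): [Rule(1), mark, 'c', Rule(1), mark, 'c'],
    Rule(1): ['a', 'b'],
}

pieces = [''.join(part) for part in split_at_marks(expand(grammar, Rule(0)))]
print(pieces)  # ['ab', 'cab', 'c']
```

The recovered pieces match the three strings passed to feed in the example above.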

Reference

License

Copyright 2020 Grant Jenks

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

