Sequitur algorithm for inferring hierarchies

SciKit Sequitur is an Apache 2.0 licensed Python module for inferring compositional hierarchies from sequences.

Sequitur detects repetition and factors it out by forming rules in a grammar. The rules can be composed of non-terminals, giving rise to a hierarchy. It is useful for recognizing lexical structure in strings, and excels at very long sequences. The Sequitur algorithm was originally developed by Craig Nevill-Manning and Ian Witten.

>>> from sksequitur import parse
>>> grammar = parse('hello hello')
>>> print(grammar)
0 -> 1 _ 1
1 -> h e l l o                                    hello

SciKit Sequitur works on strings, lines, or any sequence of Python objects.

Features

  • Pure-Python

  • Developed on Python 3.8

  • Tested on CPython 3.6, 3.7, 3.8

  • Tested using GitHub Actions on Linux, Mac, and Windows

Quickstart

Installing scikit-sequitur is simple with pip:

$ pip install scikit-sequitur

You can access documentation in the interpreter with Python’s built-in help function:

>>> import sksequitur
>>> help(sksequitur)                    # doctest: +SKIP

Tutorial

The scikit-sequitur module provides utilities for parsing sequences and understanding grammars.

>>> from sksequitur import parse
>>> print(parse('abcabc'))
0 -> 1 1
1 -> a b c                                        abc

The parse function is a shortcut for Parser and Grammar objects.

>>> from sksequitur import Parser
>>> parser = Parser()

The feed method works incrementally, so a sequence can be supplied in pieces.

>>> parser.feed('ab')
>>> parser.feed('cab')
>>> parser.feed('c')

Parsers can be converted to Grammars.

>>> from sksequitur import Grammar
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 1
1 -> a b c                                        abc

Grammars are keyed by Productions.

>>> from sksequitur import Production
>>> grammar[Production(0)]
[Production(1), Production(1)]
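
Because a rule's right-hand side can itself contain productions, expanding rule 0 recursively recovers the original sequence. The following is a minimal sketch of that idea using a plain dict, where integer keys stand in for productions and strings are terminal symbols. The `expand` helper and `toy_grammar` are hypothetical names for illustration only and are not part of the sksequitur API.

```python
# Hypothetical stand-in for a grammar: integer keys play the role of
# productions and strings are terminal symbols. Not the sksequitur API.
def expand(grammar, rule=0):
    """Recursively expand a rule into its flat list of terminal symbols."""
    result = []
    for symbol in grammar[rule]:
        if isinstance(symbol, int):
            # Non-terminal: splice in the expansion of the referenced rule.
            result.extend(expand(grammar, symbol))
        else:
            # Terminal: keep the symbol as-is.
            result.append(symbol)
    return result


# Mirrors the printed grammar above: 0 -> 1 1 and 1 -> a b c.
toy_grammar = {0: [1, 1], 1: ['a', 'b', 'c']}
expanded = ''.join(expand(toy_grammar))
print(expanded)  # abcabc
```

The recursion terminates as long as no rule refers to itself, directly or indirectly, which holds for the grammars shown in this tutorial.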

Mark symbols can be used to store metadata about a sequence. The mark symbol is printed as a pipe character “|”.

>>> from sksequitur import Mark
>>> mark = Mark()
>>> mark
Mark()
>>> print(mark)
|

Attributes can be added to mark symbols using keyword arguments.

>>> mark = Mark(kind='start', name='foo.py')
>>> mark
Mark(kind='start', name='foo.py')
>>> mark.kind
'start'

Mark symbols cannot be made part of a rule, so a repeated pair of symbols that includes a mark is never factored out.

>>> parser = Parser()
>>> parser.feed('ab')
>>> parser.feed([Mark()])
>>> parser.feed('cab')
>>> parser.feed([Mark()])
>>> parser.feed('c')
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 | c 1 | c
1 -> a b                                          ab
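
One way to picture why only "a b" becomes a rule in the example above is to count adjacent symbol pairs while treating marks as barriers. The sketch below is a simplification under stated assumptions, not the Sequitur algorithm itself or the library's implementation: `MARK` is a hypothetical sentinel standing in for a Mark symbol, and `repeated_digrams` is an illustrative helper that naively counts pairs (it would overcount overlapping pairs in runs such as "aaa").

```python
from collections import Counter

MARK = object()  # hypothetical sentinel standing in for a Mark symbol


def repeated_digrams(symbols):
    """Return adjacent pairs seen more than once, skipping pairs that touch MARK."""
    counts = Counter(
        (left, right)
        for left, right in zip(symbols, symbols[1:])
        if left is not MARK and right is not MARK
    )
    return {pair: n for pair, n in counts.items() if n > 1}


# Same input as the example above: 'ab', mark, 'cab', mark, 'c'.
sequence = list('ab') + [MARK] + list('cab') + [MARK] + list('c')
repeats = repeated_digrams(sequence)
print(repeats)  # {('a', 'b'): 2}
```

The pair of a mark followed by "c" also occurs twice in the input, but it is excluded because it contains a mark, which matches the printed grammar where "| c" stays in rule 0.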

License

Copyright 2020 Grant Jenks

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
