Sequitur algorithm for inferring hierarchies

Project description

SciKit Sequitur is an Apache 2.0-licensed Python module for inferring compositional hierarchies from sequences.

Sequitur detects repetition and factors it out by forming rules in a grammar. Rules can contain non-terminals (references to other rules), which gives rise to a hierarchy. It is useful for recognizing lexical structure in strings and excels at very long sequences. The Sequitur algorithm was originally developed by Craig Nevill-Manning and Ian Witten.

>>> from sksequitur import parse
>>> grammar = parse('hello hello')
>>> print(grammar)
0 -> 1 _ 1
1 -> h e l l o                                    hello
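The printed grammar can be read as a tree of rules: rule 0 stands for the whole input, and each number inside a rule body refers to another rule. The sketch below is not scikit-sequitur's API. It models a grammar as a plain dict of rule bodies (a hypothetical representation chosen for illustration, assuming integers are reserved for rule references) and expands rule 0 back into the original sequence.

```python
from typing import Dict, Hashable, List

# Hypothetical representation, not scikit-sequitur's internals: each rule number
# maps to its body; an int in a body is a reference to another rule (a
# non-terminal), anything else is a terminal. Assumes terminals are never ints.
RuleTable = Dict[int, List[Hashable]]


def expand(rules: RuleTable, rule: int = 0) -> List[Hashable]:
    """Recursively replace each non-terminal with the body of its rule."""
    output: List[Hashable] = []
    for symbol in rules[rule]:
        if isinstance(symbol, int):
            output.extend(expand(rules, symbol))  # descend one level in the hierarchy
        else:
            output.append(symbol)  # terminal: emit unchanged
    return output


# The grammar printed above for 'hello hello'; the '_' in that output is the space.
hello_rules: RuleTable = {0: [1, " ", 1], 1: list("hello")}
print("".join(expand(hello_rules)))  # hello hello
```

Recursion keeps the sketch short; a grammar nested deeply enough could hit Python's recursion limit, in which case an explicit stack would be the safer choice.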

SciKit Sequitur works on strings, lines, or any sequence of Python objects.

Features

  • Pure-Python

  • Developed on Python 3.8

  • Tested on CPython 3.6, 3.7, 3.8

  • Tested using GitHub Actions on Linux, Mac, and Windows

Quickstart

Installing scikit-sequitur is simple with pip:

$ pip install scikit-sequitur

You can access documentation in the interpreter with Python’s built-in help function:

>>> import sksequitur
>>> help(sksequitur)                    # doctest: +SKIP

Tutorial

The scikit-sequitur module provides utilities for parsing sequences and understanding grammars.

>>> from sksequitur import parse
>>> print(parse('abcabc'))
0 -> 1 1
1 -> a b c                                        abc
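Nevill-Manning and Witten's Sequitur maintains two properties as it reads input: digram uniqueness (no pair of adjacent symbols appears more than once in the grammar) and rule utility (every rule is used more than once). As a rough sketch, again using a hypothetical dict-of-lists representation rather than anything scikit-sequitur exposes, the functions below check both properties for the abcabc grammar and for an unfactored alternative.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation: rule number -> body; ints are rule references.
RuleTable = Dict[int, List[Hashable]]


def repeated_digrams(rules: RuleTable) -> List[Tuple[Hashable, Hashable]]:
    """Digrams that occur twice or more without overlapping (uniqueness violations)."""
    places = defaultdict(list)  # digram -> [(rule, index), ...] in scan order
    for rule, body in rules.items():
        for i in range(len(body) - 1):
            places[(body[i], body[i + 1])].append((rule, i))
    repeated = []
    for digram, occurrences in places.items():
        first_rule, first_index = occurrences[0]
        # Two occurrences in the same rule at adjacent indexes overlap
        # (e.g. the two 'aa' in 'aaa') and do not count as a repeat.
        if any(rule != first_rule or index - first_index > 1
               for rule, index in occurrences[1:]):
            repeated.append(digram)
    return repeated


def underused_rules(rules: RuleTable) -> List[int]:
    """Rules other than the start rule 0 that are referenced fewer than twice."""
    uses: Dict[int, int] = defaultdict(int)
    for body in rules.values():
        for symbol in body:
            if isinstance(symbol, int):
                uses[symbol] += 1
    return [rule for rule in rules if rule != 0 and uses[rule] < 2]


factored: RuleTable = {0: [1, 1], 1: list("abc")}
flat: RuleTable = {0: list("abcabc")}
print(repeated_digrams(factored), underused_rules(factored))  # [] []
print(repeated_digrams(flat))  # [('a', 'b'), ('b', 'c')]
```

The factored grammar satisfies both properties, while the flat one repeats ab and bc, which is exactly the repetition Sequitur factors into rule 1.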

The parse function is a shortcut for using the Parser and Grammar objects directly.

>>> from sksequitur import Parser
>>> parser = Parser()

The feed method works incrementally, so a sequence can be supplied in pieces.

>>> parser.feed('ab')
>>> parser.feed('cab')
>>> parser.feed('c')

A parser's tree can be converted to a Grammar.

>>> from sksequitur import Grammar
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 1
1 -> a b c                                        abc
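Sequitur is an online algorithm that appends one symbol at a time, so splitting the input across several feed calls yields the same grammar as a single parse('abcabc'), as the two printed grammars above show. The state that must survive between calls includes the last symbol seen, because a digram can straddle two chunks. The DigramCounter class below is hypothetical (not part of scikit-sequitur) and illustrates only that boundary handling.

```python
from collections import Counter
from typing import Hashable, Iterable


class DigramCounter:
    """Hypothetical helper: counts adjacent pairs across incremental feeds."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self._last: Hashable = None
        self._started = False  # separate flag so None can be a real symbol

    def feed(self, symbols: Iterable[Hashable]) -> None:
        for symbol in symbols:
            if self._started:
                # This pair may straddle two feed() calls; state carries over.
                self.counts[(self._last, symbol)] += 1
            self._last = symbol
            self._started = True


chunked = DigramCounter()
for piece in ("ab", "cab", "c"):  # same pieces as the tutorial
    chunked.feed(piece)

whole = DigramCounter()
whole.feed("abcabc")
print(chunked.counts == whole.counts)  # True
```

Both occurrences of the pair ('b', 'c') span a chunk boundary here, yet each is counted once.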

Calling stop inserts a stop symbol, printed as |. Stop symbols cannot be made part of a rule.

>>> parser = Parser()
>>> parser.feed('ab')
>>> parser.stop()
>>> parser.feed('cab')
>>> parser.stop()
>>> parser.feed('c')
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 | c 1 | c
1 -> a b                                          ab
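A rough illustration of why the trailing c is left out of rule 1: if pairs that touch a stop symbol are ignored, only ab repeats in the input above, whereas in an unbroken abcabc both ab and bc repeat and can grow into a rule for abc. The STOP sentinel and countable_digrams function are hypothetical stand-ins, not scikit-sequitur's API, and counting pairs is a simplification of what the parser actually does.

```python
from collections import Counter
from typing import Hashable, List

STOP = object()  # hypothetical sentinel standing in for a parser.stop() call


def countable_digrams(sequence: List[Hashable]) -> Counter:
    """Count adjacent pairs, skipping any pair that touches a stop symbol."""
    counts: Counter = Counter()
    for left, right in zip(sequence, sequence[1:]):
        if left is STOP or right is STOP:
            continue  # a stop symbol can never be part of a rule
        counts[(left, right)] += 1
    return counts


sequence = ["a", "b", STOP, "c", "a", "b", STOP, "c"]
counts = countable_digrams(sequence)
print({pair: n for pair, n in counts.items() if n > 1})  # {('a', 'b'): 2}
```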

Reference

License

Copyright 2020 Grant Jenks

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Download files

Source Distribution

scikit-sequitur-0.2.0.tar.gz (6.6 kB)

Uploaded Source

Built Distributions

scikit_sequitur-0.2.0-cp39-cp39-win_amd64.whl (42.9 kB)

Uploaded CPython 3.9 Windows x86-64

scikit_sequitur-0.2.0-cp39-cp39-manylinux1_x86_64.whl (247.1 kB)

Uploaded CPython 3.9

scikit_sequitur-0.2.0-cp39-cp39-macosx_10_14_x86_64.whl (46.0 kB)

Uploaded CPython 3.9 macOS 10.14+ x86-64

scikit_sequitur-0.2.0-cp38-cp38-win_amd64.whl (43.0 kB)

Uploaded CPython 3.8 Windows x86-64

scikit_sequitur-0.2.0-cp38-cp38-manylinux1_x86_64.whl (267.2 kB)

Uploaded CPython 3.8

scikit_sequitur-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl (47.0 kB)

Uploaded CPython 3.8 macOS 10.14+ x86-64

scikit_sequitur-0.2.0-cp37-cp37m-win_amd64.whl (41.9 kB)

Uploaded CPython 3.7m Windows x86-64

scikit_sequitur-0.2.0-cp37-cp37m-manylinux1_x86_64.whl (212.0 kB)

Uploaded CPython 3.7m

scikit_sequitur-0.2.0-cp37-cp37m-macosx_10_14_x86_64.whl (45.9 kB)

Uploaded CPython 3.7m macOS 10.14+ x86-64

scikit_sequitur-0.2.0-cp36-cp36m-win_amd64.whl (41.8 kB)

Uploaded CPython 3.6m Windows x86-64

scikit_sequitur-0.2.0-cp36-cp36m-manylinux1_x86_64.whl (212.8 kB)

Uploaded CPython 3.6m

scikit_sequitur-0.2.0-cp36-cp36m-macosx_10_14_x86_64.whl (48.7 kB)

Uploaded CPython 3.6m macOS 10.14+ x86-64

File details

Details for the file scikit-sequitur-0.2.0.tar.gz.

File metadata

  • Download URL: scikit-sequitur-0.2.0.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit-sequitur-0.2.0.tar.gz
Algorithm Hash digest
SHA256 86608cad336305d954a6c3f78308ff54ad8ad21d29d5493a5d240ee2acabee44
MD5 3e85622ad3eaa885c849c0675e17f028
BLAKE2b-256 f769196ef961dbd9dff01f75fecce9620b82532f7763211bbacd4f261a39fa15

File details

Details for the file scikit_sequitur-0.2.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 42.9 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 c0d054cdc2ab79eaeab2e6542b81ea492f89473c1cb3064be26ae0329581d8fe
MD5 e3a5cc7eb0160169ec83db19a4b80fd5
BLAKE2b-256 caa80c7af1d9b3cf7011a681319fd3d900c8bfdbcccba56a1d1167c9adf4690a

File details

Details for the file scikit_sequitur-0.2.0-cp39-cp39-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp39-cp39-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 247.1 kB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp39-cp39-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ec1f7952aede167c68c293f1b89d60d7d4d2aa9125250d8afbf2edc3a3b92397
MD5 fa26a3b5c7a5530527d831acb4e6ff0a
BLAKE2b-256 c815cd25d1eeb669fb24f222a973131352fdcbc11c3b0166a48197e85e98781c

File details

Details for the file scikit_sequitur-0.2.0-cp39-cp39-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp39-cp39-manylinux1_x86_64.whl
  • Upload date:
  • Size: 247.1 kB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp39-cp39-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 a4d562cedabce88990aaece79764482a05dc116c06056fb8056b72e2c55d43a5
MD5 fe105b7d2410ef818dce247e02e90413
BLAKE2b-256 56f440140a8bee023e36ac20cf538e319404586d0930ecbbfb5f710435f8dc82

File details

Details for the file scikit_sequitur-0.2.0-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp39-cp39-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 46.0 kB
  • Tags: CPython 3.9, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 698d4d1bd009aef4b07d8df7329008536c8efcb07ac7fc70d55b07a969f08127
MD5 55d1b4f2a1aad5308e401c2ff674e7c1
BLAKE2b-256 6c662c50a40d4c73b1c68a3411c696202333e714b71c5ceba0a0d0104b798895

File details

Details for the file scikit_sequitur-0.2.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 43.0 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 07357eb3c239f8349c111b71e7c7478674f3cc71c34d317646497d9cf3fd2c06
MD5 a80ead59ee9ccfa45cd2ddd881a41a18
BLAKE2b-256 645377e05afd3d98c5ec83bd64e37e2ebbab08dcccda59f0f74c91721d815682

File details

Details for the file scikit_sequitur-0.2.0-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp38-cp38-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 267.2 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c4e4ef314018ea25a25bf3fd5a9fa23c847e959fa84c7396a4b742086e316090
MD5 20d91835764cb42852d02208689b65cd
BLAKE2b-256 d93f6dd4a669c264b475e8eaed0f46a589d2e5413905dac341a85ca557d40f92

File details

Details for the file scikit_sequitur-0.2.0-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 267.2 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 3ac0a8f84225872b382ed43ba4c73dcf77ecbe88831d3caae15ad9e894b98d8d
MD5 f749936dac0721b5954d16f98cdfd784
BLAKE2b-256 d7078a96d72046a5968dc0ce704c7b93a860a11cdb8669aa05c14eb71cf8bc77

File details

Details for the file scikit_sequitur-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 47.0 kB
  • Tags: CPython 3.8, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 b44d6cd617a8f7ce913816b8d8b22f6518c157b956e9d3165a2b637311e5f0f0
MD5 a8f6de78c41ca0d8af9be7f7c407c211
BLAKE2b-256 9f64a2ed2aef1ee7a936d24377f3c1a41e5b9109a8e4f35d3634ec29c53236a1

File details

Details for the file scikit_sequitur-0.2.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 41.9 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 64e94db2fb792f6ff9f48c8e2df508948ed67b715f35339f4a445bfdce99aa10
MD5 e639c64f063561d910409e878443221f
BLAKE2b-256 30de6494a17a1512d7d13f651b667d777aeaa5b0957266cb40558214c9bde437

File details

Details for the file scikit_sequitur-0.2.0-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.2.0-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 23aa0a5ab789db9868a90666a4e144c63357059705f517f683767997a8086284
MD5 a1af177c6780203138680a906e6e88c1
BLAKE2b-256 f1200f00ab14b2ed1144403a75976aa59cfd72354cdc96c3c16fe7e3c4cbb8bd

File details

Details for the file scikit_sequitur-0.2.0-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 212.0 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 a4d041b17fe651ff756e435f77184fcdf86ee1013d98811b64c5e40837d6e120
MD5 c4a2adf8035618657a0eb2a02ab0f4e2
BLAKE2b-256 c4d62bc35292697e6eb58dc81c0473b59743aaf9ec2f2d9de5547068694ceb02

File details

Details for the file scikit_sequitur-0.2.0-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 45.9 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 f4f527081bb26f94ac4f93ee963a67ae437a0e1b518caa33cad3a72a94e0daba
MD5 bd0a5967bd336dc110e9457463f19f3a
BLAKE2b-256 785eec3389867b7cbba1d67b10e4a725d9d9a61d423ea5568bd571ae09789ed4

File details

Details for the file scikit_sequitur-0.2.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 41.8 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 a5f4fc883dae06a55e7d00f4d627dc5912eca7c4c432cc5c6cc35c615bb2a0f0
MD5 68ea7a2974975af66e7bf40e2078d0fb
BLAKE2b-256 adec876cb8508ff0291c547cbcf83d67c0fd6114ad2c0d845e8b0cf9c81b93ac

File details

Details for the file scikit_sequitur-0.2.0-cp36-cp36m-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.2.0-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9dd0f8a48f74ad00c8707e21d2fd99e5255cdda48c1d3507cf190f4735412422
MD5 b2069935871ef69d77b799b84a8011ab
BLAKE2b-256 88a659413f7a114ffa19e304438e0499f55cb058e3ba2be9fe9b2cd6c6fab6c5

File details

Details for the file scikit_sequitur-0.2.0-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 212.8 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 c120fa56bc5fd8fd864ef6744b91655c708ac0c7553efb9e4d1c627e3785b134
MD5 71d318ddcac4f9702d6d5a5a44f63ea1
BLAKE2b-256 f80919bb2be7056243b6fe0e4a6460c8ccbe1a69a8cb9208e9584ae736bec40b

File details

Details for the file scikit_sequitur-0.2.0-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.2.0-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 48.7 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.2.0-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 67583f8dab70276b61c8db0de5888065ae26f696a92be1b2cea68937c1a4a342
MD5 bf97ca36fa87221fe258fb9e5a0c7389
BLAKE2b-256 ab54ac58f2a0e48c97a4935b670c1e47092c4cee152216b9c48a95330bc48487
