Sequitur algorithm for inferring hierarchies

Project description

SciKit Sequitur is an Apache 2.0-licensed Python module for inferring compositional hierarchies from sequences.

Sequitur detects repetition in a sequence and factors it out by forming rules in a grammar. Because a rule can itself contain non-terminals (references to other rules), the rules form a hierarchy. Sequitur is useful for recognizing lexical structure in strings and excels at very long sequences. The algorithm was originally developed by Craig Nevill-Manning and Ian Witten.

>>> from sksequitur import parse
>>> grammar = parse('hello hello')
>>> print(grammar)
0 -> 1 _ 1
1 -> h e l l o                                    hello
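To make the factoring idea concrete, here is a brute-force sketch of the two constraints Sequitur maintains while it reads a sequence: digram uniqueness (no pair of adjacent symbols appears more than once in the grammar) and rule utility (every rule other than the start rule is referenced at least twice). This is not scikit-sequitur's implementation. It rescans the whole grammar after every symbol, so it is far slower than the linear-time original, and the names sequitur, expand, and NT are hypothetical. Rule numbers also differ from the library's printed output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NT:
    """Hypothetical reference to a rule; any other hashable object is a terminal."""
    rule_id: int


def _find_repeat(rules):
    """Return (digram, first, second) for a digram seen twice without overlap."""
    seen = {}
    for rule_id, body in rules.items():
        for i in range(len(body) - 1):
            digram = (body[i], body[i + 1])
            if digram not in seen:
                seen[digram] = (rule_id, i)
            elif seen[digram] != (rule_id, i - 1):  # ignore overlaps as in "aaa"
                return digram, seen[digram], (rule_id, i)
    return None


def _replace(rules, rule_id, index, symbol):
    """Replace the digram starting at rules[rule_id][index] with one symbol."""
    rules[rule_id][index:index + 2] = [symbol]


def sequitur(sequence):
    """Build a grammar {rule_id: body}; rule 0 is the start rule."""
    rules = {0: []}
    next_id = 1
    for symbol in sequence:
        rules[0].append(symbol)
        while True:
            repeat = _find_repeat(rules)
            if repeat is not None:
                digram, first, second = repeat
                # If one occurrence is the entire body of a two-symbol rule, reuse it.
                whole = [occ for occ in (first, second)
                         if occ[0] != 0 and len(rules[occ[0]]) == 2]
                if whole:
                    keep = whole[0]
                    other = second if keep == first else first
                    _replace(rules, other[0], other[1], NT(keep[0]))
                else:
                    rules[next_id] = list(digram)
                    # Replace the later occurrence first so the earlier index stays valid.
                    for rule_id, index in sorted((first, second), reverse=True):
                        _replace(rules, rule_id, index, NT(next_id))
                    next_id += 1
                continue
            # Rule utility: inline any rule referenced fewer than two times.
            uses = Counter(s for body in rules.values() for s in body if isinstance(s, NT))
            useless = next((r for r in rules if r != 0 and uses[NT(r)] < 2), None)
            if useless is None:
                break
            body = rules.pop(useless)
            for parent in rules.values():
                if NT(useless) in parent:
                    i = parent.index(NT(useless))
                    parent[i:i + 1] = body
                    break
    return rules


def expand(rules, rule_id=0):
    """Rebuild the original sequence by recursively expanding non-terminals."""
    out = []
    for symbol in rules[rule_id]:
        if isinstance(symbol, NT):
            out.extend(expand(rules, symbol.rule_id))
        else:
            out.append(symbol)
    return out
```

For 'hello hello' the sketch ends with a start rule of the form [NT(k), ' ', NT(k)] whose rule k is ['h', 'e', 'l', 'l', 'o'], the same shape as the grammar printed above. Because terminals only need to be hashable, it also accepts lists of words: sequitur("to be or not to be".split()) gives the start rule [NT(1), 'or', 'not', NT(1)].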

SciKit Sequitur works on strings, lines, or any sequence of Python objects.

Features

  • Pure-Python

  • Developed on Python 3.8

  • Tested on CPython 3.6, 3.7, 3.8

  • Tested using GitHub Actions on Linux, Mac, and Windows

Quickstart

Installing scikit-sequitur is simple with pip:

$ pip install scikit-sequitur

You can access documentation in the interpreter with Python’s built-in help function:

>>> import sksequitur
>>> help(sksequitur)                    # doctest: +SKIP

Tutorial

The sksequitur module provides utilities for parsing sequences and inspecting the resulting grammars.

>>> from sksequitur import parse
>>> print(parse('abcabc'))
0 -> 1 1
1 -> a b c                                        abc

The parse function is a shortcut for using the Parser and Grammar objects directly, as shown below.

>>> from sksequitur import Parser
>>> parser = Parser()

The feed method works incrementally, so a sequence can be supplied in pieces.

>>> parser.feed('ab')
>>> parser.feed('cab')
>>> parser.feed('c')

The parser's tree can be converted to a Grammar.

>>> from sksequitur import Grammar
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 1
1 -> a b c                                        abc

Calling stop() inserts a stop symbol, printed as |, and stop symbols cannot be made part of a rule.

>>> parser = Parser()
>>> parser.feed('ab')
>>> parser.stop()
>>> parser.feed('cab')
>>> parser.stop()
>>> parser.feed('c')
>>> grammar = Grammar(parser.tree)
>>> print(grammar)
0 -> 1 | c 1 | c
1 -> a b                                          ab
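One way to picture the stop-symbol constraint: if a stop symbol never pairs with a neighbor, it can never form a digram and therefore never ends up inside a rule. The sketch below, assuming that model, counts digrams that repeat without overlapping while skipping any pair that touches a stop marker. For the tutorial's input, the pairs ('b', |) and (|, 'c') each occur twice but are skipped, leaving only ('a', 'b'), which matches rule 1 above. STOP and repeated_digrams are hypothetical names; this illustrates the constraint and is not how scikit-sequitur implements it.

```python
STOP = object()  # hypothetical sentinel standing in for the library's stop symbol


def repeated_digrams(sequence, stop=STOP):
    """Digrams occurring at least twice without overlap, skipping any that touch `stop`."""
    counts = {}
    last_start = {}  # digram -> start index of its most recently counted occurrence
    for i in range(len(sequence) - 1):
        left, right = sequence[i], sequence[i + 1]
        if left is stop or right is stop:
            continue  # a stop symbol never joins a digram
        digram = (left, right)
        if last_start.get(digram) == i - 1:
            continue  # overlaps the previous occurrence, e.g. the middle pair of "aaa"
        counts[digram] = counts.get(digram, 0) + 1
        last_start[digram] = i
    return {digram for digram, n in counts.items() if n >= 2}


with_stops = ["a", "b", STOP, "c", "a", "b", STOP, "c"]
print(repeated_digrams(with_stops))                # {('a', 'b')}
print(sorted(repeated_digrams(list("abcabc"))))    # [('a', 'b'), ('b', 'c')]
```

Without stops, 'abcabc' repeats both ('a', 'b') and ('b', 'c'), which is why the earlier example could grow a single rule for a b c.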

Reference

License

Copyright 2020 Grant Jenks

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Download files

Source Distribution

scikit-sequitur-0.1.4.tar.gz (6.3 kB)

Uploaded Source

Built Distributions

scikit_sequitur-0.1.4-cp39-cp39-win_amd64.whl (58.2 kB)

Uploaded CPython 3.9 Windows x86-64

scikit_sequitur-0.1.4-cp39-cp39-manylinux1_x86_64.whl (334.2 kB)

Uploaded CPython 3.9

scikit_sequitur-0.1.4-cp39-cp39-macosx_10_14_x86_64.whl (66.9 kB)

Uploaded CPython 3.9 macOS 10.14+ x86-64

scikit_sequitur-0.1.4-cp38-cp38-win_amd64.whl (58.3 kB)

Uploaded CPython 3.8 Windows x86-64

scikit_sequitur-0.1.4-cp38-cp38-manylinux1_x86_64.whl (366.9 kB)

Uploaded CPython 3.8

scikit_sequitur-0.1.4-cp38-cp38-macosx_10_14_x86_64.whl (67.1 kB)

Uploaded CPython 3.8 macOS 10.14+ x86-64

scikit_sequitur-0.1.4-cp37-cp37m-win_amd64.whl (56.4 kB)

Uploaded CPython 3.7m Windows x86-64

scikit_sequitur-0.1.4-cp37-cp37m-manylinux1_x86_64.whl (307.4 kB)

Uploaded CPython 3.7m

scikit_sequitur-0.1.4-cp37-cp37m-macosx_10_14_x86_64.whl (65.8 kB)

Uploaded CPython 3.7m macOS 10.14+ x86-64

scikit_sequitur-0.1.4-cp36-cp36m-win_amd64.whl (56.7 kB)

Uploaded CPython 3.6m Windows x86-64

scikit_sequitur-0.1.4-cp36-cp36m-manylinux1_x86_64.whl (310.1 kB)

Uploaded CPython 3.6m

scikit_sequitur-0.1.4-cp36-cp36m-macosx_10_14_x86_64.whl (69.7 kB)

Uploaded CPython 3.6m macOS 10.14+ x86-64

File details

Details for the file scikit-sequitur-0.1.4.tar.gz.

File metadata

  • Download URL: scikit-sequitur-0.1.4.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit-sequitur-0.1.4.tar.gz
Algorithm Hash digest
SHA256 6f27af21979d90c32224a8b96e49a80ffe6bf28c7de76ffdf6c021ea72ddb7ec
MD5 737bd1166504fc3c77a452ca2629470c
BLAKE2b-256 a5da9a4594d3a17b13e0b0deb94b15c0e9efc16afd2c7c526dbba0f90db710bb

File details

Details for the file scikit_sequitur-0.1.4-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 58.2 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 6daad2d60315c2bbf4408c53a8d4f2a8fe015af0a7bda2faa4cb2b837b952e32
MD5 0f7b232694e5250da2fcc70ed0d99e9e
BLAKE2b-256 21f89aab271c669c435630e1fe5faa624e4a376acce98848d36ff9f8e1593002

File details

Details for the file scikit_sequitur-0.1.4-cp39-cp39-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp39-cp39-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 334.2 kB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp39-cp39-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8c805607e706625a62a44360ab1cebd0db9302a886bc7380550024ddaa475659
MD5 29d50243fe1daf00107a0a01cc90623d
BLAKE2b-256 50d07375d84324d4b54451c0fc4af8e92ce6caa91a2e58aba9bde4e123726d80

File details

Details for the file scikit_sequitur-0.1.4-cp39-cp39-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp39-cp39-manylinux1_x86_64.whl
  • Upload date:
  • Size: 334.2 kB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp39-cp39-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f0556a94e2ce088e4e06364825d636a44eb767f64c44850075b546eee7f95daa
MD5 5729a4e3731379b79ab0a9578ebe03ca
BLAKE2b-256 179fab8ff7df83e599f33281874a279706570f45ec2bd595023e8222d7c46626

File details

Details for the file scikit_sequitur-0.1.4-cp39-cp39-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp39-cp39-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 66.9 kB
  • Tags: CPython 3.9, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 cb73ad641b818067ec52fd3975ce54c88d4bdcce665dceb9f86385327af023e9
MD5 cf7ca53e3288678843b0bb0fb5a721a7
BLAKE2b-256 5b15896dd2cbe363c0f70031525121ff3a01a01aa945c14edf601dd040e68c5e

File details

Details for the file scikit_sequitur-0.1.4-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 58.3 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 c5782f444fd72b19a15588ca818fbbfb2720f59457f2d2cdbd675c99997ab870
MD5 a52ba8471c3d3d830368b57d70be4a8a
BLAKE2b-256 d2ab25a950e94878e15ab4f6d41eccf7f3723d15b9ae1ecda2d0d2645c75ff41

File details

Details for the file scikit_sequitur-0.1.4-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp38-cp38-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 366.9 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 aad9d065d40e8125fee7c0015cccf2026a1f9726df2f872ed0e333d379ca7c77
MD5 89128a21029fd5ae0ce26718f34dead3
BLAKE2b-256 bd19ed74226dca061bd197608c940cf4b5205e38fdd39bdd5e0d2c7e5fc4f9ac

File details

Details for the file scikit_sequitur-0.1.4-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 366.9 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8fdacdd9c75cb4e110eb3dc2907e64ccc43ab9904abd4f4122efd57658ed04e4
MD5 a20d5961371f7730c2f741635c235f62
BLAKE2b-256 19c479a1b6ed1b7c8595ef81cd4ab77af92b86ae83e8ffce2cfbc227885b2edf

File details

Details for the file scikit_sequitur-0.1.4-cp38-cp38-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp38-cp38-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 67.1 kB
  • Tags: CPython 3.8, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 8640675e7b36e7b6ed144b3ea3a71b4f861d0f191c96c1b7126f179abae833c6
MD5 1cea6df345c7f0e43f13f1ccf0199e1b
BLAKE2b-256 3dfdcb089c67e2301cf8860fd0ddbfc8db7e6ed50719f47aa689d0b4d5ee2766

File details

Details for the file scikit_sequitur-0.1.4-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 56.4 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 89b804da9b5231df0ddc0f6f62e09aeceaafd1f801a5cb65b0ec82529a20a5f1
MD5 0364c2eac4902824a7ce5971f72f9600
BLAKE2b-256 58a3f97a2c4a1ed25d4764a337eafc8bdd74cbfd273479d630e39b0e64557659

File details

Details for the file scikit_sequitur-0.1.4-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.1.4-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 438f4a9bd88a9ed76860e02b5f2d49f45718833694871ebbfa5e0e8871880f1d
MD5 22d8ca28f9c5623bbf56fd34aea02bcc
BLAKE2b-256 436816d930b0e3cb78824933f324776bc1aa72be786b1b436674cd2573fe7451

File details

Details for the file scikit_sequitur-0.1.4-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 307.4 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 dc8b78d3b821b35c70dd68bdec0e543cd57c8b78132d08200584179f1f3eb57b
MD5 88e9ae4d5a8fe5fca98ee5258e4c2948
BLAKE2b-256 acee331a0aeeab444a3d6ee38ca472fbf79e4ce2e53a2689f238632cce54676d

File details

Details for the file scikit_sequitur-0.1.4-cp37-cp37m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp37-cp37m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 65.8 kB
  • Tags: CPython 3.7m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 86ebfdfe88a946d5ee2ab1c13ab6cc54344d500f3998184d5e7f8932bd619a13
MD5 b23b7bbc74db6b610c86f2c0d507d5a3
BLAKE2b-256 ea0398ca223a1a4a8304c91d1124f5919f1dc06311f02ff0e41294a85a5019e8

File details

Details for the file scikit_sequitur-0.1.4-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 56.7 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 518f2dc5e9fa8c83d6538e0f0fcdda1220df283968f6e07e0e002c51efe379c5
MD5 d1446c9002a84c028c12f7405e336f27
BLAKE2b-256 866ac49c69ab92b676e29fe97e7658f1a29001b3735ee0e6d9ab1c0992826fc6

File details

Details for the file scikit_sequitur-0.1.4-cp36-cp36m-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for scikit_sequitur-0.1.4-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bf84b85489b721ec5c36e803a3b7676ddedb377d3b92caf9f32774b564e3ebfe
MD5 cd974d5ce6f7219798280e891811775f
BLAKE2b-256 5b793e7f5003df6e581d24fc0d0e253a71ab388c13e2d0845270828baf826e50

File details

Details for the file scikit_sequitur-0.1.4-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 310.1 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 e3b1b3a862bbf5a42ba84b397ba68ff34ca2d692f7b3905e34ea08f9c5a79ded
MD5 5653943c3c57035a5dcb89aa51c779f0
BLAKE2b-256 d1318c9d4fa324aadc14e6b85b3cea62942641b96629bbc884e6ce406439d412

File details

Details for the file scikit_sequitur-0.1.4-cp36-cp36m-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: scikit_sequitur-0.1.4-cp36-cp36m-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 69.7 kB
  • Tags: CPython 3.6m, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for scikit_sequitur-0.1.4-cp36-cp36m-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 a7b22ff6615b936dd230516184ae3e9e1cb211068734feae44d6b6242f90a4d3
MD5 4092bd751ac446d25e5b18fe947c70c3
BLAKE2b-256 5a91d9374ff4cf233015e3b5e72776de4f519202b7ba4cd985dd56e79e8d5b16
