Compute distance between the two texts.

# Algorithms

* **h -- hamming: substitution**. Compute the Hamming distance between two sequences. The Hamming distance is the number of positions at which the items of two ordered sequences differ.
* **l -- levenshtein: deletion, insertion, substitution**. Compute the absolute Levenshtein distance between two sequences. The Levenshtein distance is the minimum number of edit operations needed to transform one sequence into the other.
* **dl -- damerau-levenshtein: deletion, insertion, substitution, transposition**. Compute the absolute Damerau-Levenshtein distance between two sequences. The Damerau-Levenshtein distance is the minimum number of edit operations needed to transform one sequence into the other, where swapping two adjacent items counts as a single operation.
* **s -- sorensen**. Compute the Sorensen distance between two sequences. They should contain hashable items. The return value is a float between 0 and 1, where 0 means equal and 1 means totally different.
* **j -- jaccard**. Compute the Jaccard distance between two sequences. They should contain hashable items. The return value is a float between 0 and 1, where 0 means equal and 1 means totally different.

# Installation

```bash
sudo pip3 install textdistance
```

# Usage

## Importing

```python
>>> from textdistance import distance

```

## Hamming

```python
>>> distance('h', 'lorem', 'lorum')
1
>>> distance('h', 'lorem', 'loremus')
2
>>> distance('h', 'lorem', 'lorimus')
3
>>> distance('h', 'lorimus', 'larem')
4
>>> distance.h('lorimus', 'larem')
4
```
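The results above suggest that when the sequences differ in length, each unmatched trailing item counts as one difference. A minimal pure-Python sketch of that behaviour follows. The name `hamming_sketch` is hypothetical and this is not the package's own implementation.

```python
from itertools import zip_longest


def hamming_sketch(a, b):
    """Count positions at which two sequences differ.

    Assumption: items past the end of the shorter sequence each count
    as one difference, which matches the examples in this section.
    """
    missing = object()  # sentinel that never equals a real item
    return sum(x != y for x, y in zip_longest(a, b, fillvalue=missing))


print(hamming_sketch('lorem', 'lorum'))    # 1
print(hamming_sketch('lorimus', 'larem'))  # 4
```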

## Sorensen

```python
>>> distance('s', 'lorem', 'lorem')
0.0
>>> distance('s', 'lorem', 'lorum')
0.19999999999999996
>>> distance('s', 'lorem', 'lorme')
0.0
>>> distance('s', 'lorem', 'melor')
0.0
>>> distance('s', 'lorem', 'loremus')
0.16666666666666663
>>> distance.s('lorem', 'loremus')
0.16666666666666663
```
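The zero distance for reordered letters (`'lorme'`, `'melor'`) is consistent with comparing the items as unordered collections. The sketch below assumes plain sets and the usual Sorensen-Dice formula, `1 - 2|A ∩ B| / (|A| + |B|)`. The name `sorensen_sketch` is hypothetical, and the package may handle repeated items differently.

```python
def sorensen_sketch(a, b):
    """Sorensen distance over the sets of items in a and b.

    Assumption: repeated items are ignored (plain sets), and two empty
    sequences are treated as equal.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 1 - 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


print(sorensen_sketch('lorem', 'melor'))  # 0.0
```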

## Jaccard

```python
>>> distance('j', 'lorem', 'lorem')
0.0
>>> distance('j', 'lorem', 'lorum')
0.33333333333333337
>>> distance('j', 'lorem', 'lorme')
0.0
>>> distance('j', 'lorem', 'melor')
0.0
>>> distance('j', 'lorem', 'loremus')
0.2857142857142857
>>> distance.j('lorem', 'loremus')
0.2857142857142857
```
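These values match the standard set formula `1 - |A ∩ B| / |A ∪ B|`. The sketch below assumes plain sets. The name `jaccard_sketch` is hypothetical and this is not the package's own code.

```python
def jaccard_sketch(a, b):
    """Jaccard distance over the sets of items in a and b.

    Assumption: repeated items are ignored (plain sets), and two empty
    sequences are treated as equal.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


print(jaccard_sketch('lorem', 'lorme'))  # 0.0
```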

## Levenshtein

```python
>>> distance('l', 'lorem', 'lorim')
1
>>> #substitution
... distance('l', 'lorem', 'lorim')
1
>>> #insertion
... distance('l', 'lorem', 'loriem')
1
>>> #deletion
... distance('l', 'lorem', 'lrem')
1
>>> distance.l('lorem', 'lrem')
1
```
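For reference, the three edit operations can be expressed as the classic dynamic-programming recurrence. This is a textbook sketch with a hypothetical name (`levenshtein_sketch`), not necessarily how the package computes the result.

```python
def levenshtein_sketch(a, b):
    """Minimum number of deletions, insertions and substitutions
    needed to turn sequence a into sequence b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # deletion of x
                curr[j - 1] + 1,     # insertion of y
                prev[j - 1] + cost,  # substitution (free if items match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein_sketch('lorem', 'loriem'))  # 1
print(levenshtein_sketch('lorem', 'lorme'))   # 2 (no transposition here)
```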

## Damerau-Levenshtein

```python
>>> distance('dl', 'lorem', 'lorim')
1
>>> #substitution
... distance('dl', 'lorem', 'lorim')
1
>>> #insertion
... distance('dl', 'lorem', 'loriem')
1
>>> #deletion
... distance('dl', 'lorem', 'lrem')
1
>>> #transposition
... distance('dl', 'lorem', 'lorme')
1
>>> distance.dl('lorem', 'lorme')
1
```
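Adding transposition to the Levenshtein recurrence takes one extra case. The sketch below implements the restricted variant, often called optimal string alignment, in which no part of the sequence is edited more than once. Whether the package uses this variant or the unrestricted one is not stated here, so treat that as an assumption. The name `osa_distance_sketch` is hypothetical.

```python
def osa_distance_sketch(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition: the last two items of each prefix are swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance_sketch('lorem', 'lorme'))  # 1
```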

## Testing with word permutations

```python
>>> distance('dlw', 'lorem ipsum', 'ipsum lorum')
1
>>> distance('dlw', 'lorem ipsum dolor', 'ipsum lorum')
7
>>> distance('dlwe', 'lorem ipsum dolor', 'ipsum lorum')
1
```

## Find minimal text by distance

```python
>>> distance.find_minimal('h', 'lorem', ['larum', 'lorum'])
(1, 'lorum')
```
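Conceptually, this lookup computes the distance to every candidate and keeps the smallest. The sketch below takes any two-argument metric function. Both `find_minimal_sketch` and the inline `hamming` helper are hypothetical, and keeping the earliest candidate on a tie is an assumption.

```python
from itertools import zip_longest


def find_minimal_sketch(metric, text, candidates):
    """Return (distance, candidate) for the candidate closest to text.

    Assumption: on a tie, the earliest candidate in the list wins.
    """
    scored = ((metric(text, c), c) for c in candidates)
    return min(scored, key=lambda pair: pair[0])


def hamming(a, b):
    # Stand-in metric: differing positions, with length differences counted.
    missing = object()
    return sum(x != y for x, y in zip_longest(a, b, fillvalue=missing))


print(find_minimal_sketch(hamming, 'lorem', ['larum', 'lorum']))  # (1, 'lorum')
```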
