Repackaging of Google's Diff Match and Patch libraries. Offers robust algorithms to perform the operations required for synchronizing plain text.
dmp
Google's Diff Match and Patch library, packaged for modern Python.
Install
dmp is supported on Python 2.7 or Python 3.4 or newer. You can install it from PyPI:
python -m pip install dmp
Usage
To make it possible to coexist with upstream diff-match-patch (and to reduce boilerplate), this package makes the normal API available to import as dmp instead of diff_match_patch. The rest of the API remains unchanged, although helper functions may be added in future updates.
Generating a patchset (analogous to unified diff) between two texts:
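from dmp import diff_match_patch
# text1 and text2 are the old and new versions of the text (plain strings).
dmp = diff_match_patch()
patches = dmp.patch_make(text1, text2)
# Serialize the patches to a text format that can be stored or transmitted.
diff = dmp.patch_toText(patches)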
Applying a patchset to a text can then be done with:
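from dmp import diff_match_patch
dmp = diff_match_patch()
# diff is the serialized patchset produced by patch_toText().
patches = dmp.patch_fromText(diff)
# patch_apply returns the patched text and a list of booleans, one per patch,
# indicating whether that patch applied; the list is ignored here.
new_text, _ = dmp.patch_apply(patches, text)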
Original README
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.
- Diff:
  - Compare two blocks of plain text and efficiently return a list of differences.
  - Diff Demo
- Match:
  - Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
  - Match Demo
- Patch:
  - Apply a list of patches onto plain text. Makes a best effort to apply each patch even when the underlying text doesn't match.
  - Patch Demo
Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.
Reference
- API - Common API across all languages.
- Line or Word Diffs - Less detailed diffs.
- Plain Text vs. Structured Content - How to deal with data like XML.
- Unidiff - The patch serialization format.
- Support - Newsgroup for developers.
Languages
Although each language port of Diff Match Patch uses the same API, there are some language-specific notes.
A standardized speed test tracks the relative performance of diffs in each language.
Algorithms
This library implements Myers' diff algorithm, which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surrounds the diff algorithm, improving both performance and output quality.
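To give a feel for the core idea, here is a minimal sketch of the greedy form of Myers' algorithm that only computes the length of the shortest edit script (insertions plus deletions) between two sequences. It is not the library's implementation: the function name shortest_edit_length is hypothetical, and none of the speedups or cleanups mentioned above are included.

```python
def shortest_edit_length(a, b):
    """Return the minimum number of insertions and deletions turning a into b.

    Illustrative sketch of Myers' greedy O(ND) search; the name and
    structure are assumptions, not part of the dmp API.
    """
    n, m = len(a), len(b)
    max_d = n + m
    # furthest[k] is the largest x reached so far on diagonal k (where k = x - y).
    furthest = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # step down: insert a character from b
            else:
                x = furthest[k - 1] + 1    # step right: delete a character from a
            y = x - k
            # Follow the "snake": matching characters cost no edits.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(shortest_edit_length("ABCABBA", "CBABAC"))  # 5
```

The search tries edit counts d = 0, 1, 2, ... and, for each diagonal, keeps only the furthest point reachable with d edits, which is why it is fast when the two texts are similar.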
This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.
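The Bitap idea can be sketched in a few lines. The example below is a simplified variant that tolerates only substituted characters and returns the first match; the library's matcher is more elaborate, since it also weighs how far a match is from the expected location. The function name bitap_search and its parameters are hypothetical and not part of the dmp API.

```python
def bitap_search(text, pattern, max_errors=0):
    """Return the start index of the first window of text that matches
    pattern with at most max_errors substituted characters, or -1.

    Illustrative sketch only (substitutions, no insertions or deletions).
    """
    m = len(pattern)
    if m == 0:
        return 0
    # Bit i of masks[ch] is set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1      # keep states to m bits
    accept = 1 << (m - 1)    # bit set when the whole pattern has matched
    # state[d]: bit i is set when pattern[:i + 1] matches the text ending
    # at the current position with at most d substitutions.
    state = [0] * (max_errors + 1)
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = 0  # value of state[d - 1] before this character
        for d in range(max_errors + 1):
            old = state[d]
            new = ((old << 1) | 1) & mask          # extend with a matching character
            if d > 0:
                new |= (prev_old << 1) | 1         # extend by spending one substitution
            state[d] = new & full
            prev_old = old
        if state[max_errors] & accept:
            return pos - m + 1
    return -1


print(bitap_search("the quick brown fox", "quack", max_errors=1))  # 4
```

Because each state is a single integer, one pass over the text updates all partial matches at once with shifts and bitwise operations, which is what makes the approach attractive for fuzzy matching of short patterns.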
Download files
Source Distribution
File details
Details for the file dmp-2018.11.6.1.tar.gz.
File metadata
- Download URL: dmp-2018.11.6.1.tar.gz
- Upload date:
- Size: 58.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e1c6d7be4f7b87ddc92ce98dd11429f0aa5d778dc26a7255078a36e3647a2419
MD5 | 0a5bc3cb9e1bcc8a816ae469f72997f6
BLAKE2b-256 | da122e811d224adb0c1ecdd3f1fae6deb938bb555c165033ccbbca3daebbf5bc