A sentence paraphraser based on dependency syntax and word embeddings
dependency-paraphraser
A sentence paraphraser based on dependency parsing and word embedding similarity.
How the paraphraser works:
- Create a random projection of the dependency tree
- Replace several words with similar ones
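The first step can be illustrated with a toy sketch. This is not the library's implementation: it assumes that "projection" means flattening the tree into a word order in which every subtree stays contiguous, and it picks the order uniformly at random instead of using a trained projector. The function `random_linearization` and the hand-written `heads` list are hypothetical names introduced only for this illustration.

```python
import random


def random_linearization(heads, words, rng):
    """Return the words in a random order that keeps every subtree contiguous.

    heads[i] is the index of the head of word i, or -1 for the root.
    (Hypothetical helper for illustration; not part of dependency_paraphraser.)
    """
    children = {i: [] for i in range(len(words))}
    root = None
    for i, head in enumerate(heads):
        if head == -1:
            root = i
        else:
            children[head].append(i)

    def flatten(node):
        # Each block is either the head word itself or one whole child subtree.
        blocks = [[node]] + [flatten(child) for child in children[node]]
        rng.shuffle(blocks)
        return [idx for block in blocks for idx in block]

    return [words[i] for i in flatten(root)]


words = ['every', 'hunter', 'wants', 'to', 'know']
heads = [1, 2, -1, 4, 2]  # hand-written toy tree: 'wants' is the root
rng = random.Random(0)
print(' '.join(random_linearization(heads, words, rng)))
```

Because whole subtrees are shuffled as blocks, "every hunter" and "to know" always remain adjacent, in either order, while their position relative to "wants" varies.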
Basic usage (for Russian) relies on the Natasha library:
pip install dependency-paraphraser natasha
import dependency_paraphraser.natasha
import random
random.seed(42)
# English gloss: "every hunter wants to know where the pheasant sits"
text = 'каждый охотник желает знать где сидит фазан'
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, tree_temperature=2))
# желает знать сидит фазан где каждый охотник
# каждый охотник желает знать где фазан сидит
# знать где фазан сидит каждый охотник желает
You can provide your own w2v model to replace words with similar ones:
import compress_fasttext
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'https://github.com/avidale/compress-fasttext/releases/download/v0.0.1/ft_freqprune_100K_20K_pq_100.bin'
)
random.seed(42)
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, w2v=small_model, p_rep=0.8, min_sim=0.55))
# стремится каждый охотник знать рябчик где усаживается
# каждый охотник хочет узнать фазан где просиживает
# каждый охотник хочет узнать фазан где восседает
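To make the roles of `p_rep` and `min_sim` more concrete, here is a minimal sketch of similarity-based word replacement. It rests on an assumption about the parameter semantics that the description above does not spell out: each word is considered for replacement with probability `p_rep`, and only neighbours whose cosine similarity is at least `min_sim` are eligible. The helper `replace_words` and the two-dimensional `toy_vectors` are hypothetical and stand in for a real w2v model.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def replace_words(tokens, vectors, p_rep, min_sim, rng):
    """Replace some tokens with sufficiently similar words.

    Assumed semantics (not confirmed by the library): p_rep is the per-word
    replacement probability, min_sim is the similarity threshold.
    """
    result = []
    for token in tokens:
        candidates = []
        # Words without a vector are never replaced.
        if token in vectors and rng.random() < p_rep:
            candidates = [
                other for other, vec in vectors.items()
                if other != token and cosine(vectors[token], vec) >= min_sim
            ]
        result.append(rng.choice(candidates) if candidates else token)
    return result


# Toy embeddings: 'wants'/'wishes' and 'pheasant'/'grouse' are close pairs.
toy_vectors = {
    'wants': [1.0, 0.1],
    'wishes': [0.9, 0.2],
    'pheasant': [0.1, 1.0],
    'grouse': [0.2, 0.9],
}
tokens = ['hunter', 'wants', 'pheasant']
print(replace_words(tokens, toy_vectors, p_rep=0.8, min_sim=0.55, rng=random.Random(42)))
```

Under these assumptions, raising `min_sim` makes replacements rarer but closer in meaning, and lowering `p_rep` leaves more of the original words untouched.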
Alternatively, you can expand and use the w2v model from Natasha (aka navec):
navec_model = dependency_paraphraser.natasha.emb.as_gensim
random.seed(42)
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, w2v=navec_model, p_rep=0.5, min_sim=0.55))
# желает каждый охотник помнить фазан где лежит
# каждый охотник желает знать фазан где сидит
# каждый охотник оставляет понять где фазан лежит
For other languages, one way to use this paraphraser is with the UDPipe library:
pip install dependency-paraphraser ufal.udpipe pyconll
import dependency_paraphraser.udpipe
path = 'english-ewt-ud-2.5-191206.udpipe'
pipe = dependency_paraphraser.udpipe.Model(path)
projector = dependency_paraphraser.udpipe.en_udpipe_projector
text = 'in April 2012 they released the videoclip for a new single entitled Giorgio Mastrota'
for i in range(3):
    print(dependency_paraphraser.udpipe.paraphrase(text, pipe, projector=projector, tree_temperature=1))
# they released the videoclip in April 2012 for a new entitled Mastrota single Giorgio
# they released in April 2012 the videoclip for a entitled single new Giorgio Mastrota
# they released the videoclip in April 2012 for a new single Giorgio Mastrota entitled
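Since the output is random, repeated calls may return the same sentence more than once, or the input unchanged. A small wrapper can collect distinct variants. The sketch below is not part of the library: `unique_paraphrases` is a hypothetical helper that accepts any callable, and `shuffle_words` is only a stand-in so that the example runs without a UDPipe model.

```python
import random


def unique_paraphrases(paraphrase_fn, text, n, max_tries=50):
    """Call paraphrase_fn until n distinct variants (different from text) are found.

    Hypothetical helper; gives up after max_tries calls and returns what it has.
    """
    seen = []
    for _ in range(max_tries):
        candidate = paraphrase_fn(text)
        if candidate != text and candidate not in seen:
            seen.append(candidate)
        if len(seen) == n:
            break
    return seen


# Stand-in for a real paraphraser: shuffles the words at random.
rng = random.Random(0)


def shuffle_words(text):
    words = text.split()
    rng.shuffle(words)
    return ' '.join(words)


variants = unique_paraphrases(shuffle_words, 'they released the videoclip', n=3)
print(variants)
```

With the real paraphraser, the callable could be `lambda t: dependency_paraphraser.udpipe.paraphrase(t, pipe, projector=projector, tree_temperature=1)`, reusing the `pipe` and `projector` objects defined above.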
Projectors (models that project a dependency tree into a flat sentence) can be trained for any language, provided you have a corpus of unlabeled sentences and a syntax parser to label them:
import dependency_paraphraser.udpipe
import dependency_paraphraser.train_projector
parser = dependency_paraphraser.udpipe.Model(path_to_your_model)
sents = dependency_paraphraser.train_projector.label_udpipe_sentences(
    texts=your_corpus,
    model=parser,
)
projector = dependency_paraphraser.train_projector.train_projector(sents)