Keeping the original LASER project alive

Project description

laser-keep-alive is a project that aims to provide a stable runtime environment for Language-Agnostic SEntence Representations (LASER), the open-source project from Facebook AI Research (FAIR).

Installation

Currently, installation is only possible from source.

git clone https://github.com/mingruimingrui/laser-keep-alive.git
cd laser-keep-alive
python setup.py install

To ensure hardware compatibility, an explicit installation of pytorch>=1.0 might be necessary.

Basic Usage

Script Example

The easiest way to use this package in a Python script is to import the laser.SentenceEncoder class.

from laser import SentenceEncoder

# Loading the model
sent_encoder = SentenceEncoder(
    lang='en',
    model_path=path_to_model_file,
    bpe_codes=path_to_bpe_codes_file,
)

# Encode texts
# Given a List[str]
embeddings = sent_encoder.encode_sentences(list_of_texts)

# Where embeddings is a 2D np.ndarray
# of shape [num_texts, embedding_size]
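Since the result is a plain NumPy array, it can be passed straight to ordinary array code. The following is a minimal sketch of one common follow-up step, pairwise cosine similarity between sentences. The random array only stands in for real encoder output, and the helper name cosine_similarity_matrix is hypothetical, not part of the laser package.

```python
import numpy as np

# Stand-in for the output of sent_encoder.encode_sentences(list_of_texts):
# 3 texts with embedding size 1024 (random values, for illustration only).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((3, 1024)).astype(np.float32)


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Return the [n, n] matrix of cosine similarities between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


sims = cosine_similarity_matrix(embeddings)
print(sims.shape)  # (3, 3)
```

Entry [i, j] of sims compares text i with text j, so the diagonal is 1 up to floating-point error.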

Commandline Tool

laser-keep-alive can also be run directly from the commandline.

$ python -m laser
usage: python -m laser [-h] {encode,filter} ...

Language-Agnostic SEntence Representations

positional arguments:
  {encode,filter}
    encode         Encode a text file line by line
    filter         Filter a parallel corpus based on similarity

optional arguments:
  -h, --help       show this help message and exit

At the moment, the following commandline routines are provided.

encode

Encodes a text file line by line into sentence embeddings. The supported output formats are .npy and .csv. If you are using the pretrained model, each embedding has 1024 dimensions. In the .npy output format, this corresponds to 4096 bytes per embedding for np.float32 and 2048 bytes for np.float16. (Don't worry if that last sentence doesn't mean much to you.)
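The byte sizes quoted above follow from multiplying the embedding dimension by the size of one element. A small sketch of that arithmetic, where EMBEDDING_SIZE and bytes_per_embedding are illustrative names rather than anything the package defines:

```python
import numpy as np

EMBEDDING_SIZE = 1024  # dimension stated for the pretrained model


def bytes_per_embedding(dtype) -> int:
    """Bytes needed to store one embedding vector in the given dtype."""
    return EMBEDDING_SIZE * np.dtype(dtype).itemsize


print(bytes_per_embedding(np.float32))  # 4096
print(bytes_per_embedding(np.float16))  # 2048
```

By the same arithmetic, the raw embedding data for a file of N lines takes roughly N times these figures, not counting any file header.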

filter

Filters a parallel corpus line by line. Keeps only sentence pairs whose Euclidean distance is below a threshold (default: 1.04). To apply a stricter filter, use a smaller threshold.
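The thresholding idea can be sketched in a few lines of NumPy. This illustrates the distance test only and is not the tool's actual implementation: the function name filter_parallel and the toy 2-dimensional vectors are assumptions made for the example, and real inputs would be the embeddings of the source and target sentences.

```python
import numpy as np


def filter_parallel(src_emb: np.ndarray, tgt_emb: np.ndarray,
                    threshold: float = 1.04) -> np.ndarray:
    """Return a boolean mask, True where a pair's Euclidean distance is below threshold."""
    distances = np.linalg.norm(src_emb - tgt_emb, axis=1)
    return distances < threshold


# Toy embeddings: row i of src is paired with row i of tgt.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
tgt = np.array([[0.0, 0.5], [1.0, 0.0], [3.0, 2.0]])

keep = filter_parallel(src, tgt)
print(keep.tolist())  # [True, True, False]

# A smaller threshold keeps fewer pairs.
strict = filter_parallel(src, tgt, threshold=0.4)
print(strict.tolist())  # [False, True, False]
```

The pair distances here are 0.5, 0.0 and 3.0, so the default threshold drops only the last pair, while the threshold of 0.4 also drops the first.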

Downloading Pretrained Model

Pretrained models are necessary since this repository does not provide training code.

Please refer to this script to download the pretrained models.

Credits

Full credit goes to Holger Schwenk, the author of the LASER toolkit, and to FAIR. For more information regarding FAIR and LASER, please visit their webpages.

If you like this project, please visit the LASER project page and give it a star ⭐.

License

laser-keep-alive is MIT-licensed and LASER is BSD-licensed. If you wish to use laser-keep-alive, please remember to include the copyright notice.

Citation

Please cite Holger Schwenk and Matthijs Douze (also creator of FAISS).

@inproceedings{Schwenk2017LearningJM,
  title={Learning Joint Multilingual Sentence Representations with Neural Machine Translation},
  author={Holger Schwenk and Matthijs Douze},
  booktitle={Rep4NLP@ACL},
  year={2017},
}
