Pymarian
PyMarian
- Python bindings to Marian (C++), built using PyBind11
- The python package is built using scikit-build-core
Install
# build marian with -DPYMARIAN=on option to create a pymarian wheel
cmake . -Bbuild -DCOMPILE_CUDA=off -DPYMARIAN=on -DCMAKE_BUILD_TYPE=Release
cmake --build build -j # -j option parallelizes build on all cpu cores
python -m pip install build/pymarian-*.whl
The above commands use the python executable in PATH to determine the Python version for compiling the marian native extension. Make sure the desired python executable is in your environment before invoking these cmake commands.
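Since the wheel is built for whichever interpreter python resolves to, it can help to inspect that interpreter before running cmake. The following is a minimal sketch using only the standard library; the helper name cpython_tag is hypothetical, and the cpXY form is the usual CPython wheel tag convention rather than something taken from the marian build scripts.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag (e.g. 'cp310') for a version tuple."""
    return f"cp{version_info[0]}{version_info[1]}"


# The interpreter that is running this script, i.e. what `python` resolved to
print("executable:", sys.executable)
print("version   :", ".".join(map(str, sys.version_info[:3])))
print("wheel tag :", cpython_tag())
```

If the printed tag is not the one you expect in the wheel filename, activate the intended environment first and rerun the cmake commands.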
Python API
The Python API is designed to take the same arguments as the marian CLI, passed as a single string.
NOTE: these APIs are experimental and not finalized. See mtapi_server.py for an example use of the Translator API.
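Because the API takes a single CLI-style string, it may be convenient to assemble that string from structured options. The sketch below is one possible way to do so; build_cli_string is a hypothetical helper, not part of pymarian, and whether pymarian honors shell-style quoting for arguments that contain spaces is an assumption.

```python
import shlex


def build_cli_string(options):
    """Join a dict of options into one CLI-style string.

    Values may be a single item or a list (e.g. two vocab paths);
    True emits a bare flag, False/None skips the option.
    """
    parts = []
    for flag, value in options.items():
        if value is None or value is False:
            continue
        parts.append(flag)
        if value is True:
            continue
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.extend(str(v) for v in values)
    # shlex.join quotes any argument containing spaces or shell metacharacters
    return shlex.join(parts)


cli_string = build_cli_string({
    "-m": "path/to/model.npz",
    "-v": ["path/to.vocab.spm", "path/to.vocab.spm"],
    "--like": "comet-qe",
})
print(cli_string)
```

The resulting string has the same shape as the hand-written CLI strings used in the examples in this section.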
Translator
# Translator
from pymarian import Translator
cli_string = "..."
translator = Translator(cli_string)
sources = ["sent1", "sent2"]
result = translator.translate(sources)
print(result)
Evaluator
# Evaluator
from pymarian import Evaluator
cli_string = '-m path/to/model.npz -v path/to.vocab.spm path/to.vocab.spm --like comet-qe'
evaluator = Evaluator(cli_string)
data = [
    ["Source1", "Hyp1"],
    ["Source2", "Hyp2"]
]
scores = evaluator.run(data)
for score in scores:
    print(score)
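In practice the source and hypothesis sentences usually live in two parallel text files. The sketch below shows one way to turn them into the list-of-pairs shape used in the Evaluator example above; read_pairs is a hypothetical helper, and in-memory buffers stand in for real files.

```python
import io


def read_pairs(src_lines, mt_lines):
    """Pair source and hypothesis lines into [[src, mt], ...]."""
    sources = [line.rstrip("\n") for line in src_lines]
    hyps = [line.rstrip("\n") for line in mt_lines]
    if len(sources) != len(hyps):
        raise ValueError(
            f"line count mismatch: {len(sources)} sources vs {len(hyps)} hypotheses"
        )
    return [[s, h] for s, h in zip(sources, hyps)]


# In-memory stand-ins for two parallel text files
src_file = io.StringIO("Source1\nSource2\n")
mt_file = io.StringIO("Hyp1\nHyp2\n")
data = read_pairs(src_file, mt_file)
print(data)
```

Failing early on a line-count mismatch avoids silently scoring misaligned pairs.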
CLI Usage
- pymarian-eval: CLI to download and use pretrained metrics such as COMETs, COMETOIDs, ChrFoid, and BLEURT
- pymarian-mtapi: REST API demo powered by Flask
- pymarian-qtdemo: GUI app demo powered by Qt
pymarian-eval
$ pymarian-eval -h
usage: pymarian-eval [-h] [-m MODEL] [-v VOCAB] [-l {comet-qe,bleurt,comet}] [-V] [-] [-t MT_FILE] [-s SRC_FILE] [-r REF_FILE] [-f FIELD [FIELD ...]] [-o OUT] [-a {skip,append,only}] [-w WIDTH] [--debug] [--fp16] [--mini-batch MINI_BATCH] [-d [DEVICES ...] | -c CPU_THREADS] [-ws WORKSPACE] [-pc]
options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Model name, or path. Known models: bleurt-20, wmt20-comet-da, wmt20-comet-qe-da, wmt20-comet-qe-da-v2, wmt21-comet-da, wmt21-comet-qe-da, wmt21-comet-qe-mqm, wmt22-comet-da, wmt22-cometkiwi-da, xcomet-xl, xcomet-xxL (default: wmt22-cometkiwi-da)
-v VOCAB, --vocab VOCAB
Vocabulary file (default: None)
-l {comet-qe,bleurt,comet}, --like {comet-qe,bleurt,comet}
Model type. Required if --model is a local file (auto inferred for known models) (default: None)
-V, --version show program's version number and exit
-, --stdin Read input from stdin. TSV file with following format: QE metrics: "src<tab>mt", Ref based metrics ref: "src<tab>mt<tab>ref" or "mt<tab>ref" (default: False)
-t MT_FILE, --mt MT_FILE
MT output file. Ignored when --stdin (default: None)
-s SRC_FILE, --src SRC_FILE
Source file. Ignored when --stdin (default: None)
-r REF_FILE, --ref REF_FILE
Ref file. Ignored when --stdin (default: None)
-f FIELD [FIELD ...], --fields FIELD [FIELD ...]
Input fields, an ordered sequence of {src, mt, ref} (default: ['src', 'mt', 'ref'])
-o OUT, --out OUT output file (default: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
-a {skip,append,only}, --average {skip,append,only}
Average segment scores to produce system score. skip=do not output average (default; segment scores only); append=append average at the end; only=output the average only (i.e. system score only) (default: skip)
-w WIDTH, --width WIDTH
Output score width (default: 4)
--debug Debug or verbose mode (default: False)
--fp16 Enable FP16 mode (default: False)
--mini-batch MINI_BATCH
Mini-batch size (default: 16)
-d [DEVICES ...], --devices [DEVICES ...]
GPU device IDs (default: None)
-c CPU_THREADS, --cpu-threads CPU_THREADS
Use CPU threads. 0=use GPU device 0 (default: None)
-ws WORKSPACE, --workspace WORKSPACE
Workspace memory (default: 8000)
-pc, --print-cmd Print marian evaluate command and exit (default: False)
--cache CACHE Cache directory for storing models (default: $HOME/.cache/marian/metric)
More info at https://github.com/marian-nmt/marian-dev. This CLI is loaded from .../python3.10/site-packages/pymarian/eval.py (version: 1.12.25)
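The three --average modes described in the help text can be illustrated with a small sketch. This is not the pymarian-eval implementation; format_scores is a hypothetical helper, and treating --width as the number of decimal places is an assumption.

```python
def format_scores(scores, average="skip", width=4):
    """Render segment scores according to an --average-style mode."""
    if average not in ("skip", "append", "only"):
        raise ValueError(f"unknown mode: {average}")
    if not scores:
        return []
    # 'only' suppresses the per-segment scores
    lines = [] if average == "only" else [f"{s:.{width}f}" for s in scores]
    # 'append' and 'only' both emit the mean as the system score
    if average in ("append", "only"):
        lines.append(f"{sum(scores) / len(scores):.{width}f}")
    return lines


segment_scores = [0.5, 0.75, 1.0]
for mode in ("skip", "append", "only"):
    print(mode, format_scores(segment_scores, average=mode))
```

With skip, only the three segment scores are listed; append adds their mean as a final line; only returns the mean alone.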
Performance Tuning Tips:
- For CPU parallelization, use --cpu-threads <n>
- For GPU parallelization, assuming pymarian was compiled with CUDA support, use e.g. --devices 0 1 2 3 to use the specified 4 GPU devices.
- When you hit an OOM error, adjust the --mini-batch argument.
- To see full logs from marian, set --debug
pymarian-mtapi
Launch server
# example model: download and extract
wget http://data.statmt.org/romang/marian-regression-tests/models/wngt19.tar.gz
tar xvf wngt19.tar.gz
# launch server
pymarian-mtapi -s en -t de "-m wngt19/model.base.npz -v wngt19/en-de.spm wngt19/en-de.spm"
Example request from client
URL="http://127.0.0.1:5000/translate"
curl $URL --header "Content-Type: application/json" --request POST --data '[{"text":["Good Morning."]}]'
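The same request can be sent from Python with the standard library alone. The sketch below mirrors the curl call (same URL, header, and JSON payload); build_request and translate are hypothetical helper names, and the shape of the server's JSON response is not assumed beyond it being valid JSON.

```python
import json
import urllib.request

URL = "http://127.0.0.1:5000/translate"


def build_request(texts, url=URL):
    """Build the same POST request as the curl example."""
    payload = json.dumps([{"text": list(texts)}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(texts, url=URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(texts, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from the launch step running, calling translate(["Good Morning."]) returns the parsed reply.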
pymarian-qtdemo
pymarian-qtdemo
Code Formatting
pip install black isort
# format the Python sources
cd src/python
isort .
black .
Run Tests
# install pytest if necessary
python -m pip install pytest
# run tests in quiet mode
python -m pytest src/python/tests/regression
# or, add -s to see STDOUT/STDERR from tests
python -m pytest -s src/python/tests/regression
Release Instructions
Building Pymarian for Multiple Python Versions
Our CMake scripts detect every python3.* executable available in PATH and build pymarian for each.
To support a specific version of Python, make the python3.x executable available in PATH prior to running cmake.
This can be achieved without conflicts by using conda or mamba.
# setup mamba if not already; Note: you may use conda as well
which mamba || {
name=Miniforge3-$(uname)-$(uname -m).sh
wget "https://github.com/conda-forge/miniforge/releases/latest/download/$name" \
&& bash $name -b -p ~/mambaforge && ~/mambaforge/bin/mamba init bash && rm $name
}
# create environment for each version
versions="$(echo 3.{12,11,10,9,8,7})"
for version in $versions; do
echo "python $version"
mamba env list | grep -q "^py${version}" || mamba create -q -y -n py${version} python=${version}
done
# stack all environments
for version in $versions; do mamba activate py${version} --stack; done
# check if all python versions are available
for version in $versions; do which python$version; done
# Build as usual
cmake . -B build -DCOMPILE_CUDA=off -DPYMARIAN=on
cmake --build build -j
ls build/pymarian*.whl
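Before building, it can be useful to confirm which python3.x executables are actually visible, since only those would get a wheel. The sketch below performs the same check as the which loop above; find_pythons is a hypothetical helper and the version list is copied from the shell snippet.

```python
import shutil

VERSIONS = ["3.12", "3.11", "3.10", "3.9", "3.8", "3.7"]


def find_pythons(versions=VERSIONS, which=shutil.which):
    """Map each version to the python3.x path found in PATH (or None)."""
    return {v: which(f"python{v}") for v in versions}


for version, path in find_pythons().items():
    print(f"python{version}: {path or 'not found in PATH'}")
```

Any version reported as not found needs its environment created and stacked before running cmake.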
Upload to PyPI
twine upload -r testpypi build/*.whl
twine upload -r pypi build/*.whl
Initial Setup: create ~/.pypirc with the following:
[distutils]
index-servers =
    pypi
    testpypi
[pypi]
repository: https://upload.pypi.org/legacy/
username:__token__
password:<token>
[testpypi]
repository: https://test.pypi.org/legacy/
username:__token__
password:<token>
Obtain token from https://pypi.org/manage/account/
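A quick sanity check of the file can catch a missing section before twine fails. The sketch below parses .pypirc-style text with the standard configparser module; check_pypirc is a hypothetical helper, and the requirement that every listed server has repository, username, and password keys reflects the example above rather than a rule enforced by twine.

```python
import configparser

EXAMPLE = """\
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
repository: https://upload.pypi.org/legacy/
username:__token__
password:<token>

[testpypi]
repository: https://test.pypi.org/legacy/
username:__token__
password:<token>
"""


def check_pypirc(text):
    """Return the listed index servers that lack a complete section."""
    # RawConfigParser does no '%' interpolation, so tokens are read verbatim
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    servers = parser.get("distutils", "index-servers").split()
    required = ("repository", "username", "password")
    return [
        name
        for name in servers
        if not parser.has_section(name)
        or any(not parser.has_option(name, key) for key in required)
    ]


print(check_pypirc(EXAMPLE))
```

An empty list means every server named under index-servers has a complete section. To check a real file, pass it the text of ~/.pypirc, for example via pathlib.Path("~/.pypirc").expanduser().read_text().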
Known issues
- In a conda or mamba environment, if you see the error
  .../miniconda3/envs/<envname>/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.30' not found
  install libstdcxx-ng:
  conda install -c conda-forge libstdcxx-ng