Translator Benchmarks Runner
This repository provides a set of benchmarks, along with the code to send a benchmark's queries and evaluate the returned results.
- benchmarks-runner contains the code to query targets and evaluate results.
- benchmarks-runner.config contains the data sets, query templates, targets, and benchmark definitions necessary to run a benchmark. See config/README.md for details about targets and benchmarks.
Usage
Running a benchmark is a two-step process:
- Execute the queries of a benchmark and store the scored results.
- Evaluate the scored results against the set of ground-truth relevant results.
Installing the benchmarks-runner package provides access to the functions and command-line interface necessary to run benchmarks.
CLI
The command-line interface is the easiest way to run a benchmark.
- benchmarks_fetch
  - Fetches (un)scored results given the name of a benchmark (specified in config/benchmarks.json), a target (specified in config/targets.json), and a directory to store results.
  - By default, benchmarks_fetch fetches scored results using 5 concurrent requests. Run benchmarks_fetch --help for more details.
- benchmarks_score
  - Scores results given the name of a benchmark (specified in config/benchmarks.json), a target (specified in config/targets.json), a directory containing unscored results, and a directory to store scored results.
  - By default, benchmarks_score uses 5 concurrent requests. Run benchmarks_score --help for more details.
- benchmarks_eval
  - Evaluates a set of scored results given the name of a benchmark (specified in config/benchmarks.json) and a directory containing scored results.
  - By default, the evaluation considers the top 20 results of each query, and plots are not generated. Run benchmarks_eval --help for more details.
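Both benchmarks_fetch and benchmarks_score cap the number of requests in flight (5 by default). The sketch below shows one common way to impose such a cap in Python, using asyncio.Semaphore. It is an illustrative assumption, not code taken from benchmarks-runner; gather_bounded and fake_query are hypothetical names, and fake_query only simulates a network round trip.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    func: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    max_concurrent: int = 5,
) -> List[R]:
    """Apply func to every item, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item: T) -> R:
        # Each call waits for a free slot before it starts
        async with semaphore:
            return await func(item)

    # asyncio.gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))


# Hypothetical stand-in for sending one benchmark query to a target
in_flight = 0
peak_in_flight = 0


async def fake_query(query_id: int) -> str:
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return f"response-{query_id}"


responses = asyncio.run(gather_bounded(fake_query, range(12), max_concurrent=5))
print(peak_in_flight)  # 5
```

Raising max_concurrent sends more queries in parallel, which can finish a run sooner but puts more load on the target.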
Functions
The CLI functionality is also available by importing functions from the benchmarks package.
from benchmarks.request import fetch_results, score_results
from benchmarks.eval import evaluate_results

# Fetch unscored results for a benchmark from a target
fetch_results('benchmark_name', 'target_name', 'unscored_results_dir', scored=False)

# Score the unscored results and store them in results_dir
score_results('unscored_results_dir', 'target_name', 'results_dir')

# Evaluate the scored results; the second argument may be either
# a directory of scored results or a dictionary of results
evaluate_results('benchmark_name', 'results_dir')
See the documentation of each function for more information.
Installation
Install the repository as an editable package using pip:
pip install -e .
UI
These benchmarks come with a frontend for viewing the scored results.
Installation
Requires Python 3.9.
- Create a Python virtual environment:
python3.9 -m venv benchmark_venv
- Activate your environment:
. ./benchmark_venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Start the frontend server:
python server.py
- Open in your browser
Benchmark Runner
The benchmarks can be installed from PyPI and used as part of Translator-wide automated testing.
pip install benchmarks-runner
To run benchmarks:
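import asyncio
from benchmarks_runner import run_benchmarks

benchmark = 'benchmark_name'  # a benchmark specified in config/benchmarks.json
target = 'target_name'        # a target specified in config/targets.json
output = asyncio.run(run_benchmarks(benchmark, target))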
where benchmark is the name of a benchmark specified in config/benchmarks.json and target is the name of a target specified in config/targets.json.
Sample Output
Benchmark: GTRx
Results Directory: /tmp/tmpaf10m9_q/GTRx/bte/2023-11-10_13-03-11
                      k=1     k=5     k=10    k=20
Precision @ k         0.0000  0.0500  0.0250  0.0125
Recall @ k            0.0000  0.2500  0.2500  0.2500
mAP @ k               0.0000  0.0833  0.0833  0.0833
Top-k Accuracy        0.0000  0.2500  0.2500  0.2500
Mean Reciprocal Rank  0.08333333333333333
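The metrics in the sample output are standard ranking metrics. As a rough guide to how they are usually computed, here is a minimal sketch using common textbook definitions. It is not the package's evaluate_results. The function names, the data layout (query IDs mapped to ranked result IDs and to sets of relevant IDs), and the normalization conventions (Precision @ k divided by k, average precision divided by the number of relevant results) are all assumptions. The default ks mirror the k columns shown above.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

Ranked = Sequence[str]  # result IDs for one query, best first
Relevant = Set[str]     # ground-truth relevant IDs for one query


def hits_at_k(ranked: Ranked, relevant: Relevant, k: int) -> int:
    # Number of relevant results among the top k
    return sum(1 for r in ranked[:k] if r in relevant)


def precision_at_k(ranked: Ranked, relevant: Relevant, k: int) -> float:
    # Relevant results in the top k, divided by k
    return hits_at_k(ranked, relevant, k) / k


def recall_at_k(ranked: Ranked, relevant: Relevant, k: int) -> float:
    # Relevant results in the top k, divided by all relevant results
    return hits_at_k(ranked, relevant, k) / len(relevant)


def average_precision_at_k(ranked: Ranked, relevant: Relevant, k: int) -> float:
    # Sum of precision@i over ranks i <= k holding a relevant result,
    # divided by the number of relevant results (one common convention)
    hits, total = 0, 0.0
    for i, r in enumerate(ranked[:k], start=1):
        if r in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def top_k_accuracy(ranked: Ranked, relevant: Relevant, k: int) -> float:
    # 1.0 if any relevant result appears in the top k, else 0.0
    return 1.0 if hits_at_k(ranked, relevant, k) > 0 else 0.0


def reciprocal_rank(ranked: Ranked, relevant: Relevant) -> float:
    # 1 / rank of the first relevant result, or 0.0 if none was returned
    for i, r in enumerate(ranked, start=1):
        if r in relevant:
            return 1.0 / i
    return 0.0


def evaluate(
    results: Dict[str, Ranked],
    truth: Dict[str, Relevant],
    ks: Sequence[int] = (1, 5, 10, 20),
) -> Tuple[Dict[int, Dict[str, float]], float]:
    """Average each metric over all queries in truth (each needs >= 1 relevant ID)."""
    n = len(truth)

    def mean(metric: Callable[[Ranked, Relevant, int], float], k: int) -> float:
        # Queries with no returned results count as empty rankings
        return sum(metric(results.get(q, []), rel, k) for q, rel in truth.items()) / n

    per_k = {
        k: {
            "precision": mean(precision_at_k, k),
            "recall": mean(recall_at_k, k),
            "map": mean(average_precision_at_k, k),
            "top_k_accuracy": mean(top_k_accuracy, k),
        }
        for k in ks
    }
    mrr = sum(reciprocal_rank(results.get(q, []), rel) for q, rel in truth.items()) / n
    return per_k, mrr


# Hypothetical toy data, not taken from any benchmark in config/benchmarks.json
results = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
truth = {"q1": {"b"}, "q2": {"z"}}
per_k, mrr = evaluate(results, truth)
print(per_k[5])  # {'precision': 0.1, 'recall': 0.5, 'map': 0.25, 'top_k_accuracy': 0.5}
print(mrr)       # 0.25
```

If benchmarks-runner normalizes differently (for example, dividing precision by the number of results actually returned), its numbers can differ from this sketch.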
File details
Details for the file benchmarks-runner-0.1.3.tar.gz.
File metadata
- Download URL: benchmarks-runner-0.1.3.tar.gz
- Upload date:
- Size: 488.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 22c9b4fc33e914c06976c8a165e218ba34c452b465f44271864536a100c07be9
MD5 | fc28230456dffc5755f07a0c0f2d67d8
BLAKE2b-256 | bbe7c3a3714bdae0667dee2ce24d235351b86ebe93f2f53f8a063f61d53796d1
File details
Details for the file benchmarks_runner-0.1.3-py3-none-any.whl.
File metadata
- Download URL: benchmarks_runner-0.1.3-py3-none-any.whl
- Upload date:
- Size: 519.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | ec9cb811b4f36398e0034d5d0a057f90a757fed6bba9423edffdadddec7943af
MD5 | aa9c293a14f1967135337265dd959fb7
BLAKE2b-256 | 921a34ee638e49554455d24296d324e7e765ce113e44db8e3d7ba6c9795af3c6