# Evaluation
## eval

Python Library for Evaluation
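Since this package is published on PyPI as `instructlab-eval` (see the distribution files listed at the bottom of this page), plain usage of the library only needs a pip install; the clone-based setup in the next section is for running the test scripts and developing against the repository.

```shell
pip install instructlab-eval
```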
### MT-Bench / MT-Bench-Branch Testing Steps
⚠️ Note: Python 3.10 or later is required.
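A quick way to confirm which interpreter your shell resolves before creating the virtual environment:

```shell
python3 --version   # should report 3.10 or newer
```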
```shell
# Optional: Use cloud-instance.sh (https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and set up the instance
scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge
scripts/infra/cloud-instance.sh ec2 setup-rh-devenv
scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers
scripts/infra/cloud-instance.sh ec2 ssh sudo reboot
scripts/infra/cloud-instance.sh ec2 ssh

# Regardless of how you set up your instance
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
pip install vllm
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
```
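Once the vLLM server finishes loading the model, you can confirm from a second terminal that it is serving requests. Port 8000 is vLLM's default; adjust if you passed `--port`:

```shell
curl http://localhost:8000/v1/models
```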
In another shell window:

```shell
export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
# Commands relative to eval directory
python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py
```
Example output tree:

```
eval_output/
├── mt_bench
│   └── model_answer
│       └── instructlab
│           └── granite-7b-lab.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
```
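Each `.jsonl` file in the tree holds one JSON record per line. A minimal sanity check of a generated answer file (path taken from the tree above; the record schema is whatever the library writes, so this just counts records and pretty-prints the first one):

```shell
# Number of generated answers
wc -l eval_output/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl

# Inspect the structure of the first record
head -n 1 eval_output/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl | python3 -m json.tool
```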
```shell
python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py
```
Example output tree:

```
eval_output/
├── mt_bench
│   ├── model_answer
│   │   └── instructlab
│   │       └── granite-7b-lab.jsonl
│   └── model_judgment
│       └── instructlab
│           └── granite-7b-lab_single.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── model_judgment
    │   │   └── instructlab
    │   │       └── granite-7b-lab_single.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── model_judgment
        │   └── instructlab
        │       └── granite-7b-lab_single.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
```
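The `model_judgment` files hold the judge's per-answer verdicts. A rough way to summarize them, assuming each record carries a numeric `score` field as in FastChat-style single-answer judgments (inspect a record first if your version's schema differs):

```shell
# Average the judge scores for one model/branch (assumes a "score" field per record)
python3 -c '
import json, sys
scores = [json.loads(line)["score"] for line in sys.stdin if line.strip()]
print(f"{len(scores)} judgments, mean score {sum(scores) / len(scores):.2f}")
' < eval_output/mt_bench/model_judgment/instructlab/granite-7b-lab_single.jsonl
```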
## Download files

Download the file for your platform.

### Source Distribution

instructlab_eval-0.2.1.tar.gz (96.9 kB)

### Built Distribution

instructlab_eval-0.2.1-py3-none-any.whl (63.6 kB)
## File details

### instructlab_eval-0.2.1.tar.gz

File metadata:

- Download URL: instructlab_eval-0.2.1.tar.gz
- Upload date:
- Size: 96.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | cc53410d0c72342923c0634e53fa59cb82ce9ff1bffd2b721b8112ac4e72785e |
| MD5 | aa6bc4f56c834e49bd46f09eef80efe0 |
| BLAKE2b-256 | 2084ed24f72d87f1a26f95356aa34d28550186f196c0ba7e76281792c447e540 |
### instructlab_eval-0.2.1-py3-none-any.whl

File metadata:

- Download URL: instructlab_eval-0.2.1-py3-none-any.whl
- Upload date:
- Size: 63.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3760528c9a451fd4f416645d457754b3f441e2c3c3976b772339c5770906e118 |
| MD5 | 9c62f5626a71b0cd52295997d5021869 |
| BLAKE2b-256 | 8da3d8822eb3ab8646746b307543b1d597f2df06f2c55e9e43bd8d1b16bad64d |
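To verify a downloaded artifact against the digests above, standard tools are enough; the file name below assumes the 0.2.1 source distribution shown on this page:

```shell
# Fetch only the source distribution, without dependencies
pip download instructlab-eval==0.2.1 --no-deps --no-binary :all: -d .

# Compare the printed digest with the SHA256 value listed above
sha256sum instructlab_eval-0.2.1.tar.gz
```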