eval

Python Library for Evaluation

MT-Bench / MT-Bench-Branch Testing Steps

# Optional: Use cloud-instance.sh to launch and set up the instance
./cloud-instance.sh ec2 launch -t g5.4xlarge
./cloud-instance.sh ec2 setup-rh-devenv
./cloud-instance.sh ec2 install-rh-nvidia-drivers
./cloud-instance.sh ec2 ssh sudo reboot
./cloud-instance.sh ec2 ssh


# Regardless of how you set up your instance
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
pip install vllm
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
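The server started by the last command can take a while to load the model, so the test scripts may fail if they are launched too early. The following is a minimal sketch of a readiness check, assuming the server listens on vLLM's default port 8000 and exposes the OpenAI-compatible `/v1/models` endpoint. The names `wait_for_server` and `probe` are hypothetical helpers for illustration and are not part of this library.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8000/v1/models",
                    timeout_s=600.0, interval_s=5.0, probe=None):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    The default URL is an assumption (vLLM default port and endpoint).
    `probe` is a hypothetical hook: a callable taking the URL and
    returning True when the server is ready.
    """
    def default_probe(target):
        try:
            with urllib.request.urlopen(target, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or HTTP error: not ready yet.
            return False

    probe = probe or default_probe
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server()` in the second shell before running the tests returns `True` once the endpoint responds and `False` if the timeout expires first.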

In another shell window, run the answer-generation tests:

python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py

Example output tree

eval_output/
├── mt_bench
│   └── model_answer
│       └── instructlab
│           └── granite-7b-lab.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
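
To confirm that answer generation produced every file in the example tree, a small check along these lines may help. It is only a sketch: the relative paths are copied from the tree above, while `EXPECTED_ANSWER_FILES` and `missing_outputs` are hypothetical names, and treating an empty file as missing is an assumption made for illustration.

```python
from pathlib import Path

# Relative paths copied from the example output tree.
EXPECTED_ANSWER_FILES = [
    "mt_bench/model_answer/instructlab/granite-7b-lab.jsonl",
    "mt_bench_branch/main/model_answer/instructlab/granite-7b-lab.jsonl",
    "mt_bench_branch/main/question.jsonl",
    "mt_bench_branch/main/reference_answer/instructlab/granite-7b-lab.jsonl",
    "mt_bench_branch/rc/model_answer/instructlab/granite-7b-lab.jsonl",
    "mt_bench_branch/rc/question.jsonl",
    "mt_bench_branch/rc/reference_answer/instructlab/granite-7b-lab.jsonl",
]


def missing_outputs(root, expected=EXPECTED_ANSWER_FILES):
    """Return the expected relative paths that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for rel in expected:
        path = root / rel
        # Assumption: an empty file counts as a failed generation step.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

For example, `missing_outputs("eval_output")` returns an empty list when all seven files exist and are non-empty.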

Then run the judging tests:

export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=40 # Optional: set this to shorten run times
python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py

Example output tree

eval_output/
├── mt_bench
│   ├── model_answer
│   │   └── instructlab
│   │       └── granite-7b-lab.jsonl
│   └── model_judgment
│       └── instructlab
│           └── granite-7b-lab_single.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── model_judgment
    │   │   └── instructlab
    │   │       └── granite-7b-lab_single.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── model_judgment
        │   └── instructlab
        │       └── granite-7b-lab_single.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
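
After judging, it can be useful to see how many records each `.jsonl` file holds, for instance to compare the number of judgments with the number of answers. The sketch below only counts lines that parse as JSON and makes no assumption about the fields inside each record. `count_jsonl_records` is a hypothetical helper, not part of this library.

```python
import json
from pathlib import Path


def count_jsonl_records(root):
    """Map each .jsonl file under `root` (POSIX-style relative path) to its record count."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)  # raises json.JSONDecodeError on a malformed record
                    n += 1
        counts[path.relative_to(root).as_posix()] = n
    return counts
```

Running `count_jsonl_records("eval_output")` yields a dictionary keyed by paths such as `mt_bench/model_judgment/instructlab/granite-7b-lab_single.jsonl`.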
