Evaluation

Project description

eval

Python Library for Evaluation

MT-Bench / MT-Bench-Branch Testing Steps

⚠️ Note: Python 3.10 or later is required.
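To fail fast on an older interpreter before installing anything, you can run a small check like the one below. It is a minimal sketch. The helper name `python_is_supported` is hypothetical and is not part of the `instructlab-eval` package.

```python
import sys

# Minimum interpreter version stated in the note above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of the version meets the 3.10+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit(f"Python 3.10 or later is required; found {sys.version.split()[0]}")
    print("Python version OK")
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as `"3.9" > "3.10"`.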

# Optional: Use cloud-instance.sh (https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and set up the instance
scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge
scripts/infra/cloud-instance.sh ec2 setup-rh-devenv
scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers
scripts/infra/cloud-instance.sh ec2 ssh sudo reboot
scripts/infra/cloud-instance.sh ec2 ssh


# Regardless of how you set up your instance
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
pip install vllm
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
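Before starting the tests, it can help to confirm that the vLLM server is up and serving the expected model. The sketch below queries the OpenAI-compatible `/v1/models` endpoint. It assumes the server is on vLLM's default port 8000 on the same machine, and that the model is served under the name passed to `--model` (vLLM's default). `parse_model_ids` and `served_models` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: vLLM's OpenAI-compatible server is listening on its default port 8000.
BASE_URL = "http://localhost:8000/v1"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models JSON response."""
    body = json.loads(payload)
    return [entry["id"] for entry in body.get("data", [])]


def served_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the server's /models endpoint and return the model IDs it is serving."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Once the server has finished loading, `"instructlab/granite-7b-lab" in served_models()` should be true. A connection error usually means the model is still loading.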

In another shell window, with the same virtual environment activated:

export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional: evaluate only the first N questions to shorten run times
# Run these commands from the eval directory
python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py
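The `INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS` variable truncates the question set, which for MT-Bench otherwise has 80 questions. The sketch below illustrates that kind of truncation. How the library actually parses and applies the variable is an assumption here, and `first_n_from_env` and `limit_questions` are hypothetical helpers.

```python
import os
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

# Variable name taken from the command above; the parsing rules below are assumed.
ENV_VAR = "INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS"


def first_n_from_env(environ=os.environ) -> Optional[int]:
    """Return the requested question limit, or None when the variable is unset or empty."""
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return None
    n = int(raw)  # raises ValueError for non-numeric values
    if n <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {raw!r}")
    return n


def limit_questions(questions: Sequence[T], environ=os.environ) -> list[T]:
    """Keep only the first N questions if the environment variable is set (hypothetical helper)."""
    n = first_n_from_env(environ)
    return list(questions) if n is None else list(questions[:n])
```

Passing `environ` explicitly keeps the helper testable without changing the real process environment.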

Example output tree

eval_output/
├── mt_bench
│   └── model_answer
│       └── instructlab
│           └── granite-7b-lab.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl

python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py

Example output tree

eval_output/
├── mt_bench
│   ├── model_answer
│   │   └── instructlab
│   │       └── granite-7b-lab.jsonl
│   └── model_judgment
│       └── instructlab
│           └── granite-7b-lab_single.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── model_judgment
    │   │   └── instructlab
    │   │       └── granite-7b-lab_single.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── model_judgment
        │   └── instructlab
        │       └── granite-7b-lab_single.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
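The judgment files under `model_judgment/` can be summarized with a short script. The sketch below assumes each line of `granite-7b-lab_single.jsonl` is a JSON object with FastChat-style `turn` and `score` fields, and that a score of `-1` marks a judgment that could not be parsed. These field names are an assumption, so check them against an actual output file first. `load_judgments` and `average_scores` are hypothetical helpers, not part of the library.

```python
import json
from pathlib import Path
from statistics import mean

# Assumption: each record has FastChat-style "turn" and "score" fields,
# and a score of -1 marks a judgment that could not be parsed.


def load_judgments(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line from a *_single.jsonl file."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def average_scores(records: list[dict]) -> dict:
    """Average valid scores per turn and overall, skipping records scored -1."""
    valid = [r for r in records if r.get("score", -1) != -1]
    per_turn: dict = {}
    for r in valid:
        per_turn.setdefault(r["turn"], []).append(r["score"])
    summary = {f"turn_{t}": mean(s) for t, s in sorted(per_turn.items())}
    summary["overall"] = mean(r["score"] for r in valid) if valid else None
    return summary


if __name__ == "__main__":
    judgment = Path("eval_output/mt_bench/model_judgment/instructlab/granite-7b-lab_single.jsonl")
    if judgment.exists():
        print(average_scores(load_judgments(judgment)))
    else:
        print(f"No judgment file at {judgment}")
```

Pointing the same helpers at `mt_bench_branch/main/...` and `mt_bench_branch/rc/...` allows a side-by-side comparison of the two branches.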

Download files

Source Distribution

instructlab_eval-0.2.1.tar.gz (96.9 kB)

Uploaded Source

Built Distribution

instructlab_eval-0.2.1-py3-none-any.whl (63.6 kB)

Uploaded Python 3

File details

Details for the file instructlab_eval-0.2.1.tar.gz.

File metadata

  • Download URL: instructlab_eval-0.2.1.tar.gz
  • Upload date:
  • Size: 96.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for instructlab_eval-0.2.1.tar.gz
Algorithm Hash digest
SHA256 cc53410d0c72342923c0634e53fa59cb82ce9ff1bffd2b721b8112ac4e72785e
MD5 aa6bc4f56c834e49bd46f09eef80efe0
BLAKE2b-256 2084ed24f72d87f1a26f95356aa34d28550186f196c0ba7e76281792c447e540
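After downloading the source distribution manually, you can check it against the published SHA256 digest with the standard library alone. This is a minimal sketch. `sha256_of` and `verify` are hypothetical helper names, and the expected digest is the SHA256 value from the table above.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of instructlab_eval-0.2.1.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "cc53410d0c72342923c0634e53fa59cb82ce9ff1bffd2b721b8112ac4e72785e"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Return True if the file's SHA256 hex digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("instructlab_eval-0.2.1.tar.gz"))` should return `True` for an intact download. When installing with pip, its hash-checking mode (a `--hash=sha256:...` entry in a requirements file together with `--require-hashes`) performs the same check automatically.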

File details

Details for the file instructlab_eval-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for instructlab_eval-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3760528c9a451fd4f416645d457754b3f441e2c3c3976b772339c5770906e118
MD5 9c62f5626a71b0cd52295997d5021869
BLAKE2b-256 8da3d8822eb3ab8646746b307543b1d597f2df06f2c55e9e43bd8d1b16bad64d
