Evaluation

eval

Python Library for Evaluation

MT-Bench / MT-Bench-Branch Testing Steps

# Optional: use cloud-instance.sh (https://github.com/instructlab/instructlab/tree/main/scripts/infra) to launch and set up the instance
scripts/infra/cloud-instance.sh ec2 launch -t g5.4xlarge
scripts/infra/cloud-instance.sh ec2 setup-rh-devenv
scripts/infra/cloud-instance.sh ec2 install-rh-nvidia-drivers
scripts/infra/cloud-instance.sh ec2 ssh sudo reboot
scripts/infra/cloud-instance.sh ec2 ssh


# Regardless of how you set up your instance
git clone https://github.com/instructlab/taxonomy.git && pushd taxonomy && git branch rc && popd
git clone --bare https://github.com/instructlab/eval.git && git clone eval.git/ && cd eval && git remote add syncrepo ../eval.git
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install -e .
pip install vllm
python -m vllm.entrypoints.openai.api_server --model instructlab/granite-7b-lab --tensor-parallel-size 1
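Before switching to the second shell, it can help to confirm that the server has finished loading the model. The following is a minimal sketch, not part of this library. It assumes the vLLM OpenAI-compatible server is listening on its default address, `http://localhost:8000`, and that it answers the standard `/v1/models` listing request. The helper names (`models_url`, `served_model_ids`, `list_served_models`) are hypothetical.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def served_model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style model list payload."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> list:
    """Return the served model IDs, or an empty list if the server is unreachable.

    The default base_url is an assumption (vLLM's default port); adjust it
    if the server was started with a different host or port.
    """
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return served_model_ids(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not ready".
        return []


if __name__ == "__main__":
    models = list_served_models()
    if models:
        print("server ready, serving:", models)
    else:
        print("server not reachable yet")
```

If the list is empty, the model is probably still loading; retrying after a short wait is usually enough.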

In another shell window:

export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # Optional if you want to shorten run times
python3 tests/test_gen_answers.py
python3 tests/test_branch_gen_answers.py

Example output tree

eval_output/
├── mt_bench
│   └── model_answer
│       └── instructlab
│           └── granite-7b-lab.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
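
To sanity-check a generated answer file, the JSONL output can be loaded one record per line. This is a minimal sketch that assumes only that each non-empty line is a JSON object; it makes no assumption about the field names inside a record. The helper `read_jsonl` is hypothetical, and the path is taken from the tree above.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-empty line of a .jsonl file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Path as shown in the example output tree.
    answers = Path("eval_output/mt_bench/model_answer/instructlab/granite-7b-lab.jsonl")
    if answers.exists():
        records = read_jsonl(answers)
        print(f"{len(records)} answer records")
        if records:
            # Print the field names rather than assuming a schema.
            print("fields:", sorted(records[0]))
    else:
        print(f"{answers} not found; run the generation scripts first")
```

With `INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS` set, a correspondingly shorter file is to be expected.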

Next, judge the generated answers:

python3 tests/test_judge_answers.py
python3 tests/test_branch_judge_answers.py

Example output tree

eval_output/
├── mt_bench
│   ├── model_answer
│   │   └── instructlab
│   │       └── granite-7b-lab.jsonl
│   └── model_judgment
│       └── instructlab
│           └── granite-7b-lab_single.jsonl
└── mt_bench_branch
    ├── main
    │   ├── model_answer
    │   │   └── instructlab
    │   │       └── granite-7b-lab.jsonl
    │   ├── model_judgment
    │   │   └── instructlab
    │   │       └── granite-7b-lab_single.jsonl
    │   ├── question.jsonl
    │   └── reference_answer
    │       └── instructlab
    │           └── granite-7b-lab.jsonl
    └── rc
        ├── model_answer
        │   └── instructlab
        │       └── granite-7b-lab.jsonl
        ├── model_judgment
        │   └── instructlab
        │       └── granite-7b-lab_single.jsonl
        ├── question.jsonl
        └── reference_answer
            └── instructlab
                └── granite-7b-lab.jsonl
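
A quick way to confirm that every expected file was written, and that the `main` and `rc` branches produced comparable amounts of output, is to count the records in each `.jsonl` file under `eval_output/`. The sketch below is an illustration only: `count_records` is a hypothetical helper, and it treats every non-empty line as one record without inspecting its contents.

```python
from pathlib import Path


def count_records(root):
    """Map each .jsonl file under root (relative POSIX path) to its non-empty line count."""
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = {}
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            counts[path.relative_to(root).as_posix()] = sum(
                1 for line in handle if line.strip()
            )
    return counts


if __name__ == "__main__":
    # Directory name as shown in the example output tree.
    for name, count in count_records("eval_output").items():
        print(f"{count:6d}  {name}")
```

A missing `model_judgment` entry, or a large mismatch between the `main` and `rc` counts, suggests that one of the steps did not finish.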

