A lightweight and configurable evaluation package
Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.
Documentation: Lighteval's Wiki
Unlock the Power of LLM Evaluation with Lighteval 🚀
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends, whether transformers, tgi, vllm, or nanotron, with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.
Customization at your fingertips: browse all our existing tasks and metrics, or effortlessly create your own, tailored to your needs.
Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
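Sample-by-sample results are the main debugging hook mentioned above. A minimal sketch of that workflow, using hypothetical records: the field names below are illustrative assumptions, not lighteval's exact output schema.

```python
# Hypothetical per-sample records, shaped loosely like a "details" file a
# lighteval run might save (field names are illustrative assumptions).
samples = [
    {"example": "Q1", "prediction": "A", "gold": "A", "metrics": {"acc": 1.0}},
    {"example": "Q2", "prediction": "B", "gold": "C", "metrics": {"acc": 0.0}},
]

# Typical debugging pass: isolate the samples the model got wrong, then
# aggregate the per-sample metric into a corpus-level score.
wrong = [s for s in samples if s["metrics"]["acc"] == 0.0]
accuracy = sum(s["metrics"]["acc"] for s in samples) / len(samples)
print(f"accuracy={accuracy:.2f}, failures={len(wrong)}")
```

Inspecting the `wrong` list directly, rather than only the aggregate score, is what per-sample saving makes possible.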
🔑 Key Features
- Speed: Use vllm as backend for fast evals.
- Completeness: Use the accelerate backend to launch any model hosted on the Hugging Face Hub.
- Seamless Storage: Save results in S3 or Hugging Face Datasets.
- Python API: Simple integration with the Python API.
- Custom Tasks: Easily add custom tasks.
- Versatility: Tons of metrics and tasks ready to go.
⚡️ Installation
pip install lighteval[accelerate]
Lighteval supports many optional extras at install time; see the documentation for a complete list.
If you want to push results to the Hugging Face Hub, add your access token as an environment variable:
huggingface-cli login
🚀 Quickstart
Lighteval offers two main entry points for model evaluation:
- lighteval accelerate: evaluate models on CPU or one or more GPUs using 🤗 Accelerate.
- lighteval nanotron: evaluate models in distributed settings using ⚡️ Nanotron.
Here’s a quick command to evaluate using the Accelerate backend:
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
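The --tasks argument above packs four |-separated fields. A small illustrative parser (not part of lighteval's API; the field meanings, per the lighteval docs, are suite, task name, few-shot count, and a 0/1 flag for automatically truncating few-shot examples when the prompt is too long):

```python
# Decompose a task spec of the form "suite|task|num_fewshot|truncate_flag".
# This parser is an illustrative sketch, not a lighteval function; the field
# meanings follow the lighteval documentation.
def parse_task_spec(spec: str) -> dict:
    suite, task, num_fewshot, truncate = spec.split("|")
    return {
        "suite": suite,
        "task": task,
        "num_fewshot": int(num_fewshot),
        "truncate_fewshots": bool(int(truncate)),
    }

spec = parse_task_spec("leaderboard|truthfulqa:mc|0|0")
print(spec)
```

Multiple such specs can be passed to --tasks as a comma-separated list.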
🙏 Acknowledgements
Lighteval started as an extension of the fantastic Eleuther AI Harness (which powers the Open LLM Leaderboard) and draws inspiration from the amazing HELM framework.
While evolving Lighteval into its own standalone tool, we are grateful to the Harness and HELM teams for their pioneering work on LLM evaluations.
🌟 Contributions Welcome 💙💚💛💜🧡
Got ideas? Found a bug? Want to add a task or metric? Contributions are warmly welcomed!
📜 Citation
@misc{lighteval,
author = {Fourrier, Clémentine and Habib, Nathan and Wolf, Thomas and Tunstall, Lewis},
title = {LightEval: A lightweight framework for LLM evaluation},
year = {2023},
version = {0.5.0},
url = {https://github.com/huggingface/lighteval}
}
File details
Details for the file lighteval-0.6.2.tar.gz.
File metadata
- Download URL: lighteval-0.6.2.tar.gz
- Upload date:
- Size: 272.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e48caf17c4136f973b5b9ee0692171b797692e068bd6c8efed14657b81500956 |
| MD5 | 3e2b3ef6351285f71147a20d0d4117e7 |
| BLAKE2b-256 | 65dd547f2af88bc4c56ce19123f66701cdc5762f33cc9b49bedb19ae03b26fcd |
File details
Details for the file lighteval-0.6.2-py3-none-any.whl.
File metadata
- Download URL: lighteval-0.6.2-py3-none-any.whl
- Upload date:
- Size: 335.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1832fff4ca76d4ec617b5242c60e5dcaa1df8966f9b8352af105386fb6c910ba |
| MD5 | cac0fcf853048c13b7eef068bc0f3f43 |
| BLAKE2b-256 | baf8c3f757064572b62ef63cb8e45e093245b0327e9f8e1c93f2aa57e227a33e |