
A lightweight and configurable evaluation package

Project description


lighteval library logo

Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.



Documentation: Lighteval's Wiki


Unlock the Power of LLM Evaluation with Lighteval 🚀

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends (transformers, tgi, vllm, or nanotron) with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Customization at your fingertips: browse all our existing tasks and metrics, or effortlessly create your own, tailored to your needs.

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

🔑 Key Features

⚡️ Installation

pip install lighteval[accelerate]

Lighteval allows for many extras when installing, see here for a complete list.

If you want to push results to the Hugging Face Hub, log in with your access token:

huggingface-cli login

🚀 Quickstart

Lighteval offers two main entry points for model evaluation:

  • lighteval accelerate: evaluate models on CPU or one or more GPUs using 🤗 Accelerate.
  • lighteval nanotron: evaluate models in distributed settings using ⚡️ Nanotron.

Here’s a quick command to evaluate using the Accelerate backend:

lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "leaderboard|truthfulqa:mc|0|0" \
    --override_batch_size 1 \
    --output_dir="./evals/"
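The value passed to --tasks is a four-field specifier, suite|task|num_fewshot|truncate_flag. As an illustration only (the TaskSpec helper below is hypothetical, not part of Lighteval's API), the specifier from the command above decomposes like this:

```python
# Illustrative parser for the "suite|task|num_fewshot|truncate_flag" task
# specifier format used on the command line. TaskSpec and parse_task_spec
# are hypothetical helpers written for this sketch, not Lighteval APIs.
from dataclasses import dataclass


@dataclass
class TaskSpec:
    suite: str            # task suite, e.g. "leaderboard"
    task: str             # task name, e.g. "truthfulqa:mc"
    num_fewshot: int      # number of few-shot examples in the prompt
    truncate_fewshot: bool  # whether few-shot examples may be truncated

def parse_task_spec(spec: str) -> TaskSpec:
    suite, task, fewshot, truncate = spec.split("|")
    return TaskSpec(suite, task, int(fewshot), bool(int(truncate)))

# The specifier from the quickstart command above:
spec = parse_task_spec("leaderboard|truthfulqa:mc|0|0")
```

Here the two trailing zeros mean a zero-shot prompt with no few-shot truncation.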

🙏 Acknowledgements

Lighteval started as an extension of the fantastic Eleuther AI Harness (which powers the Open LLM Leaderboard) and draws inspiration from the amazing HELM framework.

While evolving Lighteval into its own standalone tool, we are grateful to the Harness and HELM teams for their pioneering work on LLM evaluations.

🌟 Contributions Welcome 💙💚💛💜🧡

Got ideas? Found a bug? Want to add a task or metric? Contributions are warmly welcomed!

📜 Citation

@misc{lighteval,
  author = {Fourrier, Clémentine and Habib, Nathan and Wolf, Thomas and Tunstall, Lewis},
  title = {LightEval: A lightweight framework for LLM evaluation},
  year = {2023},
  version = {0.5.0},
  url = {https://github.com/huggingface/lighteval}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lighteval-0.6.2.tar.gz (272.6 kB)

Uploaded Source

Built Distribution

lighteval-0.6.2-py3-none-any.whl (335.7 kB)

Uploaded Python 3

File details

Details for the file lighteval-0.6.2.tar.gz.

File metadata

  • Download URL: lighteval-0.6.2.tar.gz
  • Upload date:
  • Size: 272.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for lighteval-0.6.2.tar.gz:

  • SHA256: e48caf17c4136f973b5b9ee0692171b797692e068bd6c8efed14657b81500956
  • MD5: 3e2b3ef6351285f71147a20d0d4117e7
  • BLAKE2b-256: 65dd547f2af88bc4c56ce19123f66701cdc5762f33cc9b49bedb19ae03b26fcd

See more details on using hashes here.
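To check a downloaded file against the digests above, the standard-library hashlib module is enough. A minimal sketch (sha256_of is a helper written here; the filename and expected digest are the ones listed for the source distribution on this page):

```python
# Verify a downloaded artifact against its published SHA-256 digest.
# Files are read in chunks so large downloads do not need to fit in memory.
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest published above for the sdist:
EXPECTED = "e48caf17c4136f973b5b9ee0692171b797692e068bd6c8efed14657b81500956"

# After downloading the file, compare:
# if sha256_of("lighteval-0.6.2.tar.gz") != EXPECTED:
#     raise ValueError("hash mismatch: re-download the file")
```

The same check works for the wheel below, using its own SHA256 digest.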

File details

Details for the file lighteval-0.6.2-py3-none-any.whl.

File metadata

  • Download URL: lighteval-0.6.2-py3-none-any.whl
  • Upload date:
  • Size: 335.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for lighteval-0.6.2-py3-none-any.whl:

  • SHA256: 1832fff4ca76d4ec617b5242c60e5dcaa1df8966f9b8352af105386fb6c910ba
  • MD5: cac0fcf853048c13b7eef068bc0f3f43
  • BLAKE2b-256: baf8c3f757064572b62ef63cb8e45e093245b0327e9f8e1c93f2aa57e227a33e

See more details on using hashes here.
