Extend LLMs to infinite length without sacrificing efficiency or performance, and without retraining
Project description
Attention Sinks in Transformers for Infinite-length LLMs
Overview
- Extend existing LLMs (e.g. Llama 2) to infinite length without sacrificing efficiency or performance, and without any retraining.
- The `attention_sinks` API allows for a drop-in replacement of the `transformers` API:

```python
from attention_sinks import AutoModel

model = AutoModel.from_pretrained("meta-llama/Llama-2-7b-hf", device_map="auto")
```
- New parameters to `AutoModel....from_pretrained` (see the sketch after this list):
  - `attention_sink_size`, int, defaults to 4: The number of initial tokens to use as the attention sink. These tokens are always included in the Attention Sink KV Cache.
  - `attention_sink_window_size`, int, defaults to 1020: The size of the sliding window, i.e. the number of "recent tokens" to include in the Attention Sink KV Cache.
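For example, both parameters can be passed directly to `from_pretrained`. This is a minimal sketch: only the two parameters listed above are documented by this package, and the model name and `device_map` are simply reused from the example earlier.

```python
from attention_sinks import AutoModel

# Keep the first 4 tokens as attention sinks and the 1020 most recent tokens
# in the sliding window, for a KV cache of at most 1024 tokens.
model = AutoModel.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    attention_sink_size=4,
    attention_sink_window_size=1020,
)
```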
Note
I've yet to replicate all of the experiments from the original paper, although I've replicated some. I can't confirm that this indeed allows for infinite-length LLMs, either in theory or in practice.
More details coming soon.
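In the meantime, here is a small, framework-agnostic sketch of the eviction policy that the two parameters above describe. It illustrates the idea of the Attention Sink KV Cache rather than this package's actual implementation, and the function name is invented for the example:

```python
def kept_positions(cache_len, attention_sink_size=4, attention_sink_window_size=1020):
    """Illustrative only: which token positions an attention-sink KV cache keeps."""
    # The first `attention_sink_size` positions are always kept as sinks.
    sinks = list(range(min(attention_sink_size, cache_len)))
    # The most recent `attention_sink_window_size` positions form the sliding window.
    window_start = max(attention_sink_size, cache_len - attention_sink_window_size)
    window = list(range(window_start, cache_len))
    return sinks + window
```

Once the cache grows past `attention_sink_size + attention_sink_window_size` tokens, the oldest non-sink tokens are evicted, while the initial sink tokens are never dropped.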
Credits
Inspired by, and adapted from StreamingLLM.
Citation
@article{xiao2023streamingllm,
title={Efficient Streaming Language Models with Attention Sinks},
author={Xiao, Guangxuan and Tian, Yuandong and Chen, Beidi and Han, Song and Lewis, Mike},
journal={arXiv},
year={2023}
}
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
attention_sinks-0.0.1.tar.gz (11.5 kB)
Built Distribution
attention_sinks-0.0.1-py3-none-any.whl (12.0 kB)
File details
Details for the file attention_sinks-0.0.1.tar.gz.
File metadata
- Download URL: attention_sinks-0.0.1.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | d1019c0a691374538acded0ffc8914f81c05bda92c1d3f898506797ea8d2ed62
MD5 | 94e8bc8071bafe8e11b4f1cc258ab89a
BLAKE2b-256 | 0f65ce134e9bc0c6f0afa2fc924128d1ae2f6687480509f8708c7e2679e1c4dc
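To check a downloaded archive against the SHA256 digest above, a generic verification snippet such as the following can be used (the filename and expected digest are simply copied from this table):

```python
import hashlib

# Compare the SHA256 of the downloaded sdist against the digest listed above.
expected = "d1019c0a691374538acded0ffc8914f81c05bda92c1d3f898506797ea8d2ed62"
with open("attention_sinks-0.0.1.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()
print("OK" if actual == expected else "MISMATCH")
```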
File details
Details for the file attention_sinks-0.0.1-py3-none-any.whl.
File metadata
- Download URL: attention_sinks-0.0.1-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 59fc883ae53eb807e45114e4e5b39b84a3e23d12126f2deff7195cd8d8c7bc9d
MD5 | ecd42d7318d2a7cfc5b1a2e8192800e8
BLAKE2b-256 | 1bc661443d3cfdc7d1509d74526a0d25f50756e5878c3b712eac40c57879e1c1