LLM unified service
Project description
Modelz LLM
Modelz LLM is an inference server for running open source large language models (LLMs), such as FastChat, LLaMA, and ChatGLM, locally or in the cloud behind an OpenAI compatible API.
Features
- OpenAI compatible API: Modelz LLM exposes an OpenAI compatible API, so you can use the OpenAI Python SDK to interact with the models.
- Self-hosted: Modelz LLM can be easily deployed on either local or cloud-based environments.
- Open source LLMs: Modelz LLM supports open source LLMs, such as FastChat, LLaMA, and ChatGLM.
- Cloud native: We provide Docker images for different LLMs, which can be easily deployed on Kubernetes or other cloud-based environments (e.g. Modelz).
Quick Start
Install
pip install "modelz-llm[gpu]"
# or install from source
pip install "modelz-llm[gpu] @ git+https://github.com/tensorchord/modelz-llm.git"
Run the self-hosted API server
First, start the self-hosted API server:
modelz-llm -m "THUDM/chatglm-6b-int4"
Currently, we support the following models:
Model Name | Huggingface Model | Docker Image |
---|---|---|
FastChat T5 | lmsys/fastchat-t5-3b-v1.0 | modelzai/llm-fastchat-t5-3b |
Vicuna 7B Delta V1.1 | lmsys/vicuna-7b-delta-v1.1 | modelzai/llm-vicuna-7b |
LLaMA 7B | decapoda-research/llama-7b-hf | modelzai/llm-llama-7b |
ChatGLM 6B INT4 | THUDM/chatglm-6b-int4 | modelzai/llm-chatglm-6b-int4 |
ChatGLM 6B | THUDM/chatglm-6b | modelzai/llm-chatglm-6b |
Bloomz 560M | bigscience/bloomz-560m | |
Bloomz 1.7B | bigscience/bloomz-1b7 | |
Bloomz 3B | bigscience/bloomz-3b | |
Bloomz 7.1B | bigscience/bloomz-7b1 | |
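The Docker images listed above can be deployed like any container. A minimal docker-compose sketch, assuming the server listens on port 8000; the service name, port mapping, and absence of GPU configuration are illustrative assumptions, not documented defaults of the images:

```yaml
# docker-compose.yaml -- hypothetical sketch for running one of the
# prebuilt images; adjust the image, port, and GPU settings as needed.
services:
  llm:
    image: modelzai/llm-chatglm-6b-int4
    ports:
      - "8000:8000"
```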
Use OpenAI python SDK
Then you can use the OpenAI python SDK to interact with the model:
import openai

openai.api_base = "http://localhost:8000"
openai.api_key = "any"

# create a chat completion
chat_completion = openai.ChatCompletion.create(
    model="any",
    messages=[{"role": "user", "content": "Hello world"}],
)
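The response follows the OpenAI chat completion schema. A sketch of pulling the reply text out of a sample payload; the payload contents below are illustrative, not real server output:

```python
# Illustrative response in the OpenAI chat completion schema; the actual
# text and token counts returned by the server will differ.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ],
    "usage": {"prompt_tokens": 2, "completion_tokens": 7, "total_tokens": 9},
}

# The assistant's reply lives at choices[0].message.content.
reply = sample["choices"][0]["message"]["content"]
print(reply)  # Hello! How can I help?
```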
Supported APIs
The server registers the following routes, both with and without the /v1 prefix, so the OpenAI SDK's default paths work:
app.add_route("/", Ping())
app.add_route("/completions", completion)
app.add_route("/chat/completions", chat_completion)
app.add_route("/embeddings", embeddings)
app.add_route("/engines/{engine}/embeddings", embeddings)
app.add_route("/v1/completions", completion)
app.add_route("/v1/chat/completions", chat_completion)
app.add_route("/v1/embeddings", embeddings)
app.add_route("/v1/engines/{engine}/embeddings", embeddings)
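For example, an embeddings request targets /v1/embeddings with a JSON body. A sketch that only builds the request without sending it; the model and input values are placeholders:

```python
import json
from urllib.request import Request

# Build (but do not send) a request against the embeddings route above.
payload = {"model": "any", "input": "Hello world"}
req = Request(
    "http://localhost:8000/v1/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

print(req.get_method())  # POST (urllib infers POST when data is set)
print(req.full_url)      # http://localhost:8000/v1/embeddings
```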
Download files
Source Distribution
- modelz-llm-23.6.4.tar.gz (15.3 kB)
Built Distribution
- modelz_llm-23.6.4-py3-none-any.whl (9.8 kB)
File details
Details for the file modelz-llm-23.6.4.tar.gz.
File metadata
- Download URL: modelz-llm-23.6.4.tar.gz
- Upload date:
- Size: 15.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b0474e86dd718c8760ef3108e65a3e9fa48089a74d09590955187fd43cd63faf |
MD5 | 69d581d4722ed90ca29d03e83a0fe37b |
BLAKE2b-256 | 586c441c6673915f9e18256c6c127d996a73c58308a57bd177c039ee63f84d75 |
File details
Details for the file modelz_llm-23.6.4-py3-none-any.whl.
File metadata
- Download URL: modelz_llm-23.6.4-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 49d056b6d258acb68f0133f34d166693751a42055fe0c0e0b2668ca2ef356647 |
MD5 | 9495c21f63860ea253c9d71cc3237a5f |
BLAKE2b-256 | 2ccc9c381996421924a01b70ed0166dc90aa1358ef7e8c4be9a2dfa91a90afb4 |