LLM unified service

Modelz LLM

Modelz LLM is an inference server that makes it easier to run open source large language models (LLMs), such as FastChat, LLaMA, and ChatGLM, in local or cloud-based environments through an OpenAI-compatible API.

Features

  • OpenAI-compatible API: Modelz LLM provides an OpenAI-compatible API for LLMs, so you can use the OpenAI Python SDK or LangChain to interact with the model.
  • Self-hosted: Modelz LLM can be deployed in local or cloud-based environments.
  • Open source LLMs: Modelz LLM supports open source LLMs, such as FastChat, LLaMA, and ChatGLM.
  • Cloud native: We provide Docker images for different LLMs, which can be deployed on Kubernetes or other cloud-based environments (e.g. Modelz).

Quick Start

Install

pip install modelz-llm
# or install from source
pip install "modelz-llm[gpu] @ git+https://github.com/tensorchord/modelz-llm.git"

Run the self-hosted API server

First, start the self-hosted API server:

modelz-llm -m bigscience/bloomz-560m --device cpu
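
Loading a model can take a while, so a client script may want to wait until the server accepts connections before sending requests. The following is a minimal sketch using only the standard library. The helper name `wait_for_port` is hypothetical, and port 8000 is assumed from the client examples in this document. A successful TCP connection only shows that the port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8000)` could be called after launching the server and before the first API request.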

Currently, we support the following models:

Model Name            Hugging Face Model             Docker Image                  Recommended GPU
FastChat T5           lmsys/fastchat-t5-3b-v1.0      modelzai/llm-fastchat-t5-3b   Nvidia L4(24GB)
Vicuna 7B Delta V1.1  lmsys/vicuna-7b-delta-v1.1     modelzai/llm-vicuna-7b        Nvidia A100(40GB)
LLaMA 7B              decapoda-research/llama-7b-hf  modelzai/llm-llama-7b         Nvidia A100(40GB)
ChatGLM 6B INT4       THUDM/chatglm-6b-int4          modelzai/llm-chatglm-6b-int4  Nvidia T4(16GB)
ChatGLM 6B            THUDM/chatglm-6b               modelzai/llm-chatglm-6b       Nvidia L4(24GB)
Bloomz 560M           bigscience/bloomz-560m         modelzai/llm-bloomz-560m      CPU
Bloomz 1.7B           bigscience/bloomz-1b7                                        CPU
Bloomz 3B             bigscience/bloomz-3b                                         Nvidia L4(24GB)
Bloomz 7.1B           bigscience/bloomz-7b1                                        Nvidia A100(40GB)
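
As an illustration, the table above can be kept as a small lookup and used to build the launch command shown earlier. This is only a sketch: `DOCKER_IMAGES` and `launch_command` are hypothetical names, and `cpu` is the only `--device` value that appears in this document, so any other value is an assumption to check against the CLI help.

```python
import shlex
from typing import List

# Hugging Face model ID -> Docker image, copied from the table above
# (None where the table lists no image).
DOCKER_IMAGES = {
    "lmsys/fastchat-t5-3b-v1.0": "modelzai/llm-fastchat-t5-3b",
    "lmsys/vicuna-7b-delta-v1.1": "modelzai/llm-vicuna-7b",
    "decapoda-research/llama-7b-hf": "modelzai/llm-llama-7b",
    "THUDM/chatglm-6b-int4": "modelzai/llm-chatglm-6b-int4",
    "THUDM/chatglm-6b": "modelzai/llm-chatglm-6b",
    "bigscience/bloomz-560m": "modelzai/llm-bloomz-560m",
    "bigscience/bloomz-1b7": None,
    "bigscience/bloomz-3b": None,
    "bigscience/bloomz-7b1": None,
}


def launch_command(model: str, device: str = "cpu") -> List[str]:
    """Build the argv for `modelz-llm -m <model> --device <device>`."""
    if model not in DOCKER_IMAGES:
        raise ValueError(f"{model!r} is not in the supported-model table")
    return ["modelz-llm", "-m", model, "--device", device]


print(shlex.join(launch_command("bigscience/bloomz-560m")))
# prints: modelz-llm -m bigscience/bloomz-560m --device cpu
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting.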

Use the OpenAI Python SDK

Once the server is running, you can use the OpenAI Python SDK to interact with the model:

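import openai

# point the SDK at the local server; the key is a placeholder value
openai.api_base = "http://localhost:8000"
openai.api_key = "any"

# create a chat completion (module-level interface of the openai SDK before 1.0)
chat_completion = openai.ChatCompletion.create(model="any", messages=[{"role": "user", "content": "Hello world"}])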
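
To read the generated text, the reply can be pulled out of the first choice of the response. The sketch below assumes the server returns the standard OpenAI chat completion shape (`choices[0].message.content`), which this document implies but does not show. `first_reply` is a hypothetical helper, and `sample` is a hand-written stand-in, not real server output.

```python
from typing import Any, Mapping


def first_reply(response: Mapping[str, Any]) -> str:
    """Return the assistant text of the first choice in a chat completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the OpenAI chat completion shape (not real server output).
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hi there!"}}
    ]
}
print(first_reply(sample))
# prints: Hi there!
```

With the SDK call above, `first_reply(chat_completion)` should work the same way, since the pre-1.0 SDK response objects support dictionary-style access.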

Integrate with LangChain

You can also integrate modelz-llm with LangChain:

import openai

# point the OpenAI client used by LangChain at the local server
openai.api_base = "http://localhost:8000"
openai.api_key = "any"

from langchain.llms import OpenAI

llm = OpenAI()

llm.generate(prompts=["Could you please recommend some movies?"])

Deploy on Modelz

You can also deploy modelz-llm directly on Modelz.

Supported APIs

Modelz LLM exposes the following API routes:

  • /completions
  • /chat/completions
  • /embeddings
  • /engines/<any>/embeddings
  • /v1/completions
  • /v1/chat/completions
  • /v1/embeddings
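
These routes can also be called without any SDK. The sketch below builds plain JSON POST requests with the standard library. It assumes the server listens on `http://localhost:8000` (the address used in the client examples) and accepts the same request bodies as the OpenAI API (`messages` for chat, `input` for embeddings), and `build_request` and `post_json` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used by the client examples


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the listed routes."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(build_request(path, payload, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Requests for two of the routes; nothing is sent until post_json is called.
chat_req = build_request(
    "/v1/chat/completions",
    {"model": "any", "messages": [{"role": "user", "content": "Hello world"}]},
)
embed_req = build_request("/v1/embeddings", {"model": "any", "input": "Hello world"})
print(chat_req.get_method(), chat_req.full_url)
# prints: POST http://localhost:8000/v1/chat/completions
```

With a server running, `post_json("/v1/embeddings", {"model": "any", "input": "Hello world"})` would send the request and return the decoded response.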

Acknowledgements

  • FastChat for the prompt generation logic.
  • Mosec for the inference engine.
