Full documentation update to match current codebase

oobabooga 2026-03-05 12:46:21 -03:00
parent 1c2548fd89
commit 1ffe540c97
10 changed files with 388 additions and 326 deletions


@@ -39,7 +39,7 @@ curl http://127.0.0.1:5000/v1/completions \
#### Chat completions
Works best with instruction-following models. If the "instruction_template" variable is not provided, it will be guessed automatically based on the model name using the regex patterns in `user_data/models/config.yaml`.
```shell
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "mode": "instruct"
  }'
```
@@ -476,51 +476,45 @@ OPENAI_API_KEY=sk-111111111111111111111111111111111111111111111111
OPENAI_API_BASE=http://127.0.0.1:5000/v1
```
With the [official python openai client](https://github.com/openai/openai-python) (v1.x), the address can be set like this:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-111111111111111111111111111111111111111111111111",
    base_url="http://127.0.0.1:5000/v1"
)

response = client.chat.completions.create(
    model="x",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
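The chat completions endpoint also supports streaming. A minimal sketch using the same client, with the loop guarded so the helper can be reused without a running server (`join_stream_text` is an illustrative helper of ours, not part of the openai library):

```python
def join_stream_text(pieces):
    # Streamed chunks may carry a None delta (e.g. the final chunk), so skip falsy values.
    return "".join(p for p in pieces if p)

if __name__ == "__main__":
    # The import lives here so the helper above works even without the openai package.
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-111111111111111111111111111111111111111111111111",
        base_url="http://127.0.0.1:5000/v1",
    )
    stream = client.chat.completions.create(
        model="x",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    # Each chunk carries an incremental delta rather than the full message.
    pieces = [chunk.choices[0].delta.content for chunk in stream]
    print(join_stream_text(pieces))
```

Printing each delta as it arrives (instead of collecting them) gives the familiar token-by-token output.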
If using .env files to save the `OPENAI_API_BASE` and `OPENAI_API_KEY` variables, make sure the .env file is loaded before the openai module is imported:
```python
from dotenv import load_dotenv
load_dotenv() # make sure the environment variables are set before import
import openai
```
With the [official Node.js openai client](https://github.com/openai/openai-node) (v4.x):
```js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://127.0.0.1:5000/v1",
});

const response = await client.chat.completions.create({
  model: "x",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```
### Embeddings (alpha)
The embeddings endpoint requires `sentence-transformers` to be installed, but chat and completions will function without it. It currently uses the HuggingFace model `sentence-transformers/all-mpnet-base-v2`, which produces 768-dimensional embeddings. The model is small and fast. This model and embedding size may change in the future.
| model name | dimensions | input max tokens | speed | size | Avg. performance |
| ---------------------- | ---------- | ---------------- | ----- | ---- | ---------------- |
| text-embedding-ada-002 | 1536 | 8192 | - | - | - |
| text-davinci-002 | 768 | 2046 | - | - | - |
| all-mpnet-base-v2 | 768 | 384 | 2800 | 420M | 63.3 |
| all-MiniLM-L6-v2 | 384 | 256 | 14200 | 80M | 58.8 |
@@ -528,50 +522,33 @@ In short, the all-MiniLM-L6-v2 model is 5x faster, 5x smaller ram, 2x smaller st
Warning: You cannot mix embeddings from different models even if they have the same dimensions. They are not comparable.
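Embeddings are typically compared with cosine similarity, which is only meaningful between vectors produced by the same model. A minimal sketch using only the standard library:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Two vectors from different models can have the same dimensions and still land anywhere on this scale, which is why mixing them produces meaningless scores.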
### Compatibility
Note: the table below may be obsolete.
| API endpoint | notes |
| ------------------------- | --------------------------------------------------------------------------- |
| /v1/chat/completions | Use with instruction-following models. Supports streaming, tool calls. |
| /v1/completions | Text completion endpoint. |
| /v1/embeddings | Using SentenceTransformer embeddings. |
| /v1/images/generations | Image generation, response_format='b64_json' only. |
| /v1/moderations | Basic support via embeddings. |
| /v1/models | Lists models. Currently loaded model first. |
| /v1/models/{id} | Returns model info. |
| /v1/audio/\* | Supported. |
| /v1/images/edits | Not yet supported. |
| /v1/images/variations | Not yet supported. |
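The endpoints above can also be called without any client library. A minimal sketch that lists the available models using only the Python standard library (the `models_url` helper is ours, and a running local server is assumed):

```python
import json
import urllib.request

def models_url(base_url):
    # Illustrative helper: join the API base with the /models endpoint path.
    return base_url.rstrip("/") + "/models"

if __name__ == "__main__":
    # Assumes the server is running locally with the API enabled.
    with urllib.request.urlopen(models_url("http://127.0.0.1:5000/v1")) as resp:
        data = json.load(resp)
    for model in data["data"]:
        print(model["id"])
```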
#### Applications
Almost everything needs the `OPENAI_API_KEY` and `OPENAI_API_BASE` environment variables set, but there are some exceptions.
Note: the table below may be obsolete.
| Compatibility | Application/Library | Website | Notes |
| ------------- | -------------------- | ------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
| ✅❌ | openai-python | https://github.com/openai/openai-python | Use `OpenAI(base_url="http://127.0.0.1:5000/v1")`. Only the endpoints from above work. |
| ✅❌ | openai-node | https://github.com/openai/openai-node | Use `new OpenAI({baseURL: "http://127.0.0.1:5000/v1"})`. See example above. |
| ✅ | anse | https://github.com/anse-app/anse | API Key & URL configurable in UI, Images also work. |
| ✅ | shell_gpt | https://github.com/TheR1D/shell_gpt | OPENAI_API_HOST=http://127.0.0.1:5000 |
| ✅ | gpt-shell | https://github.com/jla/gpt-shell | OPENAI_API_BASE=http://127.0.0.1:5000/v1 |
| ✅ | gpt-discord-bot | https://github.com/openai/gpt-discord-bot | OPENAI_API_BASE=http://127.0.0.1:5000/v1 |
| ✅ | OpenAI for Notepad++ | https://github.com/Krazal/nppopenai | api_url=http://127.0.0.1:5000 in the config file, or environment variables. |
| ✅ | vscode-openai | https://marketplace.visualstudio.com/items?itemName=AndrewButson.vscode-openai | OPENAI_API_BASE=http://127.0.0.1:5000/v1 |
| ✅❌ | langchain | https://github.com/hwchase17/langchain | Use `base_url="http://127.0.0.1:5000/v1"`. Results depend on model and prompt formatting. |