Documentation Index
Fetch the complete documentation index at: https://wisdom-docs.juheapi.com/llms.txt
Use this file to discover all available pages before exploring further.
Overview
chat/completions is the most common API endpoint for LLMs, which takes a conversation list composed of multiple messages as input to get model responses. This endpoint follows the OpenAI Chat Completions API format, making it easy to integrate with existing OpenAI-compatible code.
Important Notes
Response Pass-through PrincipleWisGate typically does not modify model responses outside of reverse format, ensuring you receive response content consistent with the original API provider.
Auto-Generated DocumentationThe request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from
openapi.json. Scroll down to see the interactive API reference.FAQ
How to handle rate limits?
When encountering429 Too Many Requests, we recommend implementing exponential backoff retry:
How to maintain conversation context?
Include the complete conversation history in themessages array:
What does finish_reason mean?
| Value | Meaning |
|---|---|
stop | Natural completion |
length | Reached max_tokens limit |
content_filter | Triggered content filter |
function_call | Model called a function |
How to control costs?
- Use
max_tokensto limit output length - Choose appropriate models (e.g., GPT-3.5 Turbo is more economical than GPT-4)
- Streamline prompts, avoid redundant context
- Monitor token consumption in the
usagefield of responses
How to use streaming?
Enable streaming by settingstream: true: