OpenAI Chat Completions

Overview

chat/completions is the most commonly used API endpoint for LLMs. It takes a conversation, expressed as a list of messages, as input and returns the model's response. The endpoint follows the OpenAI Chat Completions API format, so it integrates easily with existing OpenAI-compatible code.
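As a rough illustration of the input shape, the sketch below builds a request body from a model name and a list of messages. The helper name build_chat_request and the set of roles it accepts are assumptions made for this example; the parameters a given model actually supports are listed in the model catalog.

```python
import json

# Roles commonly seen in OpenAI-style chat messages. This set is an
# assumption for the example; individual models may accept others.
KNOWN_ROLES = {"system", "user", "assistant", "tool"}


def build_chat_request(model, messages, stream=False):
    """Return a dict shaped like an OpenAI-style chat/completions request body."""
    for message in messages:
        if message.get("role") not in KNOWN_ROLES:
            raise ValueError(f"unexpected role: {message.get('role')!r}")
    body = {"model": model, "messages": list(messages)}
    if stream:
        body["stream"] = True
    return body


request_body = build_chat_request(
    "gpt-4",
    [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"},
    ],
)
print(json.dumps(request_body, indent=2))
```

The resulting dict is what would be serialized as the JSON body of the request; an SDK client normally builds it for you from keyword arguments.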

Important Notes

Model Differences: Different model providers may support different request parameters and return different response fields. We strongly recommend consulting the model catalog for the complete parameter list and usage instructions for each model.

Response Pass-through Principle: WisGate typically does not modify model responses outside of reverse format, so the response content you receive is consistent with that of the original API provider.

Streaming Support: WisGate supports Server-Sent Events (SSE) for streaming responses. Set "stream": true in your request to receive output incrementally as it is generated, which is useful for chat applications.
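For readers consuming the stream without an SDK, here is a minimal sketch of parsing SSE lines into text fragments. It assumes the usual OpenAI-style framing, where each event is a line of the form data: <json> and the stream ends with data: [DONE]; the function name iter_sse_content and the sample lines are made up for illustration.

```python
import json


def iter_sse_content(lines):
    """Yield text fragments from OpenAI-style SSE lines.

    Assumes each event is a single 'data: <json>' line and that the
    stream ends with the sentinel 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events, not captured from a real response.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample_lines)))
```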

Auto-Generated Documentation: The request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from openapi.json. Scroll down to see the interactive API reference.

FAQ

How to handle rate limits?

When you encounter 429 Too Many Requests, we recommend retrying with exponential backoff:

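import random
import time

from openai import OpenAI, RateLimitError

# Assumes the API key and base URL are supplied through the environment
# (OPENAI_API_KEY and OPENAI_BASE_URL) or passed to OpenAI(...).
client = OpenAI()

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=messages,
            )
        except RateLimitError:
            if i == max_retries - 1:
                raise  # out of retries: surface the 429 to the caller
            # Wait 2**i seconds plus up to 1 second of random jitter.
            time.sleep((2 ** i) + random.random())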
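The same idea can be written independently of any particular client, which makes the backoff logic easy to test. The sketch below is one possible shape, not a prescribed implementation: call_with_retry, backoff_delay, the 30-second cap, and the FakeRateLimit class are all assumptions introduced for this example.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay before retry `attempt` (0-based): capped exponential plus jitter."""
    return min(cap, base * (2 ** attempt)) + random.random()


def call_with_retry(func, is_retryable, max_retries=3, sleep=time.sleep):
    """Call func(), retrying while is_retryable(exc) is true and retries remain."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            sleep(backoff_delay(attempt))


class FakeRateLimit(Exception):
    """Stand-in for a 429 error, used only in this demo."""


attempts = []


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit()
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_retry(
    flaky, lambda exc: isinstance(exc, FakeRateLimit), sleep=waits.append
)
print(result, [round(w, 2) for w in waits])
```

Passing sleep as a parameter lets the demo record the delays rather than wait; in real use the default time.sleep applies. The cap keeps a long retry sequence from waiting unboundedly.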

How to maintain conversation context?

Include the complete conversation history in the messages array:

conversation_history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a programming language..."},
    {"role": "user", "content": "What are its advantages?"}
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=conversation_history
)
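Between requests, the history has to be extended with each new exchange. The helpers below are a minimal sketch of that bookkeeping; append_turn and trim_history are hypothetical names, and trimming is an optional trade-off (it lowers token usage but drops older context), not something the endpoint requires.

```python
def append_turn(history, user_text, assistant_text):
    """Record one completed exchange at the end of the history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


def trim_history(history, max_messages):
    """Keep all system messages plus the most recent max_messages others."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]


history = [{"role": "system", "content": "You are a helpful assistant"}]
append_turn(history, "What is Python?", "Python is a programming language...")
append_turn(history, "What are its advantages?", "It is readable and widely used.")

# Add the next question, then keep only the three most recent non-system messages.
history.append({"role": "user", "content": "Is it fast?"})
trimmed = trim_history(history, max_messages=3)
print([m["role"] for m in trimmed])
```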

What does finish_reason mean?

Value	Meaning
`stop`	Natural completion
`length`	Reached max_tokens limit
`content_filter`	Triggered content filter
`function_call`	Model called a function
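In application code it is often convenient to turn finish_reason into an explicit next step. The mapping below is only a sketch covering the four values in the table; the function name and the action labels are invented for this example, and any other value is reported as unknown rather than guessed at.

```python
def next_action(finish_reason):
    """Map a finish_reason value to a short, application-defined action label."""
    actions = {
        "stop": "use_response",             # natural completion
        "length": "handle_truncation",      # max_tokens reached, output may be cut off
        "content_filter": "handle_filter",  # content filter triggered
        "function_call": "run_function",    # model asked to call a function
    }
    return actions.get(finish_reason, "unknown")


for reason in ["stop", "length", "content_filter", "function_call", "other"]:
    print(reason, "->", next_action(reason))
```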

How to control costs?

Use max_tokens to limit output length
Choose appropriate models (e.g., GPT-3.5 Turbo is more economical than GPT-4)
Streamline prompts, avoid redundant context
Monitor token consumption in the usage field of responses
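To monitor consumption, the usage values from each response can be accumulated over time. The sketch below assumes the OpenAI-style usage fields prompt_tokens and completion_tokens; the UsageTotals class is hypothetical, and the per-1,000-token prices in the demo are placeholders, not real prices for any model.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals built from the usage field of successive responses."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage):
        # `usage` is the usage object of one response, given here as a dict.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens

    def estimated_cost(self, prompt_price_per_1k, completion_price_per_1k):
        # Prices are supplied by the caller; none are built in.
        return (
            self.prompt_tokens / 1000 * prompt_price_per_1k
            + self.completion_tokens / 1000 * completion_price_per_1k
        )


totals = UsageTotals()
totals.add({"prompt_tokens": 600, "completion_tokens": 200, "total_tokens": 800})
totals.add({"prompt_tokens": 400, "completion_tokens": 300, "total_tokens": 700})
print(totals.total_tokens)
print(totals.estimated_cost(prompt_price_per_1k=0.01, completion_price_per_1k=0.03))
```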

How to use streaming?

Enable streaming by setting "stream": true in the request (stream=True in the Python SDK):

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

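for chunk in stream:
    # Some chunks (for example a final usage-only chunk) carry no choices.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)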
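If the full text is needed after streaming finishes, the fragments can be collected while iterating. The following sketch assumes chunks shaped like the ones in the loop above (choices[0].delta.content plus a finish_reason on the last content chunk); collect_stream and fake_chunk are hypothetical helpers, and the fake chunks stand in for a real stream so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join all delta.content fragments; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:
            continue  # e.g. a chunk that carries only usage data
        choice = chunk.choices[0]
        if choice.delta.content is not None:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content=None, finish_reason=None, empty=False):
    """Build a stand-in object with the same attribute layout as a stream chunk."""
    if empty:
        return SimpleNamespace(choices=[])
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=delta, finish_reason=finish_reason)]
    )


fake_stream = [
    fake_chunk(content="Once "),
    fake_chunk(content="upon a time"),
    fake_chunk(finish_reason="stop"),
    fake_chunk(empty=True),
]
text, reason = collect_stream(fake_stream)
print(text, reason)
```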
