Overview
The Gemini API endpoint allows you to generate content using Google’s Gemini models in their native format. This endpoint follows the official Gemini API specification, making it easy to integrate with existing Gemini-compatible code.
Quick Start
Simply replace the Base URL and API Key in the official SDK or in your requests:
- Base URL: https://api.wisgate.ai (replaces generativelanguage.googleapis.com)
- API Key: replace $GEMINI_API_KEY with your $WISDOM_GATE_KEY
Basic Example: Text Generation
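The following is a minimal sketch of a text generation request using only the Python standard library. It assumes that WisGate keeps the official /v1beta/models/{model}:generateContent path and accepts the key in the x-goog-api-key header, as the official Gemini API does. Neither detail is confirmed on this page. The helper names build_generate_request and generate_text are hypothetical.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.wisgate.ai"  # Base URL from the Quick Start


def build_generate_request(prompt, model="gemini-2.5-flash", api_key=None):
    """Build (but do not send) a generateContent request."""
    # Assumption: the official /v1beta/models/{model}:generateContent path is kept.
    url = f"{BASE_URL}/v1beta/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    headers = {
        "Content-Type": "application/json",
        # Assumption: the key travels in x-goog-api-key, as in the official API.
        "x-goog-api-key": api_key or os.environ.get("WISDOM_GATE_KEY", ""),
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def generate_text(prompt, **kwargs):
    """Send the request and return the text of the first candidate."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]
```

Calling generate_text("Explain how AI works in a few words") with WISDOM_GATE_KEY set in the environment would send the request. Building the request separately makes it easy to inspect the URL and body before anything goes over the network.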
Important Notes
Response Pass-through Principle
WisGate typically does not modify model responses outside of reverse format, ensuring you receive response content consistent with the original Gemini API provider.
Auto-Generated Documentation
The request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from openapi.json. Scroll down to see the interactive API reference.
FAQ
How to control Thinking?
Gemini models support a “thinking” process to improve reasoning capabilities. The control method depends on the model version. For details, please refer to the official documentation: Gemini Thinking Guide.
Gemini 3 Series (e.g., gemini-3-pro-preview)
Use the thinkingLevel parameter to control thinking intensity ("LOW" or "HIGH").
Gemini 2.5 Series (e.g., gemini-2.5-pro)
Use the thinkingBudget parameter to control the Token budget for thinking.
- 0: Disable thinking.
- -1: Dynamic thinking (model decides automatically, default).
- > 0: Set a specific token limit (e.g., 1024).
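The two control styles can be sketched as a small helper that picks the right parameter from the model name. This assumes the settings nest under generationConfig.thinkingConfig, as in the official Gemini REST API. The function name thinking_generation_config and the prefix-based model check are illustrative only.

```python
def thinking_generation_config(model, *, level=None, budget=None):
    """Return a generationConfig fragment with thinking settings for `model`."""
    if model.startswith("gemini-3"):
        # Gemini 3 series: thinkingLevel is "LOW" or "HIGH".
        if level not in ("LOW", "HIGH"):
            raise ValueError('level must be "LOW" or "HIGH" for Gemini 3 models')
        thinking = {"thinkingLevel": level}
    elif model.startswith("gemini-2.5"):
        # Gemini 2.5 series: 0 disables, -1 is dynamic (default), > 0 is a token cap.
        if budget is None:
            budget = -1
        if budget < -1:
            raise ValueError("budget must be -1, 0, or a positive token count")
        thinking = {"thinkingBudget": budget}
    else:
        raise ValueError(f"no thinking settings known for model {model!r}")
    # Assumption: thinking settings live under generationConfig.thinkingConfig.
    return {"thinkingConfig": thinking}


request_body = {
    "contents": [{"parts": [{"text": "Solve: 17 * 23"}]}],
    "generationConfig": thinking_generation_config("gemini-2.5-pro", budget=1024),
}
```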
How to use Streaming Responses?
Streaming responses allow you to receive results incrementally as the model generates content, reducing perceived latency. For details, please refer to the official documentation: Gemini Text Generation - Streaming Responses.
Note: The URL must point to streamGenerateContent, and it is recommended to add ?alt=sse to use the Server-Sent Events format.
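A rough sketch of the streaming URL and of parsing the Server-Sent Events body follows. It assumes the official /v1beta path is kept, that each event arrives as a single "data:" line, and that each chunk has the same shape as a non-streaming response. The names stream_url and iter_sse_text are hypothetical.

```python
import json

BASE_URL = "https://api.wisgate.ai"


def stream_url(model):
    """URL for streaming generation in Server-Sent Events format."""
    # Assumption: the official /v1beta/models/{model} path is kept.
    return f"{BASE_URL}/v1beta/models/{model}:streamGenerateContent?alt=sse"


def iter_sse_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        chunk = json.loads(line[len("data:"):])
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]
```

An HTTP response object from urllib.request.urlopen can be iterated line by line, so it can be passed directly to iter_sse_text to print fragments as they arrive.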
How to maintain conversation context?
Include the complete conversation history in the contents array:
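As an illustration, the history can be kept as a list of turns that alternate between the user and model roles, with the whole list sent as contents on every request. The add_turn helper is hypothetical, and the role names follow the official Gemini API.

```python
def add_turn(history, role, text):
    """Return a new history list with one more turn appended."""
    if role not in ("user", "model"):
        raise ValueError('role must be "user" or "model"')
    return history + [{"role": role, "parts": [{"text": text}]}]


history = []
history = add_turn(history, "user", "Hello, my name is Ada.")
history = add_turn(history, "model", "Nice to meet you, Ada!")
history = add_turn(history, "user", "What is my name?")

# The full history goes into the request body on every call.
request_body = {"contents": history}
```

After each response, append the model's reply with role "model" before adding the next user message, so that the following request carries the full context.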
What does finishReason mean?
The finishReason field in the response indicates why the model stopped generating:
| Value | Meaning |
|---|---|
| STOP | Natural completion |
| MAX_TOKENS | Reached maxOutputTokens limit |
| SAFETY | Triggered safety filter |
| RECITATION | Detected recitation of training data |
| OTHER | Other reason |
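One way to act on these values is sketched below. It assumes finishReason sits on each entry of the candidates array, as in the official response format. The describe_finish helper is hypothetical.

```python
FINISH_REASONS = {
    "STOP": "Natural completion",
    "MAX_TOKENS": "Reached maxOutputTokens limit",
    "SAFETY": "Triggered safety filter",
    "RECITATION": "Detected recitation of training data",
    "OTHER": "Other reason",
}


def describe_finish(response):
    """Return (finishReason, meaning, is_complete) for the first candidate."""
    candidates = response.get("candidates") or []
    if not candidates:
        return None, "No candidates returned", False
    reason = candidates[0].get("finishReason")
    meaning = FINISH_REASONS.get(reason, "Unrecognized or missing finishReason")
    # Only STOP means the model finished its answer on its own.
    return reason, meaning, reason == "STOP"
```

A caller might, for example, retry with a larger maxOutputTokens when the reason is MAX_TOKENS and surface a message to the user when it is SAFETY.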
How to control costs?
- Use maxOutputTokens in generationConfig to limit output length
- Choose appropriate models (e.g., gemini-2.5-flash is more economical than gemini-3-pro-preview)
- Streamline prompts, avoid redundant context
- Monitor token consumption in the usageMetadata field of responses
- Use thinking budgets wisely for reasoning models to control reasoning token usage
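Two of the points above, capping output length and monitoring usage, can be sketched as follows. The usageMetadata field names (promptTokenCount, candidatesTokenCount, thoughtsTokenCount, totalTokenCount) are taken from the official Gemini API and are assumed to pass through unchanged. Both helper names are hypothetical.

```python
def capped_generation_config(max_output_tokens):
    """generationConfig fragment that limits the output length."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    return {"maxOutputTokens": max_output_tokens}


def summarize_usage(response):
    """Extract token counts from a response, defaulting missing fields to 0."""
    usage = response.get("usageMetadata", {})
    return {
        "prompt": usage.get("promptTokenCount", 0),
        "output": usage.get("candidatesTokenCount", 0),
        "thinking": usage.get("thoughtsTokenCount", 0),
        "total": usage.get("totalTokenCount", 0),
    }


request_body = {
    "contents": [{"parts": [{"text": "Summarize this in one sentence."}]}],
    "generationConfig": capped_generation_config(256),
}
```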
How to use multimodal input (text and images)?
Gemini supports multimodal input through the parts array. You can include both text and images in a single request:
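A minimal sketch of such a request body is shown below. It assumes images are sent inline as base64 under an inlineData part with a mimeType, as in the official Gemini REST API. The helpers image_part and multimodal_contents are hypothetical.

```python
import base64


def image_part(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes as an inline, base64-encoded part."""
    # Assumption: inline images use the inlineData/mimeType/data shape.
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }


def multimodal_contents(prompt, image_bytes, mime_type="image/jpeg"):
    """Build a contents array with one text part and one image part."""
    return [{"parts": [{"text": prompt}, image_part(image_bytes, mime_type)]}]
```

In practice the bytes would come from reading an image file in binary mode, and the resulting list would be sent as the contents field of the request body.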