Welcome to the Chat Completions API. This endpoint provides an OpenAI-compatible interface for text and code generation. Requests are securely proxied without requiring API credentials from the client side.
POST https://ai.lumiltc.dev/api.php
No API keys or headers (e.g., `Authorization: Bearer ...`) are required in your request. The proxy handles authentication server-side.
You can execute the request with any HTTP client. Here is a standard curl example configured for streaming:
curl -N -X POST "https://ai.lumiltc.dev/api.php" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "model": "z-ai/glm5",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Can you help me write some code?"
      }
    ],
    "temperature": 1,
    "top_p": 1,
    "max_tokens": 16384,
    "seed": 42,
    "stream": true,
    "chat_template_kwargs": {
      "enable_thinking": true,
      "clear_thinking": false
    }
  }'
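The same request can be issued from Python using only the standard library. The sketch below mirrors the curl example; the helper names (`build_payload`, `send_chat_request`) are our own, not part of the API, and no `Authorization` header is sent since the proxy supplies credentials server-side:

```python
import json
import urllib.request

API_URL = "https://ai.lumiltc.dev/api.php"

def build_payload(prompt, model="z-ai/glm5", stream=False, **overrides):
    """Assemble a request body matching the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1,
        "top_p": 1,
        "max_tokens": 16384,
        "stream": stream,
    }
    payload.update(overrides)  # e.g. seed=42, chat_template_kwargs={...}
    return payload

def send_chat_request(payload):
    """POST the payload; no API key is needed on the client side."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `stream` left at `False`, `send_chat_request(build_payload("Hello!"))` returns the full completion as one JSON object.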
| Parameter | Type | Description |
|---|---|---|
| `model` | string | **Required.** The model to route the request to (e.g., `z-ai/glm5`). |
| `messages` | array | **Required.** Array of message objects describing the conversation. Each object must contain a `role` (`user`, `assistant`, or `system`) and `content`. |
| `temperature` | number | Optional. Defaults to 1. Higher values make output more random; lower values make it more focused and deterministic. |
| `top_p` | number | Optional. Defaults to 1. Controls diversity via nucleus sampling. |
| `max_tokens` | integer | Optional. The maximum number of tokens allowed in the model's generated response. |
| `seed` | integer | Optional. If set, the backend attempts deterministic sampling so repeated requests can return the same result. |
| `stream` | boolean | Optional. Defaults to false. When set to true, the service streams back partial response deltas using Server-Sent Events (SSE). |
| `chat_template_kwargs` | object | Optional. Special flags to configure model behavior (e.g., `enable_thinking`, `clear_thinking`). |
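As the `messages` parameter describes, each turn in the conversation is a role/content pair, and multi-turn context is kept by resending the full history. A minimal sketch (the helper names are illustrative, not part of the API):

```python
def make_conversation(system_prompt=None):
    """Start a message list, optionally seeded with a system instruction."""
    if system_prompt:
        return [{"role": "system", "content": system_prompt}]
    return []

def add_turn(messages, role, content):
    """Append one turn; role must be one of user/assistant/system."""
    assert role in ("user", "assistant", "system"), f"unknown role: {role}"
    messages.append({"role": role, "content": content})
    return messages
```

After each response, append the assistant's reply with `add_turn(messages, "assistant", reply)` before sending the next user message, so the model sees the whole exchange.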
If `stream: false` is requested, the endpoint returns a single standard JSON payload once generation completes.
If `stream: true` is requested, the endpoint streams continuous chunks over a long-lived connection as Server-Sent Events (SSE), terminating the stream with `data: [DONE]` when processing is complete.
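A consumer of the streaming mode has to split the SSE body on `data:` lines and stop at the `[DONE]` sentinel. A minimal sketch, assuming the chunks follow the usual OpenAI-compatible delta shape (`choices[0].delta.content`):

```python
import json

def parse_sse_events(lines):
    """Yield content deltas from an SSE stream until the [DONE] sentinel.

    `lines` is any iterable of decoded text lines, e.g. the response
    body of a stream: true request read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```

Joining the yielded deltas in order reconstructs the full assistant message as it arrives.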