API Documentation

Welcome to the Chat Completions API. This endpoint provides an OpenAI-compatible interface for text and code generation. Requests are securely proxied without requiring API credentials from the client side.

Endpoint URL

POST https://ai.lumiltc.dev/api.php

Note: No authentication headers (like Authorization: Bearer ...) are required in your request. The proxy handles server-side authentication seamlessly.

Example Usage

You can send the request with any HTTP client. Here is a standard curl example configured for a streaming response:

curl -X POST "https://ai.lumiltc.dev/api.php" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "model": "z-ai/glm5",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Can you help me write some code?"
      }
    ],
    "temperature": 1,
    "top_p": 1,
    "max_tokens": 16384,
    "seed": 42,
    "stream": true,
    "chat_template_kwargs": {
      "enable_thinking": true,
      "clear_thinking": false
    }
  }'
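The same request body can be assembled programmatically. The sketch below (plain Python, standard library only) only builds and validates the JSON payload; the helper name and its defaults are illustrative, and sending the request with your preferred HTTP client is left out. The field names simply mirror the curl example above.

```python
import json

def build_chat_payload(user_message, model="z-ai/glm5", stream=True, **options):
    """Build the JSON body for the Chat Completions endpoint.

    `options` may carry any optional parameter from the table below
    (temperature, top_p, max_tokens, seed, chat_template_kwargs, ...).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }
    payload.update(options)
    return json.dumps(payload)

body = build_chat_payload(
    "Hello! Can you help me write some code?",
    temperature=1,
    max_tokens=16384,
)
```

The resulting string can be POSTed to the endpoint with a Content-Type: application/json header, exactly as in the curl example.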

Payload Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Required. The model to route the request to (e.g., z-ai/glm5). |
| messages | array | Required. Array of message objects describing the conversation. Each object must contain role (user, assistant, or system) and content. |
| temperature | number | Optional. Defaults to 1. Higher values make output more random; lower values make it more focused and deterministic. |
| top_p | number | Optional. Defaults to 1. Controls diversity via nucleus sampling. |
| max_tokens | integer | Optional. The maximum number of tokens the model may generate in its response. |
| stream | boolean | Optional. Defaults to false. When set to true, the service streams back partial response deltas using Server-Sent Events (SSE). |
| chat_template_kwargs | object | Optional. Special flags to configure model behavior (e.g., enable_thinking, clear_thinking). |
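Only model and messages are required, so a minimal non-streaming request body can be as small as:

```json
{
  "model": "z-ai/glm5",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```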

Response Format

If stream: false is requested, the endpoint returns a single standard JSON payload once generation completes.
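Since the endpoint is OpenAI-compatible, a non-streaming response can be assumed to follow the standard chat-completion shape, with the generated text under choices[0].message.content. The sample object below is illustrative, not a recorded response:

```python
# Illustrative non-streaming response in the OpenAI-compatible shape
# (field values here are made up for demonstration).
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "z-ai/glm5",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
}

def extract_reply(response):
    """Pull the assistant's text out of a non-streaming completion."""
    return response["choices"][0]["message"]["content"]

print(extract_reply(response))  # Hello! How can I help?
```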

If stream: true is requested, the endpoint streams continuous chunks over a long-lived connection (SSE), terminating the stream with data: [DONE] when processing is complete.
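When streaming, each SSE event is a data: line carrying a JSON chunk whose text fragment, in the usual OpenAI-compatible shape, sits under choices[0].delta.content. A minimal parser for those lines might look like the sketch below; the chunk contents are illustrative, and real chunks carry additional fields:

```python
import json

def iter_deltas(sse_lines):
    """Yield text fragments from 'data: ...' SSE lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # the first chunk may carry only the role
            yield delta["content"]

# Illustrative stream:
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(lines)))  # Hello!
```

In practice you would feed this generator the decoded lines of the HTTP response body as they arrive, concatenating the yielded fragments to reconstruct the full reply.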