Welcome to the Chat Completions API. This endpoint provides an OpenAI-compatible interface for text and code generation. Requests are securely proxied without requiring API credentials from the client side.
POST https://ai.lumiltc.dev/api.php
No API keys or headers (e.g., `Authorization: Bearer ...`) are required in your request. The proxy handles authentication server-side.
You can execute the request with any HTTP client. Here is a standard curl example configured for streaming:
curl -N -X POST "https://ai.lumiltc.dev/api.php" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "model": "z-ai/glm5",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Can you help me write some code?"
      }
    ],
    "temperature": 1,
    "top_p": 1,
    "max_tokens": 16384,
    "seed": 42,
    "stream": true,
    "chat_template_kwargs": {
      "enable_thinking": true,
      "clear_thinking": false
    }
  }'
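The same request can be issued from Python using only the standard library. The sketch below mirrors the curl example; the helper names (`build_payload`, `send_chat_request`) are our own, not part of the API, and no `Authorization` header is sent since the proxy supplies credentials server-side:

```python
import json
import urllib.request

API_URL = "https://ai.lumiltc.dev/api.php"

def build_payload(prompt, model="z-ai/glm5", stream=False, **overrides):
    """Assemble a request body matching the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1,
        "top_p": 1,
        "max_tokens": 16384,
        "stream": stream,
    }
    payload.update(overrides)  # e.g. seed=42, chat_template_kwargs={...}
    return payload

def send_chat_request(payload):
    """POST the payload; no API key is needed on the client side."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `stream` left at `False`, `send_chat_request(build_payload("Hello!"))` returns the full completion as one JSON object.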
| Parameter | Type | Description |
|---|---|---|
| `model` | string | **Required.** The model to route the request to (e.g., `z-ai/glm5`). |
| `messages` | array | **Required.** Array of message objects describing the conversation. Each object must contain a `role` (`user`, `assistant`, or `system`) and `content`. |
| `temperature` | number | Optional. Defaults to 1. Higher values make output more random; lower values make it more focused and deterministic. |
| `top_p` | number | Optional. Defaults to 1. Controls diversity via nucleus sampling. |
| `max_tokens` | integer | Optional. The maximum number of tokens allowed in the model's generated response. |
| `seed` | integer | Optional. If set, the backend attempts deterministic sampling so repeated requests can return the same result. |
| `stream` | boolean | Optional. Defaults to false. When set to true, the service streams back partial response deltas using Server-Sent Events (SSE). |
| `chat_template_kwargs` | object | Optional. Special flags to configure model behavior (e.g., `enable_thinking`, `clear_thinking`). |
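As the `messages` parameter describes, each turn in the conversation is a role/content pair, and multi-turn context is kept by resending the full history. A minimal sketch (the helper names are illustrative, not part of the API):

```python
def make_conversation(system_prompt=None):
    """Start a message list, optionally seeded with a system instruction."""
    if system_prompt:
        return [{"role": "system", "content": system_prompt}]
    return []

def add_turn(messages, role, content):
    """Append one turn; role must be one of user/assistant/system."""
    assert role in ("user", "assistant", "system"), f"unknown role: {role}"
    messages.append({"role": role, "content": content})
    return messages
```

After each response, append the assistant's reply with `add_turn(messages, "assistant", reply)` before sending the next user message, so the model sees the whole exchange.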
If `stream: false` is requested, the endpoint returns a single standard JSON payload once generation completes.
If `stream: true` is requested, the endpoint streams continuous chunks over a long-lived connection as Server-Sent Events (SSE), terminating the stream with `data: [DONE]` when processing is complete.
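A consumer of the streaming mode has to split the SSE body on `data:` lines and stop at the `[DONE]` sentinel. A minimal sketch, assuming the chunks follow the usual OpenAI-compatible delta shape (`choices[0].delta.content`):

```python
import json

def parse_sse_events(lines):
    """Yield content deltas from an SSE stream until the [DONE] sentinel.

    `lines` is any iterable of decoded text lines, e.g. the response
    body of a stream: true request read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```

Joining the yielded deltas in order reconstructs the full assistant message as it arrives.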