
Chat Completions

Create a model response for a chat conversation. This endpoint is compatible with the OpenAI Chat Completions API.

Endpoint

POST /api/v1/chat/completions

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID (e.g., `llama-3.1-8b`) |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
| `max_tokens` | number | No | Maximum number of tokens to generate |
| `stream` | boolean | No | Enable streaming responses. Default: false |
| `top_p` | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| `stop` | string or array | No | Stop sequences |

Message Object

| Field | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | One of `system`, `user`, or `assistant` |
| `content` | string | Yes | The message content |
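
A conversation is just an ordered list of these message objects; a minimal Python sketch of assembling one (the helper name is illustrative, not part of the API):

```python
def make_messages(system_prompt, user_prompt):
    """Build a minimal chat history: a system prompt followed by one user turn.
    Each entry carries the two required fields, "role" and "content"."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = make_messages("You are a helpful assistant.",
                         "What is machine learning?")
```

Subsequent assistant replies can be appended with `role: "assistant"` to continue the same conversation.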

Example Request

curl -X POST https://inferexchange.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-infer-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is machine learning?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
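
The same request in Python, using only the standard library (a sketch; substitute your own API key before sending):

```python
import json
import urllib.request

API_KEY = "sk-infer-YOUR_KEY"  # replace with your real key

# Request body matching the curl example above.
payload = {
    "model": "llama-3.1-8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
    ],
    "temperature": 0.7,
    "max_tokens": 500,
}

req = urllib.request.Request(
    "https://inferexchange.com/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending it requires a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries pointed at this base URL should also work.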

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708300000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
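
Pulling out the fields you usually need, a short sketch using the example body above:

```python
import json

# Example response body from the docs above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708300000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""

body = json.loads(raw)
answer = body["choices"][0]["message"]["content"]   # the assistant's reply
finish = body["choices"][0]["finish_reason"]        # "stop", or "length" if max_tokens was hit
total_tokens = body["usage"]["total_tokens"]        # billed token count
```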

Streaming

Set stream: true to receive responses as Server-Sent Events (SSE):

curl -X POST https://inferexchange.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-infer-YOUR_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Each SSE event contains a JSON chunk:

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"},"index":0}]}

data: [DONE]
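
A minimal sketch of consuming the stream: strip the `data: ` prefix from each event, stop at `[DONE]`, and concatenate the `delta.content` pieces.

```python
import json

def collect_stream(lines):
    """Accumulate assistant text from SSE 'data:' lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and SSE comments/keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# The example events from above:
events = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"},"index":0}]}',
    'data: [DONE]',
]
print(collect_stream(events))  # → Hello there
```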

Billing

Requests are billed per token at the rate of the model used; both prompt and completion tokens count toward the total. See Models for per-model pricing.
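
The cost of a request is simple arithmetic over the `usage` block; a sketch with made-up per-million-token prices (the real rates are on the Models page):

```python
def request_cost(prompt_tokens, completion_tokens, price_in, price_out):
    """Cost in dollars, with prices quoted per million tokens.
    The prices passed below are placeholders, not actual rates."""
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000

# Usage figures from the example response, with hypothetical prices:
cost = request_cost(25, 150, price_in=0.10, price_out=0.40)  # → 6.25e-05 dollars
```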