
Chat Completions

Create a model response for a chat conversation. This endpoint is compatible with the OpenAI Chat Completions API.

Endpoint

POST /api/v1/chat/completions

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID (e.g., `llama-3.1-8b`) |
| `messages` | array | Yes | Array of message objects |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
| `max_tokens` | number | No | Maximum number of tokens to generate |
| `stream` | boolean | No | Enable streaming responses. Default: false |
| `top_p` | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| `stop` | string or array | No | Stop sequences |

Message Object

| Field | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | One of `system`, `user`, or `assistant` |
| `content` | string | Yes | The message content |
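
A conversation is just an ordered list of these message objects; a minimal Python sketch of assembling one (the helper name is illustrative, not part of the API):

```python
def make_messages(system_prompt, user_prompt):
    """Build a minimal chat history: a system prompt followed by one user turn.
    Each entry carries the two required fields, "role" and "content"."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = make_messages("You are a helpful assistant.",
                         "What is machine learning?")
```

Subsequent assistant replies can be appended with `role: "assistant"` to continue the same conversation.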

Example Request

curl -X POST https://inferexchange.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-infer-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is machine learning?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
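
The same request in Python, using only the standard library (a sketch; substitute your own API key before sending):

```python
import json
import urllib.request

API_KEY = "sk-infer-YOUR_KEY"  # replace with your real key

# Request body matching the curl example above.
payload = {
    "model": "llama-3.1-8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
    ],
    "temperature": 0.7,
    "max_tokens": 500,
}

req = urllib.request.Request(
    "https://inferexchange.com/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending it requires a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries pointed at this base URL should also work.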

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708300000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
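
Pulling out the fields you usually need, a short sketch using the example body above:

```python
import json

# Example response body from the docs above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708300000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""

body = json.loads(raw)
answer = body["choices"][0]["message"]["content"]   # the assistant's reply
finish = body["choices"][0]["finish_reason"]        # "stop", or "length" if max_tokens was hit
total_tokens = body["usage"]["total_tokens"]        # billed token count
```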

Streaming

Set stream: true to receive responses as Server-Sent Events (SSE):

curl -X POST https://inferexchange.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-infer-YOUR_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Each SSE event contains a JSON chunk:

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"},"index":0}]}

data: [DONE]
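
A minimal sketch of consuming the stream: strip the `data: ` prefix from each event, stop at `[DONE]`, and concatenate the `delta.content` pieces.

```python
import json

def collect_stream(lines):
    """Accumulate assistant text from SSE 'data:' lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and SSE comments/keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# The example events from above:
events = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"},"index":0}]}',
    'data: [DONE]',
]
print(collect_stream(events))  # → Hello there
```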

Billing

Requests are billed per token at the rate of the model used; both prompt and completion tokens count toward the total. See Models for per-model pricing.
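
The cost of a request is simple arithmetic over the `usage` block; a sketch with made-up per-million-token prices (the real rates are on the Models page):

```python
def request_cost(prompt_tokens, completion_tokens, price_in, price_out):
    """Cost in dollars, with prices quoted per million tokens.
    The prices passed below are placeholders, not actual rates."""
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000

# Usage figures from the example response, with hypothetical prices:
cost = request_cost(25, 150, price_in=0.10, price_out=0.40)  # → 6.25e-05 dollars
```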