Chat Completions
Create a model response for a chat conversation. This endpoint is compatible with the OpenAI Chat Completions API.
Endpoint
```
POST /api/v1/chat/completions
```

Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., llama-3.1-8b) |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | number | No | Maximum number of tokens to generate |
| stream | boolean | No | Enable streaming responses. Default: false |
| top_p | number | No | Nucleus sampling parameter (0-1). Default: 1 |
| stop | string or array | No | Stop sequences |
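For example, a request body that exercises the optional sampling fields and passes stop as an array might look like this (the values are illustrative, not recommendations):

```json
{
  "model": "llama-3.1-8b",
  "messages": [
    {"role": "user", "content": "List three prime numbers."}
  ],
  "temperature": 0.2,
  "top_p": 0.9,
  "max_tokens": 100,
  "stop": ["\n\n", "END"]
}
```

Note that stop also accepts a single string when only one stop sequence is needed.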
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, or assistant |
| content | string | Yes | The message content |
Example Request
```bash
curl -X POST https://inferexchange.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-infer-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is machine learning?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```

Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708300000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```

Streaming
Set stream: true to receive responses as Server-Sent Events (SSE):
```bash
curl -X POST https://inferexchange.com/api/v1/chat/completions \
  -H "Authorization: Bearer sk-infer-YOUR_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

Each SSE event contains a JSON chunk:
```
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"},"index":0}]}

data: [DONE]
```

Billing
Requests are billed per token based on the model used. Both prompt and completion tokens are counted. See Models for per-model pricing.
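The request and streaming formats above can be exercised from Python's standard library alone. A minimal sketch follows; the function names (chat, parse_sse_line) are illustrative helpers, not part of this API:

```python
import json
import urllib.request

# Base URL taken from the examples above.
BASE_URL = "https://inferexchange.com/api/v1"


def chat(api_key, messages, model="llama-3.1-8b", **params):
    """Send a non-streaming chat completion and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages, **params}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


def parse_sse_line(line):
    """Extract the delta text from one line of a streaming (SSE) response.

    Returns None for blank lines, non-data lines, and the [DONE] sentinel.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

For a streaming request (stream set to true), you would iterate over the response line by line and feed each decoded line through parse_sse_line, concatenating the non-None results into the full reply.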