# Streaming Responses
Display tokens as they arrive using server-sent events for a real-time typing experience.
## Overview
By default, the Chat Completions API waits until the entire response is generated before returning it. With streaming enabled, the API sends tokens incrementally as server-sent events (SSE), so your application can start rendering immediately.
**Without streaming:** the user waits several seconds with no feedback, then sees the full response at once.

**With streaming:** the first tokens typically arrive within a few hundred milliseconds, and users read as the model writes.
## Enabling Streaming
Add "stream": true to your request body. Everything else stays the same:
```bash
curl -X POST "https://api.conduit.im/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Explain recursion in three sentences." }
    ],
    "stream": true
  }'
```

## SSE Event Format
The response is a stream of `text/event-stream` lines. Each event is prefixed with `data: ` and contains a JSON object. The stream ends with a special `data: [DONE]` sentinel:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Recursion"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]Key differences from non-streaming responses:
| Field | Non-streaming | Streaming |
|---|---|---|
| `object` | `chat.completion` | `chat.completion.chunk` |
| `choices[].message` | Full message object | Not present |
| `choices[].delta` | Not present | Incremental content token |
| `usage` | Included in response | Not included in chunks |
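In practice this means the client reassembles the message itself: concatenating every `delta.content` yields the same text a non-streaming call would return in `choices[0].message.content`. A minimal sketch, assuming `receivedChunks` is a hypothetical array of already-parsed chunk objects:

```js
// Rebuild the full assistant message from parsed streaming chunks.
// `receivedChunks` is a hypothetical array holding the parsed JSON
// of each `data:` event, in arrival order.
let content = "";
for (const chunk of receivedChunks) {
  content += chunk.choices[0]?.delta?.content ?? "";
}
// Equivalent to choices[0].message in a non-streaming response
const message = { role: "assistant", content };
```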
## Node.js Example
Use the Fetch API's `ReadableStream` to consume chunks as they arrive:
```js
const response = await fetch("https://api.conduit.im/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CONDUIT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4",
    messages: [{ role: "user", content: "Explain recursion." }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value, { stream: true });
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6);
    // [DONE] exits the inner loop; the final read() then reports done
    if (payload === "[DONE]") break;
    const parsed = JSON.parse(payload);
    const token = parsed.choices[0]?.delta?.content || "";
    process.stdout.write(token);
  }
}
console.log(); // trailing newline
```

## Python Example
Use `requests` with `stream=True` and iterate over lines:
```python
import os, json, requests

response = requests.post(
    "https://api.conduit.im/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['CONDUIT_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain recursion."}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    token = chunk["choices"][0]["delta"].get("content", "")
    print(token, end="", flush=True)

print()  # trailing newline
```

## Browser & React Example
**Important:** Never call the Conduit.im API directly from the browser; your API key would be exposed. Proxy requests through your own backend and stream the response to the client.
The pattern assumes you have a backend route at `/api/chat` that forwards the stream.
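The proxy itself is outside the scope of this section, but a minimal sketch of one possibility, assuming a Node 18+ Express server with `CONDUIT_API_KEY` set in the environment, might look like this:

```js
// Hypothetical Express proxy (sketch only): forwards the chat request
// upstream and relays the SSE bytes back to the browser.
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/chat", async (req, res) => {
  const upstream = await fetch("https://api.conduit.im/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.CONDUIT_API_KEY}`, // key never leaves the server
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ ...req.body, stream: true }),
  });

  res.status(upstream.status);
  res.setHeader("Content-Type", "text/event-stream");
  // Relay bytes as they arrive so the client sees tokens immediately
  for await (const chunk of upstream.body) {
    res.write(chunk);
  }
  res.end();
});

app.listen(3000);
```

The React component below then consumes `/api/chat` exactly like the direct API, token by token: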
```jsx
// React component that streams from your backend proxy
import { useState } from "react";

function Chat() {
  const [reply, setReply] = useState("");
  const [loading, setLoading] = useState(false);

  async function send(userMessage) {
    setReply("");
    setLoading(true);
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: userMessage }],
        stream: true,
      }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let accumulated = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      for (const line of chunk.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6);
        if (payload === "[DONE]") break;
        const parsed = JSON.parse(payload);
        const token = parsed.choices[0]?.delta?.content || "";
        accumulated += token;
        setReply(accumulated); // re-render with each token
      }
    }
    setLoading(false);
  }

  return (
    <div>
      <p>{reply}</p>
      <button onClick={() => send("Hello!")} disabled={loading}>
        {loading ? "Generating..." : "Send"}
      </button>
    </div>
  );
}
```

## Cancelling a Stream
Use an `AbortController` to let users stop generation mid-stream. When the signal fires, the connection closes and no further tokens are received:
```js
const controller = new AbortController();

// Pass the signal to fetch
const response = await fetch("https://api.conduit.im/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4",
    messages,
    stream: true,
  }),
  signal: controller.signal,
});

// Cancel at any time, for example on a button click
stopButton.addEventListener("click", () => controller.abort());
```

**Note:** Aborting a stream stops token delivery, but you are still billed for tokens generated up to the point of cancellation.
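One wrinkle worth knowing: if the abort fires while a `read()` is pending, the promise rejects with an `AbortError` rather than resolving with `done: true`. A minimal pattern for telling cancellation apart from a real failure:

```js
try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // ...process the chunk as in the examples above
  }
} catch (err) {
  if (err.name === "AbortError") {
    // User cancelled: keep whatever text streamed before the abort
  } else {
    throw err; // a genuine network or parse failure
  }
}
```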
## Handling Edge Cases
A production-quality streaming parser needs to handle a few subtleties:
**Partial chunks.** A single `read()` call may return a partial SSE line. Buffer data until you see a complete `\n` before parsing.

**Multiple events per chunk.** A single chunk can contain several `data:` lines. Always split on newlines and process each line separately.

**Empty deltas.** The first and last chunks often have an empty `delta.content`. Use a fallback like `delta?.content || ""` to avoid errors.

**Network errors mid-stream.** Wrap the read loop in a try/catch. If the connection drops, you can display the partial response along with a "generation interrupted" indicator.
## Robust SSE Parser
Here's a reusable async generator that buffers partial chunks and releases the reader even when an error interrupts the stream:
```js
async function* streamTokens(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      // Keep the last (potentially incomplete) line in the buffer
      buffer = lines.pop() || "";
      for (const line of lines) {
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6).trim();
        if (payload === "[DONE]") return;
        const parsed = JSON.parse(payload);
        const token = parsed.choices[0]?.delta?.content;
        if (token) yield token;
      }
    }
  } finally {
    reader.releaseLock();
  }
}

// Usage
const response = await fetch(url, options);
let fullText = "";
for await (const token of streamTokens(response)) {
  process.stdout.write(token);
  fullText += token;
}
```

## When to Use Streaming
| Use case | Streaming? | Why |
|---|---|---|
| Chat / conversational UI | Yes | Users expect a real-time typing effect |
| Long-form content generation | Yes | Avoids long timeouts on large outputs |
| Background batch processing | No | No user waiting; simpler code with buffered response |
| Structured data extraction | No | JSON needs to be complete before parsing |
## Next Steps
You now know how to stream responses from the Conduit.im API. Explore further: