
Streaming Responses

Display tokens as they arrive using server-sent events for a real-time typing experience.

Overview

By default the Chat Completions API waits until the entire response is generated before returning it. With streaming enabled, the API sends tokens incrementally as server-sent events (SSE), so your application can start rendering immediately.

Without streaming

User waits several seconds with no feedback, then sees the full response at once.

With streaming

First tokens appear within milliseconds. Users read as the model writes.

Enabling Streaming

Add "stream": true to your request body. Everything else stays the same:

curl -X POST "https://api.conduit.im/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Explain recursion in three sentences." }
    ],
    "stream": true
  }'

SSE Event Format

The response body has content type text/event-stream. Each event is a line prefixed with data: containing a JSON chunk object, and the stream ends with a special data: [DONE] sentinel:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Recursion"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1709912400,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Key differences from non-streaming responses:

Field               Non-streaming          Streaming
object              chat.completion        chat.completion.chunk
choices[].message   Full message object    Not present
choices[].delta     Not present            Incremental content token
usage               Included in response   Not included in chunks
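The delta fields are designed to concatenate back into the full message. A minimal Python sketch (chunk dicts shaped like the example events above, with only the relevant fields) that reassembles what the non-streaming choices[0].message.content would have contained:

```python
def accumulate_deltas(chunks):
    """Join each chunk's delta.content into the full assistant message."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # First and last chunks may have no content key, or an empty string
        parts.append(delta.get("content") or "")
    return "".join(parts)

chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "Recursion"}}]},
    {"choices": [{"delta": {"content": " is"}}]},
    {"choices": [{"delta": {}}]},
]
print(accumulate_deltas(chunks))  # Recursion is
```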

Node.js Example

Use the Fetch API's ReadableStream to consume chunks as they arrive:

const response = await fetch("https://api.conduit.im/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CONDUIT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4",
    messages: [{ role: "user", content: "Explain recursion." }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value, { stream: true });

  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;

    const payload = line.slice(6);
    if (payload === "[DONE]") break;

    const parsed = JSON.parse(payload);
    const token = parsed.choices[0]?.delta?.content || "";
    process.stdout.write(token);
  }
}
console.log(); // trailing newline

Python Example

Use requests with stream=True and iterate over lines:

import os, json, requests

response = requests.post(
    "https://api.conduit.im/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['CONDUIT_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain recursion."}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue

    payload = line[len("data: "):]
    if payload == "[DONE]":
        break

    chunk = json.loads(payload)
    token = chunk["choices"][0]["delta"].get("content", "")
    print(token, end="", flush=True)

print()  # trailing newline

Browser & React Example

Important: Never call the Conduit.im API directly from the browser — your API key would be exposed. Proxy requests through your own backend and stream the response to the client.

The pattern below assumes you have a backend route at /api/chat that forwards the stream:

// React component — streams from your backend proxy
import { useState } from "react";

function Chat() {
  const [reply, setReply] = useState("");
  const [loading, setLoading] = useState(false);

  async function send(userMessage) {
    setReply("");
    setLoading(true);

    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: userMessage }],
        stream: true,
      }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let accumulated = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value, { stream: true });
      for (const line of chunk.split("\n")) {
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6);
        if (payload === "[DONE]") break;

        const parsed = JSON.parse(payload);
        const token = parsed.choices[0]?.delta?.content || "";
        accumulated += token;
        setReply(accumulated);   // re-render with each token
      }
    }

    setLoading(false);
  }

  return (
    <div>
      <p>{reply}</p>
      <button onClick={() => send("Hello!")} disabled={loading}>
        {loading ? "Generating..." : "Send"}
      </button>
    </div>
  );
}
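The /api/chat proxy route itself is not shown above. For reference, here is one possible sketch using only the Python standard library; the port, the env var name, and the relay buffer size are assumptions, and in practice you would likely use whatever framework your backend already runs:

```python
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api.conduit.im/v1/chat/completions"

def build_upstream_request(body: bytes) -> urllib.request.Request:
    """Forward the client's JSON body upstream with the server-side key."""
    return urllib.request.Request(
        UPSTREAM,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('CONDUIT_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

class ChatProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with urllib.request.urlopen(build_upstream_request(body)) as upstream:
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.end_headers()
            # Relay bytes to the browser as they arrive from the API
            while chunk := upstream.read(1024):
                self.wfile.write(chunk)
                self.wfile.flush()

# To run: HTTPServer(("localhost", 8000), ChatProxy).serve_forever()
```

The key point is that the API key lives only in the server's environment; the browser talks exclusively to your own origin.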

Cancelling a Stream

Use an AbortController to let users stop generation mid-stream. When the signal fires, the connection closes and no further tokens are received:

const controller = new AbortController();

// Pass the signal to fetch
const response = await fetch("https://api.conduit.im/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4",
    messages,
    stream: true,
  }),
  signal: controller.signal,
});

// Cancel at any time — for example on a button click
stopButton.addEventListener("click", () => controller.abort());

Note: Aborting a stream stops token delivery, but you are still billed for tokens generated up to the point of cancellation.
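Python's requests library has no AbortController; the closest equivalent is closing the response mid-iteration, which tears down the connection. A sketch, where should_stop is a hypothetical callback wired to your UI:

```python
import json

def stream_until_cancelled(response, should_stop):
    """Consume a streaming response until should_stop() returns True,
    then close the connection -- requests' closest analogue to abort()."""
    tokens = []
    try:
        for line in response.iter_lines(decode_unicode=True):
            if should_stop():
                break  # stop reading; the finally block closes the socket
            if not line or not line.startswith("data: "):
                continue
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            delta = json.loads(payload)["choices"][0].get("delta", {})
            if delta.get("content"):
                tokens.append(delta["content"])
    finally:
        response.close()  # no further tokens arrive after this
    return tokens
```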

Handling Edge Cases

A production-quality streaming parser needs to handle a few subtleties:

Partial chunks

A single read() call may return a partial SSE line. Buffer incoming data and only parse lines once they are terminated by \n.

Multiple events per chunk

A single chunk can contain several data: lines. Always split on newlines and process each line separately.

Empty deltas

The first and last chunks often have an empty delta.content. Use a fallback like delta?.content || "" to avoid errors.

Network errors mid-stream

Wrap the read loop in a try/catch. If the connection drops, you can display the partial response along with a "generation interrupted" indicator.

Robust SSE Parser

Here's a reusable helper that handles partial chunks and error recovery:

async function* streamTokens(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");

      // Keep the last (potentially incomplete) line in the buffer
      buffer = lines.pop() || "";

      for (const line of lines) {
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6).trim();
        if (payload === "[DONE]") return;

        const parsed = JSON.parse(payload);
        const token = parsed.choices[0]?.delta?.content;
        if (token) yield token;
      }
    }
  } finally {
    reader.releaseLock();
  }
}

// Usage
const response = await fetch(url, options);
let fullText = "";
for await (const token of streamTokens(response)) {
  process.stdout.write(token);
  fullText += token;
}
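A Python counterpart using the same buffering strategy, written as a generator over raw byte chunks (note: decoding each chunk separately can still split a multibyte UTF-8 character; a fully robust version would use an incremental decoder):

```python
import json

def iter_stream_tokens(byte_chunks):
    """Yield delta.content tokens from an iterable of raw SSE byte chunks,
    buffering so that events split across reads are handled correctly."""
    buffer = ""
    for raw in byte_chunks:
        buffer += raw.decode("utf-8")
        lines = buffer.split("\n")
        buffer = lines.pop()  # keep the (possibly incomplete) last line
        for line in lines:
            if not line.startswith("data: "):
                continue
            payload = line[len("data: "):].strip()
            if payload == "[DONE]":
                return
            delta = json.loads(payload)["choices"][0].get("delta", {})
            token = delta.get("content")
            if token:
                yield token
```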

When to Use Streaming

Use case                       Streaming?   Why
Chat / conversational UI       Yes          Users expect a real-time typing effect
Long-form content generation   Yes          Avoids long timeouts on large outputs
Background batch processing    No           No user waiting; simpler code with buffered response
Structured data extraction     No           JSON needs to be complete before parsing

Next Steps