Best Practices
Patterns and recommendations for building reliable, cost-effective, and secure applications with the Conduit.im API.
API Key Security
Never expose your API key in client-side code. Keys embedded in JavaScript bundles, mobile apps, or public repositories can be extracted and misused.
Use environment variables
Store keys in environment variables or a secrets manager. Never hard-code them in source files.
// Good — read from environment
const API_KEY = process.env.CONDUIT_API_KEY;
// Bad — hard-coded secret
const API_KEY = "cnd_live_abc123...";

Proxy through your backend
Browser and mobile clients should call your own server, which then forwards the request to Conduit.im with the API key attached server-side.
Use separate keys per environment
Create distinct keys for development, staging, and production. If a dev key leaks, your production traffic is unaffected.
Rotate keys regularly
Rotate keys periodically and revoke any that may have been compromised. You can manage keys from the API Keys dashboard.
Error Handling
Always check the HTTP status code and handle errors gracefully:
class RetryableError extends Error {
  constructor(message, status) {
    super(message);
    this.status = status;
  }
}

async function callConduit(body) {
  const res = await fetch("https://api.conduit.im/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.CONDUIT_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    const { error } = await res.json();
    // Non-retryable client errors — surface to the user
    if (res.status < 500 && res.status !== 429) {
      const err = new Error(`[${error.code}] ${error.message}`);
      err.status = res.status;
      throw err;
    }
    // Retryable — implement back-off (see Rate Limiting guide)
    throw new RetryableError(error.message, res.status);
  }
  return await res.json();
}

- Retry 429 and 5xx errors with exponential back-off
- Never retry 401 or 402 — these require user action
- Log the requestId for debugging and support
Cost Optimisation
Set spending limits
Configure per-key spending limits (daily, weekly, or monthly) to cap costs and prevent runaway usage.
Use max_tokens
Always set max_tokens to the maximum you actually need. This prevents unexpectedly long (and expensive) responses.
{
  "model": "gpt-4",
  "messages": [...],
  "max_tokens": 500 // cap the response length
}

Choose the right model
More capable models cost more per token. Use a smaller model for simple tasks (classification, extraction) and reserve larger models for complex reasoning. Browse the Models page to compare pricing.
Trim conversation history
Every token in the messages array counts towards input costs. For long conversations, keep a sliding window of recent messages or summarise older turns.
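The sliding-window approach can be sketched as a small helper; the window size and message shape here are illustrative.

```javascript
// Keep every system message plus only the last N conversational turns.
function trimHistory(messages, maxTurns = 10) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns)];
}
```

For very long sessions, you can go further and replace the dropped turns with a single summarised message instead of discarding them outright.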
Monitor usage
Use the Usage API to track your balance and transaction history. Set up alerts when your balance drops below a threshold.
Performance
Use streaming for user-facing apps
Streaming delivers tokens as they are generated, dramatically reducing perceived latency. Users see the first word in milliseconds rather than waiting several seconds.
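Streamed chat responses are commonly delivered as server-sent events with OpenAI-style `data: {json}` framing and a `[DONE]` sentinel; this parser assumes that format, so check the streaming docs for the exact shape Conduit.im emits.

```javascript
// Parse one server-sent-event line and return the text delta, if any.
// Assumes OpenAI-style "data: {json}" framing with a "[DONE]" sentinel.
function parseStreamLine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null;
  return JSON.parse(payload).choices?.[0]?.delta?.content ?? null;
}
```

To consume the stream, request with `stream: true`, read `res.body` chunk by chunk, split on newlines, and feed each line through a parser like this, appending non-null deltas to the UI as they arrive.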
Set timeouts
Always set a request timeout so a slow upstream provider doesn't block your application indefinitely. Use an AbortController in JavaScript or the timeout parameter in Python's requests.
// JavaScript — 30-second timeout
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 30_000);
const res = await fetch(url, {
  ...options,
  signal: controller.signal,
});
clearTimeout(timer);

Keep prompts concise
Shorter prompts mean faster time-to-first-token and lower costs. Put essential context first and remove filler text.
Cache repeated requests
If multiple users ask the same question, cache the response on your server. This eliminates duplicate API calls and reduces both latency and cost.
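A minimal in-memory sketch of that idea; the cache key, TTL, and `fetcher` callback are all illustrative choices.

```javascript
// Memoise completions by request body; entries expire after a TTL.
const cache = new Map();

async function cachedCompletion(body, fetcher, ttlMs = 5 * 60_000) {
  const key = JSON.stringify(body);
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < ttlMs) return hit.value;
  const value = await fetcher(body);
  cache.set(key, { value, at: Date.now() });
  return value;
}
```

Only cache requests that are meant to be deterministic (for example, temperature 0 and identical prompts); caching creative, high-temperature outputs serves stale variety to every user.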
Prompt Engineering
Use system messages effectively
Set the tone, persona, and constraints in the system message. This is more reliable than putting instructions in the user message.
{
  "messages": [
    {
      "role": "system",
      "content": "You are a customer support agent for Acme Corp. Be concise and helpful. Only answer questions about Acme products. If unsure, say so."
    },
    {
      "role": "user",
      "content": "How do I reset my password?"
    }
  ]
}

Be specific about output format
If you need JSON, bullet points, or a particular structure, say so explicitly in the prompt. This reduces post-processing and improves reliability.
Use temperature wisely
Lower temperature (0.0–0.3) for factual or deterministic tasks. Higher temperature (0.7–1.0) for creative writing or brainstorming.
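One way to keep this consistent across a codebase is a small mapping from task type to temperature; the task names and exact values here are illustrative presets, not API requirements.

```javascript
// Pick a sampling temperature by task type (illustrative presets).
function temperatureFor(task) {
  const presets = {
    extraction: 0.0,     // deterministic, factual
    classification: 0.2, // near-deterministic
    chat: 0.7,           // conversational
    brainstorm: 1.0,     // maximum variety
  };
  return presets[task] ?? 0.7;
}

const body = {
  model: "gpt-4",
  messages: [{ role: "user", content: "List three taglines." }],
  temperature: temperatureFor("brainstorm"),
};
```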
Reliability
Implement retry with back-off
Transient failures happen. Use exponential back-off with jitter for 429 and 5xx responses.
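A sketch of exponential back-off with "full jitter": the delay ceiling doubles per attempt, and a random draw below that ceiling spreads concurrent clients apart. The attempt cap and base delay are illustrative.

```javascript
// Delay ceiling doubles each attempt, capped; actual delay is random below it.
function backoffDelay(attempt, baseMs = 500, capMs = 30_000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling; // "full jitter"
}

async function withRetries(fn, maxAttempts = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry 429 and 5xx (see the error-handling section)
      const retryable = err.status === 429 || err.status >= 500;
      if (!retryable || attempt >= maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```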
Have a fallback model
If your primary model is temporarily unavailable, fall back to an alternative. Since Conduit.im provides all models through the same interface, switching is a one-line change.
const MODELS = ["gpt-4", "claude-3-sonnet", "gemini-pro"];

async function callWithFallback(messages) {
  for (const model of MODELS) {
    try {
      return await callConduit({ model, messages });
    } catch (err) {
      if (err.status === 404) continue; // model unavailable, try next
      throw err; // non-model error, don't mask it
    }
  }
  throw new Error("All models unavailable");
}

Log request IDs
Every error response includes a requestId. Log it so you can reference it when contacting support.
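A small helper for that, hedged: the exact field name and position of `requestId` in the error payload is an assumption here, so verify it against the error reference before relying on it.

```javascript
// Log the requestId from a failed response so support can trace it.
// The payload shape (error.code, error.message, requestId) is assumed —
// check the error reference for the real structure.
function logFailure(status, payload, log = console.error) {
  const { error, requestId } = payload;
  log(`Conduit.im error ${status} [${error?.code}] ${error?.message} (requestId: ${requestId})`);
  return requestId;
}
```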
Quick Reference Checklist
- ✓ API key stored in environment variable, never in client code
- ✓ All API calls proxied through your own backend
- ✓ Separate API keys for dev, staging, and production
- ✓ Spending limits configured on every key
- ✓ max_tokens set on every request
- ✓ Retry logic with exponential back-off and jitter
- ✓ Request timeout configured (e.g., 30 seconds)
- ✓ Streaming enabled for user-facing interfaces
- ✓ Conversation history trimmed to control costs
- ✓ Error requestId values logged for debugging
Next Steps
Ready to put these practices into action? Explore further: