How to Build Real‑Time LLM Streaming in the Browser with Fetch
This article explains the mechanism of HTTP API streaming for large language models and shows step‑by‑step how front‑end developers can use the Fetch API, readable streams, and incremental UI updates to deliver real‑time, progressive results while handling errors and connection interruptions.
What Is HTTP API Streaming?
HTTP API streaming sends response data in chunks as soon as the large language model generates it, allowing the front‑end to display partial results without waiting for the complete response.
Basic Streaming Flow
Client Request: The front‑end sends a POST request with the prompt and parameters.
Server Processing and Chunked Response: The server begins generating text and streams each chunk to the client.
Client Receives and Processes Chunks: The client continuously reads each chunk from the stream.
Stream Close: After generation finishes, the server ends the response and the client's reader reports that the stream is done. The underlying connection may stay open for reuse.
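The four steps can be tried without a real model. The sketch below is a local simulation, not the behaviour of any particular API: `makeFakeResponse` and `readAll` are hypothetical helper names, and the "server" is only a `ReadableStream` wrapped in a standard `Response`. That is enough to exercise the same reading loop a browser would run against a streamed HTTP body.

```javascript
// Hypothetical stand-in for the server: enqueues each part as a separate
// chunk of bytes, then closes the stream (the last step of the flow).
const makeFakeResponse = (parts) => {
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    start(controller) {
      for (const part of parts) controller.enqueue(encoder.encode(part));
      controller.close();
    }
  });
  return new Response(body);
};

// Client side: read chunk by chunk until the reader reports done.
const readAll = async (response, onChunk) => {
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8');
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
};

// Usage: each simulated chunk is reported as soon as it is read.
// readAll(makeFakeResponse(['Hel', 'lo ', 'world']), (text) => console.log(text));
```

Swapping `makeFakeResponse(...)` for the result of a real `fetch` call leaves `readAll` unchanged, which is why the simulation is a reasonable way to test UI code in isolation.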
Implementing LLM HTTP API Streaming
Below is a typical front‑end implementation using fetch to initiate a streaming call.
const fetchStreamData = async (prompt) => {
  // gpt-4 is a chat model, so the request goes to the chat completions endpoint
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // placeholder; in production, keep the key on your own server
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [{ role: 'user', content: prompt }],
      stream: true // enable streaming
    })
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8');
  while (true) {
    const { value, done } = await reader.read();
    if (done) break; // value is undefined once the stream has ended
    const chunk = decoder.decode(value, { stream: true });
    console.log(chunk); // process each chunk (raw event text from the API)
  }
};
Request Settings
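Set stream: true in the JSON request body to ask the server to stream; it is an API parameter, not a fetch option.
The request body carries the model ID, the prompt, and any other generation parameters, while the API key travels in the Authorization header.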
Reading Stream Data
Call response.body.getReader() to obtain a reader that can read the response chunk by chunk.
Use TextDecoder to decode byte data into text.
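The `{ stream: true }` option matters because a network chunk can end in the middle of a multi-byte UTF-8 character. The sketch below shows the difference using a hand-made split rather than real network data; `decodeChunks` is a hypothetical helper introduced only for this illustration.

```javascript
// Decode a list of byte chunks with one TextDecoder.
// streaming = true keeps incomplete byte sequences buffered between calls.
const decodeChunks = (chunks, streaming) => {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: streaming });
  }
  if (streaming) text += decoder.decode(); // flush anything still buffered
  return text;
};

// 'h\u00e9llo' is "h", "e with acute accent", "llo"; the accented letter
// takes two bytes in UTF-8, so cutting after byte 2 splits it in half.
const bytes = new TextEncoder().encode('h\u00e9llo');
const chunks = [bytes.slice(0, 2), bytes.slice(2)];

const joined = decodeChunks(chunks, true);  // the split character is reassembled
const broken = decodeChunks(chunks, false); // the halves decode to U+FFFD replacement characters
```

For this reason a single decoder instance should be reused for the whole response, with `{ stream: true }` on every call and one final argument-less `decode()` once the stream ends.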
Processing Chunks
Repeatedly call reader.read() to get value (bytes) and done (stream end flag).
The decoded chunk can be displayed or processed immediately.
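With OpenAI-style endpoints the decoded chunk is not plain answer text. The body is framed as server-sent events: lines of the form `data: <json>`, separated by blank lines, ending with `data: [DONE]`. Chunk boundaries do not line up with event boundaries, so incomplete lines have to be buffered. The following is a minimal sketch under those assumptions. `createSseParser` is a hypothetical helper name, and the fields it reads (`choices[0].delta.content` for chat completions, `choices[0].text` for the older completions format) follow the OpenAI response shape and should be checked against the API actually in use.

```javascript
// Returns a parser that accepts decoded text in arbitrary pieces and calls
// onText with each fragment of generated text it finds.
const createSseParser = (onText) => {
  let buffer = '';
  let finished = false;
  return {
    feed(text) {
      buffer += text;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last piece may be an incomplete line
      for (const rawLine of lines) {
        const line = rawLine.replace(/\r$/, '');
        if (finished || !line.startsWith('data:')) continue;
        const payload = line.slice(5).trim();
        if (payload === '[DONE]') {
          finished = true; // end-of-stream marker (assumed OpenAI-style)
          continue;
        }
        const choice = JSON.parse(payload).choices?.[0] ?? {};
        const piece = choice.delta?.content ?? choice.text ?? '';
        if (piece) onText(piece);
      }
    },
    get finished() {
      return finished;
    }
  };
};

// Usage inside the read loop:
// const parser = createSseParser((piece) => console.log(piece));
// parser.feed(decoder.decode(value, { stream: true }));
```

Keeping the parser separate from the network loop means the same read loop can feed it whether the text ends up in the console, the DOM, or an accumulated string.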
How the Front‑End Handles Streaming Responses
When the back‑end returns a streamed response, the front‑end can update the UI incrementally, handle interruptions, concatenate chunks, and improve user interaction.
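Of the tasks just listed, interruption handling is the one that most benefits from a concrete pattern, because a user may want to stop a long generation. One possible approach, sketched here with the standard `AbortController`, is to return a cancel handle alongside a promise for the outcome. `startStream`, its return shape, and the `'complete'` / `'cancelled'` labels are assumptions made for this example, and the `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Starts a streamed request and returns { cancel, finished }.
// finished resolves to 'complete' or 'cancelled', and rejects on other errors.
const startStream = (url, options, onChunk, fetchImpl = fetch) => {
  const controller = new AbortController();
  const finished = (async () => {
    try {
      const response = await fetchImpl(url, { ...options, signal: controller.signal });
      if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
      const reader = response.body.getReader();
      const decoder = new TextDecoder('utf-8');
      while (true) {
        const { value, done } = await reader.read();
        if (done) return 'complete';
        onChunk(decoder.decode(value, { stream: true }));
      }
    } catch (error) {
      // Aborting rejects the pending fetch or read with an AbortError.
      if (error.name === 'AbortError') return 'cancelled';
      throw error; // network failures and HTTP errors reach the caller
    }
  })();
  return { cancel: () => controller.abort(), finished };
};

// Usage (stopButton is a hypothetical element):
// const stream = startStream(url, requestOptions, (text) => console.log(text));
// stopButton.onclick = () => stream.cancel();
```

Treating cancellation as a normal outcome rather than an error keeps the calling code simple: it can leave the partial text on screen after `'cancelled'` and reserve error messages for genuine failures.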
1. Incremental UI Updates
const chatBox = document.getElementById('chat-box');
const updateChat = (text) => {
  // append as plain text so model output is never parsed as HTML
  chatBox.append(text);
};
let done = false;
while (!done) {
  const { value, done: readerDone } = await reader.read();
  done = readerDone; // without this the loop never exits
  if (value) updateChat(decoder.decode(value, { stream: true }));
}
2. Handling Interruptions or Errors
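if (!response.ok) {
  console.error(`Request failed with status ${response.status}`);
  return;
}
// processStream handles one read result, then schedules the next read
const processStream = ({ value, done }) => {
  if (done) return;
  updateChat(decoder.decode(value, { stream: true }));
  return reader.read().then(processStream);
};
reader.read().then(processStream).catch(error => {
  // read() rejects if the connection drops mid-stream
  console.error('Error while reading stream:', error);
});
3. Concatenating Stream Data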
let fullResponse = '';
let done = false;
while (!done) {
  const { value, done: readerDone } = await reader.read();
  done = readerDone;
  if (value) fullResponse += decoder.decode(value, { stream: true }); // build complete response
}
fullResponse += decoder.decode(); // flush any bytes still buffered in the decoder
4. Auto‑Scroll and Interaction Optimisation
const scrollToBottom = () => {
  chatBox.scrollTop = chatBox.scrollHeight;
};
updateChat(chunk);
scrollToBottom(); // keep view at latest content
Advantages of Streaming Calls
Improved User Experience: Users see partial results instantly, reducing perceived latency.
Reduced Server Buffering: The server forwards output as it is generated instead of holding the entire response in memory before sending it.
Enhanced Interactivity: Real‑time feedback enables richer conversational or assistive applications.
Conclusion
HTTP API streaming provides an efficient, real‑time interaction model for large language models. By processing streamed chunks, updating the UI incrementally, handling errors, and concatenating data, front‑end developers can deliver smoother experiences in chatbots, assistants, and other interactive applications.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
