How to Eliminate Text Lag in iOS LLM Chat Apps with Smart Buffering and Typewriter Animation
This article explains how to eliminate stuttered text output in iOS chat applications powered by local LLMs running on the MNN inference framework. A three-layer optimization, smart stream buffering, UI update throttling with batch processing, and a typewriter-style animation, turns bursty model output into smooth, near-online responsiveness.
Background
When deploying a large language model (LLM) on iOS with the MNN inference framework, directly feeding model output to the UI causes noticeable stutter and a harsh text appearance, far from the smooth typing effect users expect from online services like ChatGPT.
Problem Analysis
The stutter originates from three core issues:
Model output speed vs. UI refresh rate mismatch: fast inference accumulates text faster than the UI consumes it, so updates land in large bursts.
Excessive UI refresh frequency: each character triggers its own UI update, overloading the main thread.
Lack of visual streaming animation: text appears instantly, without a gradual typing effect.
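To put numbers on the second issue: the helper below (illustrative, not part of the project) counts how many main-thread updates a response of a given length triggers, per character versus in batches of five chunks.

```swift
// Number of main-thread UI updates needed to render `characterCount`
// characters when `batchSize` chunks are coalesced into one update.
// batchSize = 1 models the naive per-character approach.
func uiUpdateCount(characterCount: Int, batchSize: Int) -> Int {
    precondition(batchSize > 0)
    // Integer ceiling division: each batch triggers exactly one update.
    return (characterCount + batchSize - 1) / batchSize
}

let naive = uiUpdateCount(characterCount: 600, batchSize: 1)    // 600 updates
let batched = uiUpdateCount(characterCount: 600, batchSize: 5)  // 120 updates
print("naive: \(naive), batched: \(batched)")
```

A 600-character reply drops from 600 main-thread refreshes to 120 with batches of five, before the stream buffer reduces the chunk count further.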
Three‑Layer Collaborative Optimization
The solution introduces a pipeline: raw output → smart buffer → batch UI update → animation rendering → UI. The three layers are:
1. OptimizedLlmStreamBuffer (C++)
Implements a custom std::streambuf that buffers characters and flushes when a size threshold (64 bytes) is reached or a punctuation trigger fires.
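The dual trigger is easy to state in isolation. The sketch below paraphrases it in Swift; the 64-byte threshold comes from the C++ class below, while the exact punctuation set is an assumption for illustration.

```swift
// Decide whether the stream buffer should flush after appending `incomingChunk`.
// Flush when the buffer would reach 64 bytes, or when the chunk contains
// clause/sentence-ending punctuation. The punctuation set is illustrative.
let bufferThreshold = 64
let flushPunctuation: Set<Character> = [".", ",", "!", "?", ";", ":", "\n",
                                        "。", "，", "！", "？"]  // incl. full-width marks

func shouldFlush(buffer: String, incomingChunk: String) -> Bool {
    if buffer.utf8.count + incomingChunk.utf8.count >= bufferThreshold {
        return true  // size trigger
    }
    // Punctuation trigger: flush at natural pauses so text arrives in
    // readable units rather than mid-word fragments.
    return incomingChunk.contains { flushPunctuation.contains($0) }
}
```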
class OptimizedLlmStreamBuffer : public std::streambuf {
private:
    static const size_t BUFFER_THRESHOLD = 64;  // flush once 64 bytes accumulate
    std::string buffer_;

public:
    using CallBack = std::function<void(const char* str, size_t len)>;
    explicit OptimizedLlmStreamBuffer(CallBack callback);

protected:
    // Called by the stream for every chunk the model writes.
    std::streamsize xsputn(const char* s, std::streamsize n) override;

private:
    void flushBuffer();  // hand the buffered text to the callback
    bool checkForFlushTriggers(const char* s, std::streamsize n);
    bool checkUnicodePunctuation();  // detect multi-byte Unicode punctuation marks
};

2. UIUpdateOptimizer (Swift)
Uses a Swift 5.5 actor to collect UI update requests, applying a dual‑trigger strategy: batch size of 5 updates or a 30 ms timeout.
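Stripped of the actor and Task machinery, the dual-trigger bookkeeping reduces to a small policy object. The sketch below is synchronous and its names are illustrative; the real optimizer wraps this logic in an actor and drives the timeout with a Task.

```swift
import Foundation

// A simplified, synchronous model of the dual-trigger batching policy:
// flush when 5 chunks have accumulated, or when 30 ms have passed since
// the last flush, whichever comes first.
struct BatchPolicy {
    let batchSize = 5
    let flushInterval: TimeInterval = 0.03

    private(set) var pending: [String] = []
    private var lastFlush: Date

    init(now: Date = Date()) { lastFlush = now }

    // Returns the concatenated batch when a trigger fires, nil otherwise.
    mutating func add(_ chunk: String, now: Date = Date()) -> String? {
        pending.append(chunk)
        let timedOut = now.timeIntervalSince(lastFlush) >= flushInterval
        guard pending.count >= batchSize || timedOut else { return nil }
        defer { pending.removeAll(); lastFlush = now }
        return pending.joined()
    }
}
```

Injecting `now` keeps the policy testable; in the app, the timeout side of the trigger is what guarantees text keeps appearing even when the model emits chunks slowly.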
actor UIUpdateOptimizer {
    static let shared = UIUpdateOptimizer()

    private var pendingUpdates: [String] = []
    private var lastFlushTime = Date()
    private var flushTask: Task<Void, Never>?

    private let batchSize = 5                       // flush after 5 pending updates…
    private let flushInterval: TimeInterval = 0.03  // …or after 30 ms, whichever comes first

    // Queues a chunk of streamed text; `completion` receives the batched text.
    func addUpdate(_ content: String, completion: @escaping (String) -> Void) { /* … */ }

    private func scheduleFlush(completion: @escaping (String) -> Void) { /* … */ }
    private func flushUpdates(completion: @escaping (String) -> Void) { /* … */ }
}

3. LLMMessageTextView (SwiftUI)
Provides a conditional typewriter animation for AI messages, activating only for assistant messages longer than five characters. It handles streaming text, markdown rendering, and automatic resource cleanup.
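The animation logic reduces to revealing a growing prefix of the full text on a timer. A pure-function sketch of the gating and pacing (function names are illustrative, the constants come from the view below):

```swift
import Foundation

// Reveal one character every 0.015 s.
let typingSpeed: TimeInterval = 0.015

// Gate: typewriter effect only for streaming assistant messages
// longer than five characters.
func shouldUseTypewriter(text: String, isAssistant: Bool, isStreaming: Bool) -> Bool {
    return isAssistant && isStreaming && text.count > 5
}

// The prefix of `fullText` visible `elapsed` seconds after the animation
// started: one character per 0.015 s tick is floor(elapsed / typingSpeed)
// characters, capped at the full length.
func visibleText(fullText: String, elapsed: TimeInterval) -> String {
    let count = min(fullText.count, Int(elapsed / typingSpeed))
    return String(fullText.prefix(count))
}
```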
struct LLMMessageTextView: View {
    let text: String?
    let isAssistantMessage: Bool
    let isStreamingMessage: Bool

    @State private var displayedText = ""
    @State private var animationTimer: Timer?

    private let typingSpeed: TimeInterval = 0.015  // seconds per character

    // Typewriter animation only for messages longer than five characters;
    // everything else renders statically.
    private var shouldUseTypewriter: Bool { (text?.count ?? 0) > 5 }

    var body: some View {
        if let text = text, isAssistantMessage && isStreamingMessage && shouldUseTypewriter {
            typewriterView(text)
        } else {
            staticView(text)
        }
    }

    // start/stop animation, append characters, etc.
}

Result
Combining the three layers eliminates the previous bottlenecks, delivering a fluid, near-online typing experience for local LLM chat apps on iOS. The GitHub repository for the full project is MNNLLMChat.