
Overview of iOS Live Streaming Workflow

This article provides a comprehensive overview of the iOS live‑streaming workflow, detailing the six stages—capture, processing, encoding, packaging, network transmission, and playback—along with sample code for video/audio capture, encoding settings, and RTMP transmission.


The purpose of this article is to explain the stages a live stream goes through on iOS, from capturing raw media to delivering it to viewers.

📌 Introduction

The live‑streaming process is divided into six phases:

1. Capture
2. Processing
3. Encoding
4. Packaging
5. Network Transmission
6. Playback

📷 Capture

Capture includes both video and audio. On iOS, AVFoundation is used for camera video, ReplayKit for screen recording, and Audio Unit for audio.

Video Capture: Camera

Core Classes: AVCapture*

Camera capture is built from the AVCapture* family of classes: AVCaptureSession coordinates the data flow, AVCaptureDevice represents the camera, AVCaptureDeviceInput feeds the device into the session, AVCaptureVideoDataOutput delivers sample buffers, and AVCaptureVideoPreviewLayer renders the on-screen preview.

Sample code:

// 1. Create a session
let session = AVCaptureSession()
// 2. Get the camera device
guard let device = AVCaptureDevice.default(for: .video) else {
    print("Failed to get the camera device")
    return
}
// 3. Create input (the AVCaptureDeviceInput initializer can throw)
guard let input = try? AVCaptureDeviceInput(device: device) else {
    print("Failed to create device input")
    return
}
if session.canAddInput(input) {
    session.addInput(input)
}
// 4. Create output
let videoOutput = AVCaptureVideoDataOutput()
let pixelBufferFormat = kCVPixelBufferPixelFormatTypeKey as String
// Set YUV video format (4:2:0 bi-planar, full range)
videoOutput.videoSettings = [pixelBufferFormat: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
videoOutput.setSampleBufferDelegate(self, queue: outputQueue)
if session.canAddOutput(videoOutput) {
    session.addOutput(videoOutput)
}
// 5. Set preview layer
let previewViewLayer = videoConfig.previewView.layer
previewViewLayer.backgroundColor = UIColor.black.cgColor
let layerFrame = previewViewLayer.bounds
let videoPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
videoPreviewLayer.frame = layerFrame
videoConfig.previewView.layer.insertSublayer(videoPreviewLayer, at: 0)
// 6. Start the session
session.startRunning()
// 7. Process video frames in the AVCaptureVideoDataOutputSampleBufferDelegate callback
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // TODO: handle video frame
}

Color Sub‑sampling: YUV

Media data is usually compressed using chroma sub-sampling. The code above sets the output format to kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, where 420 indicates 4:2:0 chroma subsampling and YpCbCr denotes the YUV color representation.

YpCbCr (YUV) – Y is luminance, Cb and Cr are chroma components.

Human eyes are more sensitive to luminance, allowing chroma to be heavily compressed.
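To make the savings concrete, here is a small sketch (plain Swift, no frameworks) that computes the bytes needed for one 1920×1080 frame at 8 bits per component, with and without 4:2:0 sub-sampling:

```swift
// 4:4:4 stores full-resolution Y, Cb and Cr planes;
// 4:2:0 stores full-resolution Y plus two quarter-resolution chroma planes.
func bytes444(width: Int, height: Int) -> Int {
    return width * height * 3
}

func bytes420(width: Int, height: Int) -> Int {
    let luma = width * height                // Y plane
    let chroma = (width / 2) * (height / 2)  // one quarter-size chroma plane
    return luma + 2 * chroma                 // Y + Cb + Cr
}

print(bytes444(width: 1920, height: 1080)) // 6220800
print(bytes420(width: 1920, height: 1080)) // 3110400 — half the raw size
```

Halving every frame before the encoder even runs is why virtually all capture pipelines request a 4:2:0 pixel format.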

Video Capture: Screen Recording

Screen recording can be done inside an app (captures only the app’s UI) or outside (captures the whole device screen, useful for game streaming).

1. In‑App Capture

// iOS screen recording uses ReplayKit
import ReplayKit
// Start recording
RPScreenRecorder.shared().startCapture { sampleBuffer, bufferType, err in
    // handle sample buffer
} completionHandler: { err in
    // handle error
}
// Stop recording
RPScreenRecorder.shared().stopCapture { err in
    // handle error
}

Tips for in‑app capture:

- UI that should not be recorded can be placed on a custom UIWindow.
- For a front-camera preview, set RPScreenRecorder.shared().isCameraEnabled = true and add RPScreenRecorder.shared().cameraPreviewView to the view hierarchy.
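A minimal sketch of the separate-window tip; the overlayRootViewController holding the streamer-only controls is a hypothetical name, not API:

```swift
import UIKit

// Hypothetical: host controls that should stay out of the recording
// (e.g. streamer-only stats) in a second window above the main one.
let overlayWindow = UIWindow(frame: UIScreen.main.bounds)
overlayWindow.rootViewController = overlayRootViewController // assumed to exist
overlayWindow.windowLevel = .alert     // keep it above the main window
overlayWindow.backgroundColor = .clear
overlayWindow.isHidden = false         // UIWindow becomes visible when unhidden
```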

2. Out‑of‑App Capture

Requires a Broadcast Upload Extension that provides a SampleHandler class to receive video data.

class SampleHandler: RPBroadcastSampleHandler {
    // Shared App Group storage for talking to the host app
    func sohuSportUserDefaults() -> UserDefaults? {
        return UserDefaults(suiteName: "com.xxx.xx")
    }
    override func broadcastStarted(withSetupInfo setupInfo: [String : NSObject]?) {
        // start capture
    }
    override func broadcastPaused() {
        // pause capture
    }
    override func broadcastResumed() {
        // resume capture
    }
    override func broadcastFinished() {
        // finish capture
    }
    // Process incoming sample buffers
    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
        switch sampleBufferType {
        case .video:
            break // handle video
        case .audioApp:
            break // handle app audio
        case .audioMic:
            break // handle mic audio
        @unknown default:
            break
        }
    }
}

Communication between the extension and the main app can use App Groups, local sockets, or Darwin notifications via CFNotificationCenter.
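As a sketch of the App Group plus Darwin-notification route (the group identifier group.com.example.stream and the notification name are assumptions, not values from the article):

```swift
import Foundation

// Extension side: persist state in the shared App Group container,
// then post a Darwin notification the host app can observe.
let shared = UserDefaults(suiteName: "group.com.example.stream") // assumed group id
shared?.set(true, forKey: "broadcastActive")

CFNotificationCenterPostNotification(
    CFNotificationCenterGetDarwinNotifyCenter(),
    CFNotificationName("com.example.stream.started" as CFString),
    nil,   // Darwin notifications carry no object
    nil,   // ...and no userInfo payload
    true)  // deliver immediately

// Host app side: register for the same notification name.
CFNotificationCenterAddObserver(
    CFNotificationCenterGetDarwinNotifyCenter(),
    nil,
    { _, _, _, _, _ in
        // Woken up: re-read the shared UserDefaults here.
    },
    "com.example.stream.started" as CFString,
    nil,
    .deliverImmediately)
```

Darwin notifications cannot carry a payload, which is why the actual data rides in the App Group store and the notification is only a wake-up signal.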

Audio Capture: Audio Unit

Audio Unit provides low‑level access to audio capture with configurable parameters for high‑quality, low‑latency recording.

// Create the audio unit (acd is an AudioComponentDescription configured elsewhere)
self.component = AudioComponentFindNext(NULL, &acd);
OSStatus status = AudioComponentInstanceNew(self.component, &_audio_unit);
if (status != noErr) {
    [self handleAudiounitCreateFail];
}
// Configure the stream format: 44.1 kHz, mono, 16-bit signed-integer PCM
AudioStreamBasicDescription desc = {0};
desc.mSampleRate = 44100;
desc.mFormatID = kAudioFormatLinearPCM;
desc.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
desc.mChannelsPerFrame = 1;
desc.mFramesPerPacket = 1;
desc.mBitsPerChannel = 16;
desc.mBytesPerFrame = desc.mBitsPerChannel / 8 * desc.mChannelsPerFrame;
desc.mBytesPerPacket = desc.mBytesPerFrame * desc.mFramesPerPacket;
// Set the input callback
AURenderCallbackStruct callback;
callback.inputProcRefCon = (__bridge void *)(self);
callback.inputProc = handleAudioInputBuffer;
AudioUnitSetProperty(self.audio_unit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &desc, sizeof(desc));
AudioUnitSetProperty(self.audio_unit, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 1, &callback, sizeof(callback));
// Configure the AVAudioSession for simultaneous recording and playback
AVAudioSession *session = [AVAudioSession sharedInstance];
[session setCategory:AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionInterruptSpokenAudioAndMixWithOthers error:nil];
[session setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];

🚧 Processing

Processing operates on the captured sample buffers to apply effects such as skin whitening, smoothing, and filters, typically with a GPU framework such as GPUImage (built on OpenGL ES or Metal), which ships more than 100 built-in filters.
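GPUImage is one option; as a dependency-free sketch, Core Image can apply a filter to the CVPixelBuffer inside each sample buffer (the sepia filter here is only a stand-in for a real beauty filter):

```swift
import AVFoundation
import CoreImage

let ciContext = CIContext() // reuse one context; creating one per frame is expensive

func filtered(_ sampleBuffer: CMSampleBuffer) -> CIImage? {
    // Pull the raw pixel buffer out of the captured sample buffer
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return nil
    }
    let input = CIImage(cvPixelBuffer: pixelBuffer)
    // Stand-in effect; a real pipeline would chain smoothing/whitening filters
    let filter = CIFilter(name: "CISepiaTone")!
    filter.setValue(input, forKey: kCIInputImageKey)
    filter.setValue(0.5, forKey: kCIInputIntensityKey)
    return filter.outputImage
}
```

The resulting CIImage can be rendered back into a pixel buffer via the shared ciContext before being handed to the encoder.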

🛠 Encoding

After processing, audio and video are encoded. Video encoding discards redundant information (spatial, temporal, visual, knowledge, structural) using lossy compression. Common codecs are H.264 and H.265.

Video Encoding Example

// Create encoder
OSStatus status = VTCompressionSessionCreate(NULL, _configuration.videoSize.width, _configuration.videoSize.height, kCMVideoCodecType_H264, NULL, NULL, NULL, VideoCompressionOutputCallback, (__bridge void *)self, &compressionSession);
// Set encoder properties
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(_videoMaxKeyframeInterval));
// Prepare to encode
VTCompressionSessionPrepareToEncodeFrames(compressionSession);
// Encode frame
OSStatus status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, (__bridge CFDictionaryRef)properties, (__bridge_retained void *)timeNumber, &flags);

Audio Encoding Example

#import <AudioToolbox/AudioToolbox.h>
// Create encoder
OSStatus result = AudioConverterNewSpecific(&inputFormat, &outputFormat, 2, requestedCodecs, &m_converter);
// Encode
AudioConverterFillComplexBuffer(m_converter, inputDataProc, &buffers, &outputDataPacketSize, &outBufferList, NULL);

📦 Packaging

Encoded streams are placed into container formats such as MP4, FLV, or TS. Live streaming commonly uses FLV or TS because they support streaming protocols.
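Part of why FLV suits streaming is that each tag is self-describing: an 11-byte header carrying a 1-byte type, a 3-byte payload size, a 3+1-byte timestamp, and a 3-byte stream ID, so a player can join mid-stream at any tag boundary. A plain-Swift sketch of serializing that header:

```swift
// FLV tag header (11 bytes), per the FLV container format.
struct FLVTagHeader {
    var type: UInt8         // 8 = audio, 9 = video, 18 = script data
    var dataSize: UInt32    // payload size, stored as 24-bit big-endian
    var timestampMs: UInt32 // lower 24 bits + 1 extension byte (bits 31-24)

    func serialized() -> [UInt8] {
        var bytes: [UInt8] = [type]
        // 24-bit big-endian data size
        bytes.append(UInt8((dataSize >> 16) & 0xFF))
        bytes.append(UInt8((dataSize >> 8) & 0xFF))
        bytes.append(UInt8(dataSize & 0xFF))
        // 24-bit timestamp followed by the extension byte
        bytes.append(UInt8((timestampMs >> 16) & 0xFF))
        bytes.append(UInt8((timestampMs >> 8) & 0xFF))
        bytes.append(UInt8(timestampMs & 0xFF))
        bytes.append(UInt8((timestampMs >> 24) & 0xFF))
        // 3-byte stream ID, always zero
        bytes.append(contentsOf: [0, 0, 0])
        return bytes
    }
}

let header = FLVTagHeader(type: 9, dataSize: 1024, timestampMs: 40)
print(header.serialized().count) // 11
```

MP4, by contrast, keeps its index (the moov atom) for the whole file, which is why it is a poor fit for a stream with no known end.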

🕸 Network Transmission

RTMP (based on TCP) is typically used. Media data is wrapped into RTMP messages, each consisting of a header (type, length, timestamp) and a body. Messages are split into chunks for transmission, 128 bytes each by default (the chunk size can be renegotiated).
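A plain-Swift sketch of the chunking step, assuming the default 128-byte chunk size and ignoring the chunk headers RTMP prepends to each piece:

```swift
// Split an RTMP message body into fixed-size chunks (default 128 bytes).
func chunked(_ payload: [UInt8], chunkSize: Int = 128) -> [[UInt8]] {
    var chunks: [[UInt8]] = []
    var index = 0
    while index < payload.count {
        let end = min(index + chunkSize, payload.count)
        chunks.append(Array(payload[index..<end]))
        index = end
    }
    return chunks
}

let message = [UInt8](repeating: 0xAB, count: 300)
let pieces = chunked(message)
print(pieces.map { $0.count }) // [128, 128, 44]
```

Chunking lets a large video message be interleaved with small, latency-sensitive control and audio messages on the same TCP connection.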

🖥 Playback

Clients pull the stream, reassemble chunks into messages, demux the container, decode audio and video, synchronize them, and render video while playing audio.

Tags: Mobile Development, iOS, Live Streaming, encoding, RTMP, video capture, AVFoundation
Written by Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
