Overview of iOS Live Streaming Workflow
This article provides a comprehensive overview of the iOS live‑streaming workflow, detailing the six stages—capture, processing, encoding, packaging, network transmission, and playback—along with sample code for video/audio capture, encoding settings, and RTMP transmission.
iOS Live Streaming Workflow Overview
The purpose of this article is to explain the stages a live stream goes through on iOS, from capturing raw media to delivering it to viewers.
📌 Introduction
The live‑streaming process is divided into six phases: Capture, Processing, Encoding, Packaging, Network Transmission, and Playback.
Capture
Processing
Encoding
Packaging
Network Transmission
Playback
📷 Capture
Capture includes both video and audio. On iOS, AVFoundation is used for camera video, ReplayKit for screen recording, and Audio Unit for audio.
Video Capture: Camera
Core Classes AVCaptureXXX
The main classes for camera capture are shown below.
Sample code:
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: #5c6370; font-style: italic; line-height: 26px">// 1. Create a session</span>
var session = AVCaptureSession.init()
<span style="color: #5c6370; font-style: italic; line-height: 26px">// 2. Get the camera device</span>
guard let device = AVCaptureDevice.default(for: .video) else {
print("Failed to get back camera")
return
}
<span style="color: #5c6370; font-style: italic; line-height: 26px">// 3. Create input</span>
let input = try AVCaptureDeviceInput(device: device)
if session.canAddInput(input) {
session.addInput(input)
}
<span style="color: #5c6370; font-style: italic; line-height: 26px">// 4. Create output</span>
let videoOutput = AVCaptureVideoDataOutput.init()
let pixelBufferFormat = kCVPixelBufferPixelFormatTypeKey as String
// Set YUV video format
videoOutput.videoSettings = [pixelBufferFormat: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
videoOutput.setSampleBufferDelegate(self, queue: outputQueue)
if session.canAddOutput(videoOutput) {
session.addOutput(videoOutput)
}
<span style="color: #5c6370; font-style: italic; line-height: 26px">// 5. Set preview layer</span>
let previewViewLayer = videoConfig.previewView.layer
previewViewLayer.backgroundColor = UIColor.black.cgColor
let layerFrame = previewViewLayer.bounds
let videoPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
videoPreviewLayer.frame = layerFrame
videoConfig.previewView.layer.insertSublayer(videoPreviewLayer, at: 0)
<span style="color: #5c6370; font-style: italic; line-height: 26px">// 6. Process video frames in delegate</span>
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// TODO: handle video frame
}
</code>Color Sub‑sampling: YUV
Media data is usually compressed using color sub‑sampling. The code sets the output format to kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, where 420 indicates 4:2:0 chroma subsampling and YpCbCr represents the YUV format. YpCbCr (YUV) – Y is luminance, Cb and Cr are chroma components.
Human eyes are more sensitive to luminance, allowing chroma to be heavily compressed.
Video Capture: Screen Recording
Screen recording can be done inside an app (captures only the app’s UI) or outside (captures the whole device screen, useful for game streaming).
1. In‑App Capture
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: #5c6370; font-style: italic; line-height: 26px">// iOS screen recording uses ReplayKit</span>
import ReplayKit
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Start recording</span>
RPScreenRecorder.shared().startCapture { sampleBuffer, bufferType, err in
// handle sample buffer
} completionHandler: { err in
// handle error
}
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Stop recording</span>
RPScreenRecorder.shared().stopCapture { err in
// handle error
}
</code>Tips for in‑app capture:
UI that should not be recorded can be placed on a custom UIWindow.
Enable the front‑camera preview via RPScreenRecorder.shared().cameraPreviewView and add it to the view hierarchy.
2. Out‑of‑App Capture
Requires a Broadcast Upload Extension that provides a SampleHandler class to receive video data.
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menhi, monospace; font-size: 12px">class SampleHandler: RPBroadcastSampleHandler {
func sohuSportUserDefaults() -> UserDefaults? {
return UserDefaults(suiteName: "com.xxx.xx")
}
override func broadcastStarted(withSetupInfo setupInfo: [String : NSObject]?) {
// start capture
}
override func broadcastPaused() {
// pause capture
}
override func broadcastResumed() {
// resume capture
}
override func broadcastFinished() {
// finish capture
}
// Process incoming sample buffers
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
switch sampleBufferType {
case .video:
// handle video
case .audioApp:
// handle app audio
case .audioMic:
// handle mic audio
}
}
}
</code>Communication between the extension and the main app can use App Groups, sockets, or CFNotification.
Audio Capture: Audio Unit
Audio Unit provides low‑level access to audio capture with configurable parameters for high‑quality, low‑latency recording.
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: #5c6370; font-style: italic; line-height: 26px">// Create audio unit</span>
self.component = AudioComponentFindNext(nil, &acd)
OSStatus status = AudioComponentInstanceNew(self.component, &_audio_unit)
if (status != noErr) {
[self handleAudiounitCreateFail]
}
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Configure stream format</span>
AudioStreamBasicDescription desc = {0}
desc.mSampleRate = 44100
desc.mFormatID = kAudioFormatLinearPCM
desc.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked
desc.mChannelsPerFrame = 1
desc.mFramesPerPacket = 1
desc.mBitsPerChannel = 16
desc.mBytesPerFrame = desc.mBitsPerChannel / 8 * desc.mChannelsPerFrame
desc.mBytesPerPacket = desc.mBytesPerFrame * desc.mFramesPerPacket
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Set callback</span>
AURenderCallbackStruct callback
callback.inputProcRefCon = (__bridge void *)(self)
callback.inputProc = handleVideoInputBuffer
AudioUnitSetProperty(self.audio_unit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &desc, sizeof(desc))
AudioUnitSetProperty(self.audio_unit, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 1, &callback, sizeof(callback))
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Configure AVAudioSession</span>
AVAudioSession *session = [AVAudioSession sharedInstance]
[session setCategory:AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionInterruptSpokenAudioAndMixWithOthers error:nil]
[session setActive:YES withOptions:kAudioSessionSetActiveFlag_NotifyOthersOnDeactivation error:nil]
</code>🚧 Processing
Processing works on SampleBuffer to apply whitening, smoothing, filters, etc., typically using GPUImage (OpenGL or Metal) which offers over 100 filters.
🛠 Encoding
After processing, audio and video are encoded. Video encoding discards redundant information (spatial, temporal, visual, knowledge, structural) using lossy compression. Common codecs are H.264 and H.265.
Video Encoding Example
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: #5c6370; font-style: italic; line-height: 26px">// Create encoder</span>
OSStatus status = VTCompressionSessionCreate(NULL, _configuration.videoSize.width, _configuration.videoSize.height, kCMVideoCodecType_H264, NULL, NULL, NULL, VideoCompressonOutputCallback, (__bridge void *)self, &compressionSession);
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Set encoder properties</span>
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(_videoMaxKeyframeInterval));
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Prepare to encode</span>
VTCompressionSessionPrepareToEncodeFrames(compressionSession);
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Encode frame</span>
OSStatus status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, (__bridge CFDictionaryRef)properties, (__bridge_retained void *)timeNumber, &flags);
</code>Audio Encoding Example
<code style="padding: 16px; color: #abb2bf; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">#import <AudioToolbox/AudioToolbox.h>
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Create encoder</span>
OSStatus result = AudioConverterNewSpecific(&inputFormat, &outputFormat, 2, requestedCodecs, &m_converter);
<span style="color: #5c6370; font-style: italic; line-height: 26px">// Encode</span>
AudioConverterFillComplexBuffer(m_converter, inputDataProc, &buffers, &outputDataPacketSize, &outBufferList, NULL);
</code>📦 Packaging
Encoded streams are placed into container formats such as MP4, FLV, or TS. Live streaming commonly uses FLV or TS because they support streaming protocols.
🕸 Network Transmission
RTMP (based on TCP) is typically used. Media data is wrapped into RTMP messages, each consisting of a header (type, length, timestamp) and a body. Messages are split into 128‑byte chunks for transmission.
🖥 Playback
Clients pull the stream, reassemble chunks into messages, demux the container, decode audio and video, synchronize them, and render video while playing audio.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
