Overview of iOS Live Streaming Workflow
This article provides a comprehensive overview of the iOS live‑streaming workflow, detailing the six stages—capture, processing, encoding, packaging, network transmission, and playback—along with sample code for video/audio capture, encoding settings, and RTMP transmission.
The purpose of this article is to explain the stages a live stream goes through on iOS, from capturing raw media to delivering it to viewers.
📌 Introduction
The live‑streaming process is divided into six phases:
Capture
Processing
Encoding
Packaging
Network Transmission
Playback
📷 Capture
Capture includes both video and audio. On iOS, AVFoundation is used for camera video, ReplayKit for screen recording, and Audio Unit for audio.
Video Capture: Camera
Core Classes: the AVCapture Family
Camera capture is built from a handful of AVFoundation classes: AVCaptureSession coordinates the pipeline, AVCaptureDevice represents the camera, AVCaptureDeviceInput feeds the device into the session, AVCaptureVideoDataOutput delivers sample buffers, and AVCaptureVideoPreviewLayer renders the preview.
Sample code:
// 1. Create a session
let session = AVCaptureSession()

// 2. Get the camera device
guard let device = AVCaptureDevice.default(for: .video) else {
    print("Failed to get camera")
    return
}

// 3. Create input (AVCaptureDeviceInput.init is throwing, so handle failure)
guard let input = try? AVCaptureDeviceInput(device: device) else {
    print("Failed to create camera input")
    return
}
if session.canAddInput(input) {
    session.addInput(input)
}

// 4. Create output
let videoOutput = AVCaptureVideoDataOutput()
let pixelBufferFormat = kCVPixelBufferPixelFormatTypeKey as String
// Set the YUV pixel format (4:2:0 bi-planar, full range)
videoOutput.videoSettings = [pixelBufferFormat: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
videoOutput.setSampleBufferDelegate(self, queue: outputQueue)
if session.canAddOutput(videoOutput) {
    session.addOutput(videoOutput)
}

// 5. Set up the preview layer
let previewViewLayer = videoConfig.previewView.layer
previewViewLayer.backgroundColor = UIColor.black.cgColor
let videoPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
videoPreviewLayer.frame = previewViewLayer.bounds
previewViewLayer.insertSublayer(videoPreviewLayer, at: 0)

// 6. Process video frames in the delegate callback
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // TODO: handle video frame
}

Color Sub‑sampling: YUV
Media data is usually compressed using color sub‑sampling. The code sets the output format to kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, where 420 indicates 4:2:0 chroma subsampling and YpCbCr represents the YUV format.
YpCbCr (YUV) – Y is luminance, Cb and Cr are chroma components.
Human eyes are more sensitive to luminance, allowing chroma to be heavily compressed.
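To make the savings concrete, here is a small C sketch (mine, not from the article) computing raw frame sizes for 8‑bit 4:4:4 versus the 4:2:0 layout selected above:

```c
#include <stddef.h>

// Bytes for one raw 8-bit 4:4:4 frame: full-resolution Y, Cb, and Cr planes.
size_t yuv444_bytes(size_t w, size_t h) {
    return w * h * 3;
}

// Bytes for one raw 8-bit 4:2:0 frame: full-resolution Y plus one Cb and one
// Cr sample per 2x2 pixel block (chroma planes at half width and half height).
size_t yuv420_bytes(size_t w, size_t h) {
    return w * h + 2 * (w / 2) * (h / 2);
}
```

For a 1920x1080 frame this works out to 6,220,800 bytes in 4:4:4 but 3,110,400 bytes in 4:2:0, i.e. half the raw size before the video encoder even runs.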
Video Capture: Screen Recording
Screen recording can be done inside an app (captures only the app’s UI) or outside (captures the whole device screen, useful for game streaming).
1. In‑App Capture
// iOS screen recording uses ReplayKit
import ReplayKit

// Start recording
RPScreenRecorder.shared().startCapture { sampleBuffer, bufferType, err in
    // handle sample buffer
} completionHandler: { err in
    // handle error
}

// Stop recording
RPScreenRecorder.shared().stopCapture { err in
    // handle error
}

Tips for in‑app capture:
UI that should not be recorded can be placed on a custom UIWindow.
Enable the front‑camera preview via RPScreenRecorder.shared().cameraPreviewView and add it to the view hierarchy.
2. Out‑of‑App Capture
Requires a Broadcast Upload Extension that provides a SampleHandler class to receive video data.
class SampleHandler: RPBroadcastSampleHandler {
    func sohuSportUserDefaults() -> UserDefaults? {
        return UserDefaults(suiteName: "com.xxx.xx")
    }

    override func broadcastStarted(withSetupInfo setupInfo: [String : NSObject]?) {
        // start capture
    }

    override func broadcastPaused() {
        // pause capture
    }

    override func broadcastResumed() {
        // resume capture
    }

    override func broadcastFinished() {
        // finish capture
    }

    // Process incoming sample buffers
    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
        switch sampleBufferType {
        case .video:
            break // handle video
        case .audioApp:
            break // handle app audio
        case .audioMic:
            break // handle mic audio
        @unknown default:
            break
        }
    }
}

Communication between the extension and the main app can use App Groups, sockets, or CFNotification.
Audio Capture: Audio Unit
Audio Unit provides low‑level access to audio capture with configurable parameters for high‑quality, low‑latency recording.
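The size fields in the AudioStreamBasicDescription configured below are plain arithmetic for packed linear PCM; here is a hedged C sketch of that arithmetic (the function names are illustrative, not part of any Apple API):

```c
#include <stdint.h>

// For packed linear PCM: bytes per frame = bits per channel / 8 * channels,
// matching the mBytesPerFrame computation in the AudioStreamBasicDescription.
uint32_t pcm_bytes_per_frame(uint32_t bits_per_channel, uint32_t channels) {
    return bits_per_channel / 8 * channels;
}

// Raw capture data rate in bytes per second: sample rate * bytes per frame.
uint32_t pcm_bytes_per_second(uint32_t sample_rate, uint32_t bits_per_channel,
                              uint32_t channels) {
    return sample_rate * pcm_bytes_per_frame(bits_per_channel, channels);
}
```

With the 44.1 kHz, mono, 16‑bit configuration used below, that is 2 bytes per frame and 88,200 bytes of raw PCM per second.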
// Describe and create the Remote I/O audio unit
AudioComponentDescription acd = {0};
acd.componentType = kAudioUnitType_Output;
acd.componentSubType = kAudioUnitSubType_RemoteIO;
acd.componentManufacturer = kAudioUnitManufacturer_Apple;
self.component = AudioComponentFindNext(NULL, &acd);
OSStatus status = AudioComponentInstanceNew(self.component, &_audio_unit);
if (status != noErr) {
    [self handleAudiounitCreateFail];
}

// Configure the stream format: 44.1 kHz, mono, 16-bit signed integer PCM
AudioStreamBasicDescription desc = {0};
desc.mSampleRate = 44100;
desc.mFormatID = kAudioFormatLinearPCM;
desc.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
desc.mChannelsPerFrame = 1;
desc.mFramesPerPacket = 1;
desc.mBitsPerChannel = 16;
desc.mBytesPerFrame = desc.mBitsPerChannel / 8 * desc.mChannelsPerFrame;
desc.mBytesPerPacket = desc.mBytesPerFrame * desc.mFramesPerPacket;

// Set the input callback
AURenderCallbackStruct callback;
callback.inputProcRefCon = (__bridge void *)(self);
callback.inputProc = handleAudioInputBuffer;
AudioUnitSetProperty(self.audio_unit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &desc, sizeof(desc));
AudioUnitSetProperty(self.audio_unit, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 1, &callback, sizeof(callback));

// Configure AVAudioSession
AVAudioSession *session = [AVAudioSession sharedInstance];
[session setCategory:AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionInterruptSpokenAudioAndMixWithOthers error:nil];
[session setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];

🚧 Processing
Processing operates on the captured sample buffers to apply effects such as skin whitening, smoothing, and filters, typically via GPUImage (built on OpenGL ES or Metal), which ships with more than 100 filters.
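GPUImage runs such filters on the GPU; as a CPU-side sketch (mine, not GPUImage code) of what the simplest whitening/brightness filter does to the luma plane of a YUV frame:

```c
#include <stdint.h>
#include <stddef.h>

// Lift every luma (Y) sample by `amount`, clamping at 255. Real beauty
// filters are far more involved, but this is the basic per-pixel idea.
void brighten_y_plane(uint8_t *y, size_t count, uint8_t amount) {
    for (size_t i = 0; i < count; i++) {
        int v = y[i] + amount;
        y[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```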
🛠 Encoding
After processing, audio and video are encoded. Video encoding discards redundant information (spatial, temporal, visual, knowledge, structural) using lossy compression. Common codecs are H.264 and H.265.
Video Encoding Example
// Create encoder
OSStatus status = VTCompressionSessionCreate(NULL, _configuration.videoSize.width, _configuration.videoSize.height, kCMVideoCodecType_H264, NULL, NULL, NULL, VideoCompressonOutputCallback, (__bridge void *)self, &compressionSession);
// Set encoder properties
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(_videoMaxKeyframeInterval));
// Prepare to encode
VTCompressionSessionPrepareToEncodeFrames(compressionSession);
// Encode frame
OSStatus status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, (__bridge CFDictionaryRef)properties, (__bridge_retained void *)timeNumber, &flags);

Audio Encoding Example
#import <AudioToolbox/AudioToolbox.h>
// Create encoder
OSStatus result = AudioConverterNewSpecific(&inputFormat, &outputFormat, 2, requestedCodecs, &m_converter);
// Encode
AudioConverterFillComplexBuffer(m_converter, inputDataProc, &buffers, &outputDataPacketSize, &outBufferList, NULL);

📦 Packaging
Encoded streams are placed into container formats such as MP4, FLV, or TS. Live streaming commonly uses FLV or TS because both can be written and consumed incrementally, which streaming delivery requires.
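As an illustration of what "packaging" means at the byte level, here is a sketch (the helper is mine, following the FLV specification's layout) of serializing the standard 11-byte FLV tag header:

```c
#include <stdint.h>

// Serialize the 11-byte FLV tag header: type (1 byte), data size (3 bytes,
// big-endian), timestamp (3 low bytes + 1 extended high byte), stream ID
// (3 bytes, always 0).
void flv_tag_header(uint8_t out[11], uint8_t type, uint32_t data_size,
                    uint32_t timestamp_ms) {
    out[0] = type;                          // 8 = audio, 9 = video, 18 = script data
    out[1] = (data_size >> 16) & 0xFF;
    out[2] = (data_size >> 8) & 0xFF;
    out[3] = data_size & 0xFF;
    out[4] = (timestamp_ms >> 16) & 0xFF;
    out[5] = (timestamp_ms >> 8) & 0xFF;
    out[6] = timestamp_ms & 0xFF;
    out[7] = (timestamp_ms >> 24) & 0xFF;   // extended timestamp byte
    out[8] = out[9] = out[10] = 0;          // stream ID
}
```

The muxer writes one such header before each encoded audio or video payload.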
🕸 Network Transmission
RTMP (which runs over TCP) is typically used. Media data is wrapped into RTMP messages, each consisting of a header (type, length, timestamp) and a body. For transmission, messages are split into chunks; the default chunk size is 128 bytes.
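A quick sketch of the chunking arithmetic (helper name is mine): a message body of N bytes needs ceil(N / chunkSize) chunks at RTMP's default 128-byte chunk size.

```c
#include <stddef.h>

// Number of chunks needed to carry a message payload of `message_len` bytes
// when each chunk carries at most `chunk_size` bytes (RTMP default: 128).
size_t rtmp_chunk_count(size_t message_len, size_t chunk_size) {
    if (message_len == 0) return 0;
    return (message_len + chunk_size - 1) / chunk_size;
}
```

A 307-byte message, for example, is carried in three chunks of 128, 128, and 51 bytes.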
🖥 Playback
Clients pull the stream, reassemble chunks into messages, demux the container, decode audio and video, synchronize them, and render video while playing audio.
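Synchronization is usually done against an audio master clock; below is a minimal sketch (the type, names, and threshold are my own illustration, not any player's API) of the per-frame decision a player makes:

```c
#include <stdint.h>

typedef enum { RENDER = 0, WAIT = 1, DROP = 2 } SyncAction;

// Audio-master A/V sync: compare a video frame's presentation timestamp with
// the current audio clock and decide whether to render it now, hold it, or
// drop it, using a tolerance window of `threshold_ms`.
SyncAction sync_decide(int64_t video_pts_ms, int64_t audio_clock_ms,
                       int64_t threshold_ms) {
    int64_t diff = video_pts_ms - audio_clock_ms;
    if (diff > threshold_ms)  return WAIT;   // video is ahead: hold the frame
    if (diff < -threshold_ms) return DROP;   // video lags too far: drop it
    return RENDER;                           // within tolerance: render now
}
```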
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.