Implementing Voice Playback for iOS Push Notifications Using Notification Service Extension and Baidu TTS
This article details the background, development steps, and debugging process for enabling dynamic voice playback in iOS push notifications via Notification Service Extension, covering iOS version constraints, integration of system AVSpeechSynthesizer and Baidu offline TTS SDK, code examples, and deployment considerations.
1. Background
The requirement is to read push-notification text aloud, similar to the payment voice alerts in Alipay and WeChat. Only iOS 10 and later can wake the app to play audio when a push arrives; earlier versions can only play a fixed ringtone.
iOS 12 and later further restrict background audio in the Notification Service Extension, making the implementation harder.
For an App Store build, only fixed or pre-recorded concatenated audio files, selected via the notification's sound field, can be used.
For internal (in-house) distribution, background playback can be enabled manually in the Notification Service Extension.
2. Development Process
a. Notification Service Extension
After adding a Notification Service Extension target, the system invokes its methods when a push arrives, allowing modification of title, content, and sound before displaying the notification.
The notification banner is displayed for roughly 6 seconds. The extension itself has up to 30 seconds to do its work; just before that limit the system calls serviceExtensionTimeWillExpire so a best-attempt version of the content can still be delivered.
Ensure new files are added to the correct target.
Resources such as sound files can be shared via App Groups.
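Resource sharing via App Groups can be sketched as below; the group identifier group.com.example.app and the file name alert.caf are placeholders for illustration, not names from this project:

```objectivec
#import <Foundation/Foundation.h>

// Resolve a file inside the shared App Group container so that both the
// main app and the Notification Service Extension can read it.
// NOTE: "group.com.example.app" is a hypothetical group identifier.
static NSURL *SharedResourceURL(NSString *fileName) {
    NSURL *container = [[NSFileManager defaultManager]
        containerURLForSecurityApplicationGroupIdentifier:@"group.com.example.app"];
    if (!container) return nil; // App Group capability not configured
    return [container URLByAppendingPathComponent:fileName];
}
```

The same App Group must be enabled under Signing & Capabilities for both the main app target and the extension target, or containerURLForSecurityApplicationGroupIdentifier: returns nil.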
Creation steps:
Create a Notification Service Extension target in Xcode (File → New → Target).
Enter a product name and finish the wizard.
Open NotificationService.m to handle the push.
@interface NotificationService ()
@property (nonatomic, strong) void (^contentHandler)(UNNotificationContent *contentToDeliver);
@property (nonatomic, strong) UNMutableNotificationContent *bestAttemptContent;
@end
@implementation NotificationService
- (void)didReceiveNotificationRequest:(UNNotificationRequest *)request withContentHandler:(void (^)(UNNotificationContent * _Nonnull))contentHandler {
self.contentHandler = contentHandler;
self.bestAttemptContent = [request.content mutableCopy];
// Modify the notification content here…
[self playVoiceWithInfo:self.bestAttemptContent.userInfo];
// Note: calling contentHandler ends the extension's active window, which can
// cut playback short; in practice many implementations defer this call until
// the TTS finish delegate fires.
self.contentHandler(self.bestAttemptContent);
}
- (void)serviceExtensionTimeWillExpire {
self.contentHandler(self.bestAttemptContent);
}
- (void)playVoiceWithInfo:(NSDictionary *)userInfo {
NSString *title = userInfo[@"aps"][@"alert"][@"title"];
NSString *isRead = userInfo[@"isRead"];
NSString *isUseBaiDu = userInfo[@"isBaiDu"];
[[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback withOptions:AVAudioSessionCategoryOptionDuckOthers error:nil];
[[AVAudioSession sharedInstance] setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];
if ([isRead isEqual:@"1"]) {
if ([isUseBaiDu isEqual:@"1"]) {
[[BaiDuTtsUtils shared] playBaiDuTTSVoiceWithContent:title];
} else {
[[AppleTtsUtils shared] playAppleTtsVoiceWithContent:title];
}
}
}
@end

Key points in AppleTtsUtils:
Volume is the product of the set volume and the system volume.
Numbers are spoken correctly by inserting a space after each digit.
#import "AppleTtsUtils.h"
#import <AVFoundation/AVFoundation.h>
@interface AppleTtsUtils ()
@property (nonatomic, strong) AVSpeechSynthesizer *speechSynthesizer;
@property (nonatomic, strong) AVSpeechSynthesisVoice *speechSynthesisVoice;
@end
@implementation AppleTtsUtils
+ (instancetype)shared {
    static id instance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{ instance = [self new]; });
    return instance;
}
// Lazy getters: without these the properties stay nil and
// speakUtterance: silently plays nothing.
- (AVSpeechSynthesizer *)speechSynthesizer {
    if (!_speechSynthesizer) {
        _speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    }
    return _speechSynthesizer;
}
- (AVSpeechSynthesisVoice *)speechSynthesisVoice {
    if (!_speechSynthesisVoice) {
        // zh-CN chosen here for Chinese content; adjust as needed.
        _speechSynthesisVoice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
    }
    return _speechSynthesisVoice;
}
- (void)playAppleTtsVoiceWithContent:(NSString *)content {
if (!content.length) return;
NSString *newResult = @"";
for (NSUInteger i = 0; i < content.length; i++) {
NSString *tempStr = [content substringWithRange:NSMakeRange(i, 1)];
newResult = [newResult stringByAppendingString:tempStr];
if ([self deptNumInputShouldNumber:tempStr]) {
newResult = [newResult stringByAppendingString:@" "];
}
}
AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:newResult];
utterance.rate = AVSpeechUtteranceDefaultSpeechRate;
utterance.voice = self.speechSynthesisVoice;
utterance.volume = 1.0;
[self.speechSynthesizer speakUtterance:utterance];
}
// Returns YES when the single character is a digit, so a space can be
// appended after it and numbers are read digit by digit.
- (BOOL)deptNumInputShouldNumber:(NSString *)str {
    if (str.length != 1) return NO;
    unichar c = [str characterAtIndex:0];
    return c >= '0' && c <= '9';
}
// ... delegate methods omitted for brevity ...
@end

b. Baidu TTS Offline SDK Integration
Create a Baidu AI console application and use the Notification Service Extension bundle ID.
Download the offline SDK, obtain AppId, AppKey, SecretKey, and SN.
Add the SDK files (BDSClientHeaders, BDSClientLib, BDSClientResource) to the extension target, ensuring the copy flag is set.
Link required system libraries as shown in the Baidu sample project.
#import "BaiDuTtsUtils.h"
#import "BDSSpeechSynthesizer.h"
NSString *BaiDuTTSAPP_ID = @"Your_APP_ID";
NSString *BaiDuTTSAPI_KEY = @"Your_APP_KEY";
NSString *BaiDuTTSSECRET_KEY = @"Your_SECRET_KEY";
NSString *BaiDuTTSSN = @"Your_SN";
@implementation BaiDuTtsUtils
+ (instancetype)shared {
    static BaiDuTtsUtils *instance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{ instance = [self new]; });
    return instance;
}
- (void)configureOfflineTTS {
NSString *offlineSpeechData = [[NSBundle mainBundle] pathForResource:@"bd_etts_common_speech_m15_mand_eng_high_am-mgc_v3.6.0_20190117" ofType:@"dat"];
NSString *offlineTextData = [[NSBundle mainBundle] pathForResource:@"bd_etts_common_text_txt_all_mand_eng_middle_big_v3.4.2_20210319" ofType:@"dat"];
if (!offlineSpeechData || !offlineTextData) { NSLog(@"Offline resources missing"); return; }
NSError *err = [[BDSSpeechSynthesizer sharedInstance] loadOfflineEngine:offlineTextData speechDataPath:offlineSpeechData licenseFilePath:nil withAppCode:BaiDuTTSAPP_ID withSn:BaiDuTTSSN];
if (err) { NSLog(@"Offline TTS init failed"); }
}
- (void)playBaiDuTTSVoiceWithContent:(NSString *)voiceText {
[[BDSSpeechSynthesizer sharedInstance] setSynthesizerDelegate:self];
[self configureOfflineTTS];
[[BDSSpeechSynthesizer sharedInstance] setPlayerVolume:10];
[[BDSSpeechSynthesizer sharedInstance] setSynthParam:@(5) forKey:BDS_SYNTHESIZER_PARAM_SPEED];
NSError *speakError = nil;
[[BDSSpeechSynthesizer sharedInstance] speakSentence:voiceText withError:&speakError];
if (speakError) { NSLog(@"Error: %ld, %@", (long)speakError.code, speakError.localizedDescription); }
}
// Delegate callbacks omitted
@end

c. Debugging
Run the main app once so it is installed, then run the Notification Service Extension scheme (choosing the main app as the host when Xcode asks) and set breakpoints in didReceiveNotificationRequest:withContentHandler:. Verify that the payload contains "mutable-content": 1. Example payload:
{
"aps": {
"alert": {
"title": "标题",
"subtitle": "副标题",
"body": "内容"
},
"badge": 1,
"sound": "default",
"mutable-content": 1
}
}

Common errors after iOS 12 include audio queue start failures because background playback is not allowed in the extension. The fix is to enable the Audio, AirPlay, and Picture in Picture background mode under Signing & Capabilities of the main app target, and to add the Required background modes key with the App plays audio or streams audio/video using AirPlay value to the extension's Info.plist.
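In raw Info.plist form, the extension-side change corresponds to the following fragment (UIBackgroundModes is the underlying key that Xcode shows as "Required background modes"):

```xml
<key>UIBackgroundModes</key>
<array>
	<string>audio</string>
</array>
```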
3. Conclusion
For internal distribution, enabling background audio in the Notification Service Extension allows dynamic voice playback using either system AVSpeechSynthesizer or Baidu offline TTS. For App Store submissions, only fixed audio files can be used; dynamic synthesis is not permitted.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.