How to detect speech start on iOS Speech API
I have an iOS app developed in Xcode/Objective-C. It uses the iOS Speech API for continuous speech recognition. It works, but I would like to turn the microphone icon red when speech starts, and I would also like to detect when speech ends.
I implemented the SFSpeechRecognitionTaskDelegate protocol, which provides callbacks such as onDetectedSpeechStart and speechRecognitionTask:didHypothesizeTranscription:, but these don't fire until the end of the first word has been processed, not at the start of speech.
I want to detect the start of speech (or really any noise). I think it should be possible from the AVAudioPCMBuffer delivered by installTapOnBus:, but I'm not sure how to tell whether a buffer is silence or noise that might be speech.
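For illustration, this is roughly what I imagine doing inside the tap block — a hypothetical sketch only (kSpeechRMSThreshold is a name and value I'm making up; it would need tuning):

// Hypothetical: compute the RMS level of the tap buffer and compare it
// to a guessed threshold to decide "probably speech" vs "silence".
static const float kSpeechRMSThreshold = 0.02f; // made-up value, needs tuning
float *samples = buffer.floatChannelData[0];
float sumOfSquares = 0.0f;
for (AVAudioFrameCount i = 0; i < buffer.frameLength; i++) {
    sumOfSquares += samples[i] * samples[i];
}
float rms = sqrtf(sumOfSquares / (float)MAX(buffer.frameLength, 1));
if (rms > kSpeechRMSThreshold) {
    // ...turn the mic icon red here (on the main thread)...
}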
Also, the Speech API doesn't fire an event when the person stops speaking, i.e. there is no silence detection; it just records until it times out. I have a hack that detects silence by checking the time since the last fired event, but I'm not sure whether there is a better way.
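For context, my hack updates a timestamp from the delegate callback, roughly like this (speechRecognitionTask:didHypothesizeTranscription: is the real SFSpeechRecognitionTaskDelegate method; lastSpeechDetected is my own ivar):

// Record the time of the last recognizer event; the tap block (below)
// treats a long gap since this timestamp as silence.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
didHypothesizeTranscription:(SFTranscription *)transcription {
    lastSpeechDetected = (long long)([[NSDate date] timeIntervalSince1970] * 1000.0);
}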
Here is the code:
NSError * outError;
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory: AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker error:&outError];
[audioSession setMode: AVAudioSessionModeMeasurement error:&outError];
[audioSession setActive: true withOptions: AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];
SFSpeechAudioBufferRecognitionRequest* speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
if (speechRequest == nil) {
NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
return;
}
audioEngine = [[AVAudioEngine alloc] init];
AVAudioInputNode* inputNode = [audioEngine inputNode];
speechRequest.shouldReportPartialResults = true;
// iOS speech does not detect end of speech, so must track silence.
lastSpeechDetected = -1;
speechTask = [speechRecognizer recognitionTaskWithRequest: speechRequest delegate: self];
[inputNode installTapOnBus:0 bufferSize: 4096 format: [inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer* buffer, AVAudioTime* when) {
// Use long long: epoch milliseconds overflow a 32-bit long.
long long millis = [[NSDate date] timeIntervalSince1970] * 1000;
if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
lastSpeechDetected = -1;
[speechTask finish];
return;
}
[speechRequest appendAudioPCMBuffer: buffer];
}];
[audioEngine prepare];
[audioEngine startAndReturnError: &outError];
I suggest using an AVAudioRecorder and an NSTimer callback that low-pass filters the power signal. This way you can detect when the recorder's metering reading crosses a certain threshold, and the low-pass filtering helps reject transient noise.
In the .h file:
#import <UIKit/UIKit.h>
#import <AVFoundation/AVFoundation.h>
#import <CoreAudio/CoreAudioTypes.h>
@interface ViewController : UIViewController{
AVAudioRecorder *recorder;
NSTimer *levelTimer;
double lowPassResults;
}
- (void)levelTimerCallback:(NSTimer *)timer;
@end
In the .m file:
#import "ViewController.h"
@interface ViewController ()
@end
@implementation ViewController
- (void)viewDidLoad {
[super viewDidLoad];
// AVAudioSession already set in your code, so no need for these 2 lines.
[[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
[[AVAudioSession sharedInstance] setActive:YES error:nil];
NSURL *url = [NSURL fileURLWithPath:@"/dev/null"];
NSDictionary *settings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithFloat: 44100.0], AVSampleRateKey,
[NSNumber numberWithInt: kAudioFormatAppleLossless], AVFormatIDKey,
[NSNumber numberWithInt: 1], AVNumberOfChannelsKey,
[NSNumber numberWithInt: AVAudioQualityMax], AVEncoderAudioQualityKey,
nil];
NSError *error;
lowPassResults = 0;
recorder = [[AVAudioRecorder alloc] initWithURL:url settings:settings error:&error];
if (recorder) {
[recorder prepareToRecord];
recorder.meteringEnabled = YES;
[recorder record];
levelTimer = [NSTimer scheduledTimerWithTimeInterval: 0.05 target: self selector: @selector(levelTimerCallback:) userInfo: nil repeats: YES];
} else
NSLog(@"%@", [error description]);
}
- (void)levelTimerCallback:(NSTimer *)timer {
[recorder updateMeters];
const double ALPHA = 0.05;
double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;
NSLog(@"lowPassResults: %f",lowPassResults);
// Use threshold values here to establish whether there is silence or speech
if (lowPassResults < 0.1) {
NSLog(@"Silence");
} else if(lowPassResults > 0.5){
NSLog(@"Speech");
}
}
- (void)didReceiveMemoryWarning {
[super didReceiveMemoryWarning];
// Dispose of any resources that can be recreated.
}
@end
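If you want this to drive the red microphone icon from the question, you could react to the threshold crossings inside levelTimerCallback: — a rough sketch, assuming a micButton outlet like the one in the question's code (the threshold values are the same guesses as above):

// Inside levelTimerCallback:, after updating lowPassResults.
// NSTimer fires on the main run loop, so UI work is safe here.
if (lowPassResults > 0.5) {
    micButton.tintColor = [UIColor redColor];       // speech detected
} else if (lowPassResults < 0.1) {
    micButton.tintColor = [UIColor lightGrayColor]; // back to idle
}

The gap between the 0.5 and 0.1 thresholds gives you simple hysteresis, so the icon won't flicker right at the boundary.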
Have you tried using AVCaptureAudioChannel? Here is a link to the documentation. It has a volume property that provides the channel's current volume (gain).
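For reference, a minimal sketch of reading per-channel levels, assuming an AVCaptureSession with an audio input and an AVCaptureAudioDataOutput called audioOutput are already configured (averagePowerLevel and peakHoldLevel are the metering properties available on iOS):

// Read per-channel audio levels (reported in dBFS, so values are <= 0).
AVCaptureConnection *connection = [audioOutput connectionWithMediaType:AVMediaTypeAudio];
for (AVCaptureAudioChannel *channel in connection.audioChannels) {
    NSLog(@"avg: %f dB, peak: %f dB", channel.averagePowerLevel, channel.peakHoldLevel);
}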
Here is the working code we ended up with. The key is installTapOnBus: and then the magic line that detects the volume:
float volume = fabsf(*buffer.floatChannelData[0]);
-(void) doActualRecording {
NSLog(@"doActualRecording");
@try {
//if (!recording) {
if (audioEngine != nil) {
[audioEngine stop];
[speechTask cancel];
AVAudioInputNode* inputNode = [audioEngine inputNode];
[inputNode removeTapOnBus: 0];
}
recording = YES;
micButton.selected = YES;
//NSLog(@"Starting recording... SFSpeechRecognizer Available? %d", [speechRecognizer isAvailable]);
NSError * outError;
//NSLog(@"AUDIO SESSION CATEGORY0: %@", [[AVAudioSession sharedInstance] category]);
AVAudioSession* audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory: AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker error:&outError];
[audioSession setMode: AVAudioSessionModeMeasurement error:&outError];
[audioSession setActive: true withOptions: AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];
SFSpeechAudioBufferRecognitionRequest* speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
//NSLog(@"AUDIO SESSION CATEGORY1: %@", [[AVAudioSession sharedInstance] category]);
if (speechRequest == nil) {
NSLog(@"Unable to create SFSpeechAudioBufferRecognitionRequest.");
return;
}
speechDetectionSamples = 0;
// Somehow fixes a crash on iPhone 7; seems like an ARC object-lifetime issue.
// Keep a strong reference to the old engine while the new one replaces it.
AVAudioEngine* temp = audioEngine;
audioEngine = [[AVAudioEngine alloc] init];
AVAudioInputNode* inputNode = [audioEngine inputNode];
speechRequest.shouldReportPartialResults = true;
// iOS speech does not detect end of speech, so must track silence.
lastSpeechDetected = -1;
speechTask = [speechRecognizer recognitionTaskWithRequest: speechRequest delegate: self];
[inputNode installTapOnBus:0 bufferSize: 4096 format: [inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer* buffer, AVAudioTime* when) {
@try {
long long millis = [[NSDate date] timeIntervalSince1970] * 1000;
if (lastSpeechDetected != -1 && ((millis - lastSpeechDetected) > 1000)) {
lastSpeechDetected = -1;
[speechTask finish];
return;
}
[speechRequest appendAudioPCMBuffer: buffer];
//Calculate volume level
if ([buffer floatChannelData] != nil) {
float volume = fabsf(*buffer.floatChannelData[0]);
if (volume >= speechDetectionThreshold) {
speechDetectionSamples++;
if (speechDetectionSamples >= speechDetectionSamplesNeeded) {
//Need to change mic button image in main thread
[[NSOperationQueue mainQueue] addOperationWithBlock:^ {
[micButton setImage: [UIImage imageNamed: @"micRecording"] forState: UIControlStateSelected];
}];
}
} else {
speechDetectionSamples = 0;
}
}
}
@catch (NSException * e) {
NSLog(@"Exception: %@", e);
}
}];
[audioEngine prepare];
[audioEngine startAndReturnError: &outError];
NSLog(@"Error %@", outError);
//}
}
@catch (NSException * e) {
NSLog(@"Exception: %@", e);
}
}
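One caveat: fabsf(*buffer.floatChannelData[0]) samples only the first frame of each buffer, which is why we require several consecutive loud buffers (speechDetectionSamplesNeeded) before changing the icon. If that turns out too jumpy, averaging over the whole buffer is a possible variant — a sketch, not something we tested:

// Possible variant: mean absolute amplitude across the entire buffer,
// instead of sampling only the first frame.
float *samples = buffer.floatChannelData[0];
float sum = 0.0f;
for (AVAudioFrameCount i = 0; i < buffer.frameLength; i++) {
    sum += fabsf(samples[i]);
}
float volume = sum / (float)MAX(buffer.frameLength, 1);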