Text recognition from a live video stream using ML Kit (with CMSampleBuffer)
I am trying to adapt the on-device text recognition sample provided by Google here so that it works with a live camera feed.
When I hold the camera over text (this works with the image sample), the console produces the following in a stream until the app eventually runs out of memory:
2018-05-16 10:48:22.129901+1200 TextRecognition[32138:5593533] An empty result returned from from GMVDetector for VisionTextDetector.
Here is my video capture method:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    if let textDetector = self.textDetector {
        let visionImage = VisionImage(buffer: sampleBuffer)
        let metadata = VisionImageMetadata()
        metadata.orientation = .rightTop
        visionImage.metadata = metadata

        textDetector.detect(in: visionImage) { (features, error) in
            guard error == nil, let features = features, !features.isEmpty else {
                // Error. You should also check the console for error messages.
                // ...
                return
            }

            // Recognized and extracted text
            print("Detected text has: \(features.count) blocks")
            // ...
        }
    }
}
Is this the right way to do it?
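One possible cause of the memory pressure (an assumption, not something confirmed in this thread) is that captureOutput dispatches a detection for every frame, so requests can queue up faster than they complete. A minimal sketch of gating frames with an in-flight flag, using the same Firebase API as above; the isDetecting property is hypothetical:

// Sketch: drop frames while a detection is already in flight.
// `isDetecting` is a hypothetical property added for illustration.
private var isDetecting = false

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard !isDetecting, let textDetector = self.textDetector else { return }
    isDetecting = true

    let visionImage = VisionImage(buffer: sampleBuffer)
    let metadata = VisionImageMetadata()
    metadata.orientation = .rightTop
    visionImage.metadata = metadata

    textDetector.detect(in: visionImage) { [weak self] (features, error) in
        defer { self?.isDetecting = false }  // let the next frame through
        guard error == nil, let features = features, !features.isEmpty else { return }
        print("Detected text has: \(features.count) blocks")
    }
}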
ML Kit is still in the process of adding sample code for CMSampleBuffer usage to the Firebase Quickstart.
In the meantime, the following code works for CMSampleBuffer.
Set up the AV capture (use kCVPixelFormatType_32BGRA for kCVPixelBufferPixelFormatTypeKey):
@property(nonatomic, strong) AVCaptureSession *session;
@property(nonatomic, strong) AVCaptureVideoDataOutput *videoDataOutput;

- (void)setupVideoProcessing {
  self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
  NSDictionary *rgbOutputSettings = @{
    (__bridge NSString*)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)
  };
  [self.videoDataOutput setVideoSettings:rgbOutputSettings];

  if (![self.session canAddOutput:self.videoDataOutput]) {
    [self cleanupVideoProcessing];
    NSLog(@"Failed to setup video output");
    return;
  }
  [self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES];
  [self.videoDataOutput setSampleBufferDelegate:self queue:self.videoDataOutputQueue];
  [self.session addOutput:self.videoDataOutput];
}
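For reference, a rough Swift equivalent of the same setup (a sketch; session and videoDataOutputQueue are assumed to be properties of your controller):

// Sketch: Swift equivalent of the Objective-C capture setup above.
func setupVideoProcessing() {
    let videoDataOutput = AVCaptureVideoDataOutput()
    // Use 32BGRA, matching the Objective-C settings.
    videoDataOutput.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    guard session.canAddOutput(videoDataOutput) else {
        print("Failed to setup video output")
        return
    }
    videoDataOutput.alwaysDiscardsLateVideoFrames = true
    videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
    session.addOutput(videoDataOutput)
}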
Consume the CMSampleBuffer and run detection:
- (void)runDetection:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
           fromConnection:(AVCaptureConnection *)connection {
  CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
  // Width/height are available if you need to map feature frames back to the preview.
  size_t imageWidth = CVPixelBufferGetWidth(imageBuffer);
  size_t imageHeight = CVPixelBufferGetHeight(imageBuffer);

  AVCaptureDevicePosition devicePosition =
      self.isUsingFrontCamera ? AVCaptureDevicePositionFront : AVCaptureDevicePositionBack;

  // Calculate the image orientation.
  UIDeviceOrientation deviceOrientation = [[UIDevice currentDevice] orientation];
  FIRVisionDetectorImageOrientation orientation =
      [ImageUtility imageOrientationFromOrientation:deviceOrientation
                          withCaptureDevicePosition:devicePosition
                           defaultDeviceOrientation:[self deviceOrientationFromInterfaceOrientation]];

  // Invoke text detection.
  FIRVisionImage *image = [[FIRVisionImage alloc] initWithBuffer:sampleBuffer];
  FIRVisionImageMetadata *metadata = [[FIRVisionImageMetadata alloc] init];
  metadata.orientation = orientation;
  image.metadata = metadata;

  FIRVisionTextDetectionCallback callback =
      ^(NSArray<id<FIRVisionText>> *_Nullable features, NSError *_Nullable error) {
        // Handle the detected text blocks (or error) here.
        // ...
      };
  [self.textDetector detectInImage:image completion:callback];
}
The ImageUtility helper used above to determine the orientation:
+ (FIRVisionDetectorImageOrientation)imageOrientationFromOrientation:(UIDeviceOrientation)deviceOrientation
                                           withCaptureDevicePosition:(AVCaptureDevicePosition)position
                                            defaultDeviceOrientation:(UIDeviceOrientation)defaultOrientation {
  if (deviceOrientation == UIDeviceOrientationFaceDown ||
      deviceOrientation == UIDeviceOrientationFaceUp ||
      deviceOrientation == UIDeviceOrientationUnknown) {
    deviceOrientation = defaultOrientation;
  }
  FIRVisionDetectorImageOrientation orientation = FIRVisionDetectorImageOrientationTopLeft;
  switch (deviceOrientation) {
    case UIDeviceOrientationPortrait:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationLeftTop;
      } else {
        orientation = FIRVisionDetectorImageOrientationRightTop;
      }
      break;
    case UIDeviceOrientationLandscapeLeft:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationBottomLeft;
      } else {
        orientation = FIRVisionDetectorImageOrientationTopLeft;
      }
      break;
    case UIDeviceOrientationPortraitUpsideDown:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationRightBottom;
      } else {
        orientation = FIRVisionDetectorImageOrientationLeftBottom;
      }
      break;
    case UIDeviceOrientationLandscapeRight:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationTopRight;
      } else {
        orientation = FIRVisionDetectorImageOrientationBottomRight;
      }
      break;
    default:
      orientation = FIRVisionDetectorImageOrientationTopLeft;
      break;
  }
  return orientation;
}
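The question's Swift snippet hard-codes .rightTop, which is only correct for a back camera in portrait. A sketch of the same lookup in Swift, mirroring the back-camera branches of the helper above (front-camera cases omitted for brevity):

// Sketch: map device orientation to the detector orientation for the back camera.
func visionOrientation(for deviceOrientation: UIDeviceOrientation) -> VisionDetectorImageOrientation {
    switch deviceOrientation {
    case .portrait:           return .rightTop
    case .landscapeLeft:      return .topLeft
    case .portraitUpsideDown: return .leftBottom
    case .landscapeRight:     return .bottomRight
    default:                  return .topLeft  // faceUp/faceDown/unknown
    }
}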
ML Kit has long since migrated out of Firebase and become a standalone SDK (migration guide).
The Quickstart sample app, which shows text recognition from a live video stream using ML Kit (with CMSampleBuffer) in Swift, is now available here:
The live feed is implemented in CameraViewController.swift:
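For the standalone SDK, the shape of the call is roughly as follows. This is a sketch based on the MLKitVision and MLKitTextRecognition pods; verify the names against the quickstart:

import AVFoundation
import MLKitVision
import MLKitTextRecognition

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    let image = VisionImage(buffer: sampleBuffer)
    // Orientation is now set directly on the image rather than via metadata.
    image.orientation = .right  // assumes a back camera in portrait

    let recognizer = TextRecognizer.textRecognizer(options: TextRecognizerOptions())
    recognizer.process(image) { result, error in
        guard error == nil, let result = result else { return }
        print("Detected text has: \(result.blocks.count) blocks")
    }
}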