How to use RGB Image as input for the C# EvalDll Wrapper?
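I have trained a network using the provided ImageReader and am now trying to use the CNTK EvalDll in a C# project to evaluate RGB images.
I have seen the examples related to the EvalDll, but the input is always an array of float/double, never an image.
How can I use the exposed interface to run the trained network on RGB images?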
I assume that you want the equivalent of reading with the ImageReader, where your reader config looks something like
features=[
    width=224
    height=224
    channels=3
    cropType=Center
]
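You will need helper functions to crop the image and to resize it to the size that the network accepts.
I'll define two extension methods on System.Drawing.Bitmap, one to crop and one to resize: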
open System.Collections.Generic
open System.Drawing
open System.Drawing.Drawing2D
open System.Drawing.Imaging

type Bitmap with
    /// Crops the image in the present object, starting at the given (column, row), and retaining
    /// the given number of columns and rows.
    member this.Crop(column, row, numCols, numRows) =
        let rect = Rectangle(column, row, numCols, numRows)
        this.Clone(rect, this.PixelFormat)

    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    member this.ResizeImage(width, height, useHighQuality) =
        // Rather than using image.GetThumbnailImage, use direct image resizing.
        // GetThumbnailImage throws odd out-of-memory exceptions on some
        // images, see also
        //
        // Use the interpolation method suggested on
        //
        let rect = Rectangle(0, 0, width, height)
        let destImage = new Bitmap(width, height)
        destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution)
        use graphics = Graphics.FromImage destImage
        graphics.CompositingMode <- CompositingMode.SourceCopy
        if useHighQuality then
            graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
            graphics.CompositingQuality <- CompositingQuality.HighQuality
            graphics.SmoothingMode <- SmoothingMode.HighQuality
            graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
        else
            graphics.InterpolationMode <- InterpolationMode.Low
        use wrapMode = new ImageAttributes()
        wrapMode.SetWrapMode WrapMode.TileFlipXY
        graphics.DrawImage(this, rect, 0, 0, this.Width, this.Height, GraphicsUnit.Pixel, wrapMode)
        destImage
Based on that, define a function to do a center crop:
/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The
/// aspect ratio is preserved.
let CenterCrop cropRatio (image: Bitmap) =
    let cropSize =
        float (min image.Height image.Width) * cropRatio
        |> int
    let startRow = (image.Height - cropSize) / 2
    let startCol = (image.Width - cropSize) / 2
    image.Crop(startCol, startRow, cropSize, cropSize)
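To make the crop arithmetic concrete, here is a minimal sketch in C# that computes only the crop rectangle, without touching System.Drawing. The names CropMath and CenterCropRect are made up for this illustration and are not part of CNTK or of the code above.

```csharp
using System;

public static class CropMath
{
    // Returns (startCol, startRow, cropSize) for a square center crop whose
    // side is cropRatio times the smaller image dimension. The product is
    // truncated to an int, like the F# CenterCrop function.
    public static (int StartCol, int StartRow, int CropSize) CenterCropRect(
        int width, int height, double cropRatio)
    {
        int cropSize = (int)(Math.Min(width, height) * cropRatio);
        int startCol = (width - cropSize) / 2;
        int startRow = (height - cropSize) / 2;
        return (startCol, startRow, cropSize);
    }
}
```

For a 640 x 480 image and a cropRatio of 1.0, this gives a 480 x 480 crop that starts at column 80, row 0.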
Then plug it all together: crop, resize, then traverse the image in the plane order that OpenCV uses:
/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first center-cropped and resized to (targetSize x targetSize),
/// then the image planes are converted to a CNTK tensor.
/// Returns a list with targetSize*targetSize*3 values.
let ImageToFeatures (image: Bitmap, targetSize) =
    // Apply the same image pre-processing that is typically done
    // in CNTK when running it in test or write mode: Take a center
    // crop of the image, then re-size it to the network input size.
    let cropped = CenterCrop 1.0 image
    let resized = cropped.ResizeImage(targetSize, targetSize, false)
    // Ensure that the initial capacity of the list is provided
    // with the constructor. Creating the list via the default constructor
    // makes the whole operation 20% slower.
    let features = List<float32>(targetSize * targetSize * 3)
    // Traverse the image in the format that is used in OpenCV:
    // First the B plane, then the G plane, then the R plane
    for c in 0 .. 2 do
        for h in 0 .. (resized.Height - 1) do
            for w in 0 .. (resized.Width - 1) do
                let pixel = resized.GetPixel(w, h)
                let v =
                    match c with
                    | 0 -> float32 pixel.B
                    | 1 -> float32 pixel.G
                    | 2 -> float32 pixel.R
                    | _ -> failwith "No such channel"
                features.Add v
    features
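The plane-by-plane traversal amounts to an index remapping from interleaved pixels (B, G, R, B, G, R, ...) to three consecutive planes. The following C# sketch shows that remapping on a plain byte buffer; PlanarLayout and InterleavedToPlanar are hypothetical names, and the buffer is assumed to be tightly packed with no row padding.

```csharp
using System;

public static class PlanarLayout
{
    // Converts tightly packed interleaved pixel data (B,G,R per pixel, row by
    // row) into planar order: all B values, then all G values, then all R values.
    public static float[] InterleavedToPlanar(byte[] interleaved, int width, int height)
    {
        const int channels = 3;
        if (interleaved.Length != width * height * channels)
            throw new ArgumentException("Unexpected buffer size.", nameof(interleaved));

        var planar = new float[interleaved.Length];
        for (int c = 0; c < channels; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                {
                    int src = (h * width + w) * channels + c; // interleaved index
                    int dst = (c * height + h) * width + w;   // planar index
                    planar[dst] = interleaved[src];
                }
        return planar;
    }
}
```

For a 2 x 1 image with bytes { 1, 2, 3, 4, 5, 6 }, the result is { 1, 4, 2, 5, 3, 6 }: both B values first, then both G values, then both R values.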
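Call ImageToFeatures with the image in question, feed the result into an instance of IEvaluateModelManagedF, and you're good. I'm assuming that your RGB image lives in myImage, and that you are doing binary classification with a network input size of 224 x 224.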
let LoadModelOnCpu modelPath =
    let model = new IEvaluateModelManagedF()
    let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
    model.Init description
    model.CreateNetwork description
    model

let model = LoadModelOnCpu("myModelFile")
let featureDict = Dictionary()
featureDict.["features"] <- ImageToFeatures(myImage, 224)
model.Evaluate(featureDict, "OutputNodes.z", 2)
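Once Evaluate has returned the two output values, they still have to be turned into a class decision. The sketch below is one way to do that in C#, under the assumption that the list holds one raw (pre-softmax) score per class; whether that is true depends on which node of your network "OutputNodes.z" refers to. The names OutputScores, ArgMax and Softmax are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OutputScores
{
    // Index of the largest score, i.e. the predicted class.
    public static int ArgMax(IReadOnlyList<float> scores)
    {
        if (scores.Count == 0)
            throw new ArgumentException("No scores.", nameof(scores));
        int best = 0;
        for (int i = 1; i < scores.Count; i++)
            if (scores[i] > scores[best]) best = i;
        return best;
    }

    // Numerically stable softmax: subtract the maximum before exponentiating.
    public static double[] Softmax(IReadOnlyList<float> scores)
    {
        double max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

If the output node already applies a softmax, only ArgMax is needed.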
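I implemented similar code in C#, which loads a model, reads a test image, does the appropriate cropping/scaling/etc, and runs the model. As Anton pointed out, the output does not match that of CNTK 100%, but it is very close.
Code for image reading/cropping/scaling: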
// Namespaces used by the C# snippets in this answer.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;
using System.Runtime.InteropServices;
using System.Xml;
using Microsoft.MSR.CNTK.Extensibility.Managed;

private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
{
    var rect = new Rectangle(col, row, numCols, numRows);
    return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
}

/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The
/// aspect ratio is preserved.
private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
{
    var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
    var startCol = (img.Width - cropSize) / 2;
    var startRow = (img.Height - cropSize) / 2;
    return ImCrop(img, startCol, startRow, cropSize, cropSize);
}

/// Creates a resized version of the present image. The returned image
/// will have the given width and height. This may distort the aspect ratio
/// of the image.
private static Bitmap ImResize(Bitmap img, int width, int height)
{
    return new Bitmap(img, new Size(width, height));
}
Code for loading the model and the XML file containing the pixel means:
public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
{
    var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
    // StartNew() starts timing immediately; a Stopwatch created with
    // "new Stopwatch()" is not running and would always report 0 ms.
    Stopwatch stopWatch = Stopwatch.StartNew();
    var model = new IEvaluateModelManagedF();
    model.CreateNetwork(networkConfiguration, deviceId: -1);
    stopWatch.Stop();
    Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
    return model;
}

/// Read the xml mean file, i.e. the offsets which are subtracted
/// from each pixel in an image before using it as input to a CNTK model.
public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
{
    // Read and parse pixel value xml file
    XmlTextReader reader = new XmlTextReader(XmlPath);
    reader.ReadToFollowing("data");
    reader.Read();
    var pixelMeansXml =
        reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Single.Parse)
            .ToArray();
    // Re-order mean pixel values to be in the same order as the bitmap
    // image (as output by the ImGetRGBChannels() function).
    int inputDim = 3 * ImgWidth * ImgHeight;
    Debug.Assert(pixelMeansXml.Length == inputDim);
    var pixelMeans = new float[inputDim];
    int counter = 0;
    for (int c = 0; c < 3; c++)
        for (int h = 0; h < ImgHeight; h++)
            for (int w = 0; w < ImgWidth; w++)
            {
                int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                pixelMeans[counter++] = pixelMeansXml[xmlIndex];
            }
    return pixelMeans;
}
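One thing to watch out for when parsing the mean values: Single.Parse without a format provider uses the current culture, so on a machine whose culture uses a comma as the decimal separator the values may be misread or rejected. A minimal sketch of a culture-independent parse follows, assuming the mean file writes numbers with a '.' decimal separator; MeanParsing and ParseFloats are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class MeanParsing
{
    // Splits whitespace-separated text into floats, always using '.' as the
    // decimal separator regardless of the machine's regional settings.
    public static float[] ParseFloats(string text)
    {
        return text
            .Split(new[] { '\r', '\n', ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => float.Parse(s, NumberStyles.Float, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```

In readXmlMeanFile, this would take the place of the Split/Select/ToArray chain applied to reader.Value.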
Code to load the image and convert it to model input:
/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first center-cropped and resized to (targetSize x targetSize),
/// then the image planes are converted to a CNTK tensor, and the mean
/// pixel value subtracted. Returns a list with targetSize * targetSize * 3 floats.
private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
{
    // Apply the same image pre-processing that is done typically in CNTK:
    // Take a center crop of the image, then re-size it to the network input size.
    var imgCropped = ImCropToCenter(img, 1.0);
    var imgResized = ImResize(imgCropped, targetSize, targetSize);
    // Convert pixels to CNTK model input.
    // Fast pixel extraction is ~5x faster while giving identical output
    var features = new float[3 * imgResized.Height * imgResized.Width];
    var boFastPixelExtraction = true;
    if (boFastPixelExtraction)
    {
        var pixelsRGB = ImGetRGBChannels(imgResized);
        for (int c = 0; c < 3; c++)
        {
            // pixelsRGB is ordered R, G, B; the model expects B, G, R.
            byte[] pixels = pixelsRGB[2 - c];
            Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
            for (int i = 0; i < pixels.Length; i++)
            {
                int featIndex = i + c * pixels.Length;
                features[featIndex] = pixels[i] - pixelMeans[featIndex];
            }
        }
    }
    else
    {
        // Traverse the image in the format that is used in OpenCV:
        // First the B plane, then the G plane, then the R plane
        // Note: calling GetPixel(w, h) repeatedly is slow!
        int featIndex = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < imgResized.Height; h++)
                for (int w = 0; w < imgResized.Width; w++)
                {
                    var pixel = imgResized.GetPixel(w, h);
                    float v;
                    if (c == 0)
                        v = pixel.B;
                    else if (c == 1)
                        v = pixel.G;
                    else if (c == 2)
                        v = pixel.R;
                    else
                        throw new Exception("No such channel");
                    // Subtract pixel mean
                    features[featIndex] = v - pixelMeans[featIndex];
                    featIndex++;
                }
    }
    return features.ToList();
}
/// Convert bitmap image to R,G,B channel byte arrays.
/// See:
private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
{
    // Lock the bitmap's bits. The data is only read, so a read-only lock suffices.
    Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
    BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    // Declare an array to hold the bytes of the bitmap. Stride is the length
    // of one row in bytes and may include padding at the end of each row.
    int bytes = bmpData.Stride * bmp.Height;
    byte[] rgbValues = new byte[bytes];
    // One entry per pixel. Using bytes / 3 here would over-allocate whenever
    // rows are padded, and the length assert in ImageToFeatures would fail.
    int numPixels = bmp.Width * bmp.Height;
    byte[] r = new byte[numPixels];
    byte[] g = new byte[numPixels];
    byte[] b = new byte[numPixels];
    // Copy the RGB values into the array, starting from ptr to the first line
    IntPtr ptr = bmpData.Scan0;
    Marshal.Copy(ptr, rgbValues, 0, bytes);
    // Populate byte arrays (24bpp pixels are stored in B, G, R byte order)
    int count = 0;
    int stride = bmpData.Stride;
    for (int row = 0; row < bmpData.Height; row++)
    {
        for (int col = 0; col < bmpData.Width; col++)
        {
            int offset = (row * stride) + (col * 3);
            b[count] = rgbValues[offset];
            g[count] = rgbValues[offset + 1];
            r[count++] = rgbValues[offset + 2];
        }
    }
    bmp.UnlockBits(bmpData);
    return new List<byte[]> { r, g, b };
}
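The stride matters because GDI+ rounds each bitmap row up to a multiple of 4 bytes, so Stride * Height / 3 only equals the pixel count when Width * 3 happens to be divisible by 4. The small C# sketch below just reproduces that rounding for a 24bpp image; StrideMath and Stride24bpp are made-up names, and in real code the value should be read from BitmapData.Stride rather than computed.

```csharp
using System;

public static class StrideMath
{
    // Bytes per row for a 24bpp bitmap: 3 bytes per pixel, rounded up to the
    // next multiple of 4.
    public static int Stride24bpp(int width)
    {
        return (width * 3 + 3) / 4 * 4;
    }
}
```

A width of 224 gives a stride of 672 bytes (no padding), which is why the 224 x 224 case works either way. A width of 225 gives 676 bytes, one more than the 675 bytes of actual pixel data per row.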