Using Halide ahead-of-time (AOT) with Metal on iOS

Question

I am trying to use Metal as the target for my ahead-of-time (AOT) Halide pipeline on iOS.

I have successfully created a Halide generator that produces a static binary targeting Metal. I can link this binary into my iOS application and call it.

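However, when I pass the Buffer<uint8_t> input_ to the function, the data in the Buffer always appears to be zero on the GPU side. Note that this only happens when running on the GPU on iOS.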

Generator

#include "Halide.h"

using namespace Halide;

class MyHalideTest : public Halide::Generator<MyHalideTest> {
public:
    Input<Buffer<uint8_t>> input{"input", 3};
    Input<int32_t> width{"width"};
    Input<int32_t> height{"height"};
    Output<Buffer<uint8_t>> output{"output", 3};

    void generate() {
        output(x,y,c) = cast<uint8_t>(input(x,y,c)+25);
    }

    void schedule() {
        input
            .dim(0).set_stride(4)
            .dim(2).set_stride(1).set_bounds(0, 4);
        output
            .dim(0).set_stride(4)
            .dim(2).set_stride(1).set_bounds(0, 4);

        if (get_target().has_gpu_feature()) {
            output
                .reorder(c, x, y)
                .bound(c, 0, 4)
                .unroll(c);
            output.gpu_tile(x, y, xo, yo, xi, yi, 16, 16);
        }
        else {
            output
                .reorder(c, x, y)
                .unroll(c)
                .split(y, yo, yi, 16)
                .parallel(yo)
                .vectorize(x, 8);
        }
    }

private:
    Var x{"x"}, y{"y"}, c{"c"}, xi{"xi"}, xo{"xo"}, yi{"yi"}, yo{"yo"};

};

HALIDE_REGISTER_GENERATOR(MyHalideTest, "halide_test")

Command line for running the generator

./MyHalideTest_generator -g halide_test \
-f halide_test_ARM64_metal \
-n halide_test_ARM64_metal \
-o "${DERIVED_FILE_DIR}" \
target=arm-64-ios-metal-debug-user_context

iOS code that calls the Halide function

Buffer<uint8_t> input_;
Buffer<uint8_t> output_;

// Other setup

- (void)initBuffersWithWidth:(int)w height:(int)h using_metal:(bool)using_metal
{
    // We really only need to pad this for the using_metal case,
    // but it doesn't really hurt to always do it.
    const int c = 4;
    const int pad_pixels = (64 / sizeof(int32_t));
    const int row_stride = (w + pad_pixels - 1) & ~(pad_pixels - 1);
    const halide_dimension_t pixelBufShape[] = {
        {0, w, c},
        {0, h, c * row_stride},
        {0, c, 1}
    };

    input_ = Buffer<uint8_t>(nullptr, 3, pixelBufShape);
    input_.allocate();
    auto buf = input_.raw_buffer()->host;
    memset(buf, 200, input_.size_in_bytes());

    // This allows us to make a Buffer with an arbitrary shape
    // and memory managed by Buffer itself
    output_ = Buffer<uint8_t>(nullptr, 3, pixelBufShape);
    output_.allocate();
}

...

/** Calling Halide function here **/
halide_test((__bridge void *)self, input_, width, height, output_);
output_.copy_to_host();

// Display output image...

So the code sets every value in the input_ buffer to 200. Every value in the returned output_ buffer should therefore be 225, but instead every value is just 25.

I should note that this works correctly when running on my laptop's GPU and on the phone's CPU. The only difference is the Halide generator target.

Any ideas as to why Input<Buffer<uint8_t>> input appears to be all zeros when the Halide function runs?

The debug output seems to show memory being malloc'd on the device side, but I don't see any statement that explicitly mentions halide_copy_to_device.

Answer 1

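If you set values in a Buffer's host memory, you need to mark it as dirty: input_.set_host_dirty(). Without that flag, the runtime does not copy the host data to the device before the pipeline runs.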
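
As a rough sketch of how that could look in the question's setup code, the helper below writes into the host memory and then sets the dirty flag in one step. The name fill_host_and_mark_dirty is made up for illustration and is not part of Halide; the template only assumes the buffer type provides data(), size_in_bytes() and set_host_dirty(), which Halide::Runtime::Buffer<uint8_t> does. It also assumes data() points at the start of the allocation, as it does for the zero-min, positive-stride shape used in the question.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a Halide API): fill the host-side storage of a
// buffer with `value`, then flag the host copy as modified so that the next
// pipeline call knows it has to upload the data to the device.
//
// BufferT is assumed to expose data(), size_in_bytes() and set_host_dirty(),
// as Halide::Runtime::Buffer<uint8_t> does.
template <typename BufferT>
void fill_host_and_mark_dirty(BufferT &buf, std::uint8_t value) {
    // Write directly into host memory; this bypasses any dirty tracking.
    std::memset(buf.data(), value, buf.size_in_bytes());
    // Tell the runtime that the host copy is newer than the device copy.
    buf.set_host_dirty();
}
```

In initBuffersWithWidth, the memset call after input_.allocate() would then become fill_host_and_mark_dirty(input_, 200). Calling input_.set_host_dirty() directly after the existing memset should have the same effect.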

c++

ios

halide