Why are transferred buffers neutered in JavaScript?

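JavaScript allows a buffer to be transferred from a source thread to a Worker thread. Otherwise, the ArrayBuffer is copied and the copy is passed to the worker. A transferred buffer is no longer accessible ("neutered") in the source thread [1]:
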
// create data that can be transferred
var arr = new Uint8Array(5);

// outputs: 5
console.log(arr.buffer.byteLength);

var worker = new Worker("some_worker.js");

// transfer the buffer
worker.postMessage({arr: arr}, [arr.buffer]);

// the buffer vanishes: it is "neutered" (detached)
// outputs: 0
console.log(arr.buffer.byteLength);

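I am not asking how this mechanism works; rather, I am curious why it was introduced. Why not share data between worker threads, as in a traditional threading model, and allow multiple threads to access the same memory region?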


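Other phrasings of the same question, for clarity:

Why are buffers neutered when they are transferred? / What is the reasoning behind this mechanism? / Why was it introduced? Why can't memory regions be shared between Workers?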

I am looking for answers from credible and/or official sources.


[1] https://developer.mozilla.org/en/docs/Web/API/Worker/postMessage

The reason is performance. The data being sent is not copied; ownership of the ArrayBuffer is transferred to the receiver.
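To make the copy-versus-transfer difference concrete without setting up a Worker, here is a small sketch using the global structuredClone(), which runs the same structured-clone algorithm that postMessage uses and accepts the same kind of transfer list (assumption: a modern browser or Node.js 17+). The function name cloneVsTransfer is only for illustration.

```javascript
// Sketch, assuming a runtime with the global structuredClone()
// (modern browsers, Node.js 17+). No Worker is needed to see the effect.
function cloneVsTransfer() {
  // Copy: the receiver gets its own memory, the source stays usable.
  const copied = new Uint8Array(5);
  const copy = structuredClone(copied);
  copy[0] = 42; // writing to the copy does not touch the original

  // Transfer: ownership of the buffer moves, the source is detached.
  const moved = new Uint8Array(5);
  const received = structuredClone(moved, { transfer: [moved.buffer] });

  return {
    originalAfterCopy: copied.buffer.byteLength,    // 5
    originalFirstByte: copied[0],                   // 0
    originalAfterTransfer: moved.buffer.byteLength, // 0 (neutered)
    receivedLength: received.byteLength,            // 5
  };
}

console.log(cloneVsTransfer());
```

The same two behaviours apply to worker.postMessage(message) versus worker.postMessage(message, [buffer]).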

For shared memory, you should use SharedArrayBuffer.
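A minimal sketch of sharing instead of transferring, assuming Node.js and its worker_threads module: a MessageChannel plus receiveMessageOnPort stand in for a real worker so the example runs synchronously in one process. In a browser the idea is the same with worker.postMessage, but SharedArrayBuffer is only available on cross-origin-isolated pages. shareMemory is a hypothetical helper name.

```javascript
// Node.js-specific sketch (assumption): MessageChannel ports stand in for
// the main thread and a worker.
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

function shareMemory() {
  const { port1, port2 } = new MessageChannel();

  const shared = new SharedArrayBuffer(4);
  const senderView = new Int32Array(shared);

  // A SharedArrayBuffer is not transferred: no transfer list is needed,
  // and the sender keeps full access to it after posting.
  port1.postMessage({ shared });
  const { message } = receiveMessageOnPort(port2);
  const receiverView = new Int32Array(message.shared);

  // A write through the receiver's view is visible through the sender's
  // view, because both objects are backed by the same memory.
  Atomics.store(receiverView, 0, 123);

  port1.close();
  return {
    senderByteLength: shared.byteLength,     // still 4: nothing was neutered
    senderSees: Atomics.load(senderView, 0), // 123
  };
}

console.log(shareMemory());
```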

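This is a very basic multithreading issue. If the array were accessible from both the main thread and the worker, mutexes would have to be implemented so that no race conditions occur while the buffer is being accessed. Also, I think array buffers are typically used when you need performance, but locking every read/write of data in that buffer would make the worker slower.

I guess this is one of the reasons the resource is "moved" rather than shared.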
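To illustrate the kind of locking this answer refers to, here is a hypothetical minimal mutex built on a SharedArrayBuffer with Atomics.compareExchange, Atomics.wait and Atomics.notify. It is only a sketch: the class name SharedMutex and the lock-word layout are assumptions, and Atomics.wait is not allowed on a browser's main thread (it is allowed in Node.js and inside workers).

```javascript
// Hypothetical minimal mutex over shared memory. Not production code.
const UNLOCKED = 0;
const LOCKED = 1;

class SharedMutex {
  constructor(sab, index) {
    this.cell = new Int32Array(sab); // Int32Array is required by Atomics.wait
    this.index = index;              // which 32-bit word holds the lock state
  }

  lock() {
    // Atomically flip UNLOCKED -> LOCKED; if another thread holds the
    // lock, sleep until the word stops being LOCKED, then retry.
    while (Atomics.compareExchange(this.cell, this.index, UNLOCKED, LOCKED) !== UNLOCKED) {
      Atomics.wait(this.cell, this.index, LOCKED);
    }
  }

  unlock() {
    Atomics.store(this.cell, this.index, UNLOCKED);
    Atomics.notify(this.cell, this.index, 1); // wake at most one waiter
  }
}

// Usage: word 0 is the lock, word 1 is data shared between threads.
const sab = new SharedArrayBuffer(8);
const mutex = new SharedMutex(sab, 0);
const words = new Int32Array(sab);

mutex.lock();
words[1] += 1; // critical section: read-modify-write protected by the lock
mutex.unlock();
```

Every access to the shared data now costs an atomic operation and possibly a wait, which is the overhead that moving ownership avoids.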

TL;DR: multithreading

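Transferable objects were introduced in Web Workers to improve performance by not copying objects (especially large ones). The difference can be compared to pass-by-value versus pass-by-reference in common programming languages such as C/C++.

The further restriction that a transferred object can no longer be used in the source thread was probably added to guarantee that no race conditions occur between two different threads (so that worker developers do not have to care about this). Allowing shared access would also have required more concurrency primitives (mutexes, etc.) in JavaScript. Essentially, using "transfer" means you only intend to hand the data over to another thread, not to use it from two threads at the same time, so the implementation makes sense.

In general, Web Workers were not designed around a shared-memory model but around a message-passing model.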

For further reading on the performance differences, check this. You can also check this, which discusses why Web Workers in WebKit did not adopt a shared-memory model.

According to the WHATWG mailing list, the choice was to be thread-safe, because:

You can't share data between workers. There is no (and there cannot be) any shared state between multiple threads of JS execution.

(source)

Also:

We want to make the source ArrayBuffer, and any ArrayBufferViews, zero length upon posting them to a worker or back to the main thread. By ping-ponging the same ArrayBuffer back and forth you can avoid allocating new backing store each iteration.

(source)
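The ping-pong pattern from the quote can be sketched as follows, again assuming Node.js, with the two ends of a MessageChannel standing in for the main thread and the worker (pingPong is a hypothetical name). A single 1024-byte buffer is allocated up front and its ownership moves back and forth each round instead of a new buffer being allocated.

```javascript
// Node.js-specific sketch (assumption): port1 plays the main thread,
// port2 plays the worker; receiveMessageOnPort keeps it synchronous.
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

function pingPong(rounds) {
  const { port1, port2 } = new MessageChannel();
  const original = new ArrayBuffer(1024); // allocated once, up front
  let buffer = original;

  for (let i = 0; i < rounds; i++) {
    // "main" -> "worker": transfer, so the sender's handle is detached.
    port1.postMessage(buffer, [buffer]);
    const inWorker = receiveMessageOnPort(port2).message;

    // The "worker" updates the data in place.
    new Uint8Array(inWorker)[0] += 1;

    // "worker" -> "main": transfer it back rather than allocating anew.
    port2.postMessage(inWorker, [inWorker]);
    buffer = receiveMessageOnPort(port1).message;
  }

  port1.close();
  return {
    counter: new Uint8Array(buffer)[0],
    finalByteLength: buffer.byteLength,
    originalByteLength: original.byteLength, // 0: that handle was detached
  };
}

console.log(pingPong(3));
```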

Unfortunately, I did not find the discussion about the spec: the page where it should be hosted returns a 404. I will try to find a copy elsewhere.

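It seems to be driven by historical circumstances: transferable objects were added to the spec after workers had already been introduced as a message-passing API [1], with the goal of making minimal changes [2] [3].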


[1] https://bugzilla.mozilla.org/show_bug.cgi?id=720083 (request to implement this in Firefox)

[2] https://mail.mozilla.org/pipermail/es-discuss/2014-May/037239.html

Because when the concept of Transferable was formalized in the HTML5 spec, there was a goal to make the minimal possible changes. Transferable was basically a generalization of MessagePort, which was the only type that previously could be "transferred" to a web worker. Neutering is only a concept in spec text and not in the IDL. The Transferable typedef doesn't have any associated methods. The only way to neuter an object is to transfer it to a web worker. There were requests to provide a "close()" method and make Transferable a sub-interface of a new Closable interface. We resisted making those changes because they would have essentially introduced manual memory management to JavaScript.

[3] https://mail.mozilla.org/pipermail/es-discuss/2014-May/037227.html

First, some background. When typed arrays were designed, they were specified with Web IDL and its ECMAScript binding. There were attempts during typed arrays' development to throw exceptions on some operations -- like out-of-range indexing -- but one by one these were discovered to be incompatible with either Web IDL's or ECMAScript's semantics like property lookup.

To use this concept in JavaScript, you call postMessage with a transfer list:

postMessage(aMessage, transferList)

In transferList you list the transferable objects that are contained in aMessage:

var objData =
{
    str: "string",
    ab: new ArrayBuffer(100),
    i8: new Int8Array(200)
};
objWorker.postMessage(objData, [objData.ab, objData.i8.buffer]);

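On the other side (inside the worker):

self.onmessage = function(objEvent)
{
    var strText = objEvent.data.str;          // cloned string
    var objBuffer = objEvent.data.ab;         // the transferred ArrayBuffer
    var objTypedArrayView = objEvent.data.i8; // Int8Array over the transferred buffer
};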
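Here is a runnable version of the objData example above, assuming Node.js: a MessageChannel port replaces objWorker and receiveMessageOnPort replaces the worker's onmessage handler, so both sides can be inspected in one script.

```javascript
// Node.js-specific sketch (assumption): port1 is the sender, port2 the receiver.
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

const { port1, port2 } = new MessageChannel();

const objData = {
  str: "string",
  ab: new ArrayBuffer(100),
  i8: new Int8Array(200),
};

// Only the buffers in the transfer list are moved; "str" is cloned as usual.
port1.postMessage(objData, [objData.ab, objData.i8.buffer]);

// Receiving side (in a browser worker this would be self.onmessage).
const received = receiveMessageOnPort(port2).message;
port1.close();

const report = {
  senderAb: objData.ab.byteLength,    // 0: detached
  senderI8: objData.i8.length,        // 0: its buffer was detached
  receiverStr: received.str,          // "string"
  receiverAb: received.ab.byteLength, // 100
  receiverI8: received.i8.length,     // 200
};
console.log(report);
```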

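When you use "transferable objects", you are actually transferring ownership of the object to or from the web worker. It is like passing by reference without copying. The difference from ordinary pass-by-reference is that the side that transferred the data can no longer access it.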

To be clear, transfers work for both dedicated and shared workers, since both use MessagePorts:

[1] Dedicated workers use MessagePort objects behind the scenes.

[2] Communicating with shared workers is done with explicit MessagePort objects

postMessage transfers or clones, at the caller's choice:

[3] port.postMessage(message [, transfer] ) Posts a message through the channel. Objects listed in transfer are transferred, not just cloned, meaning that they are no longer usable on the sending side.

However, this only determines whether the sender keeps a copy (usually an efficiency decision), not whether any memory is shared.

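When it comes to memory, the spec is explicit that nothing is shared, regardless of the worker type and regardless of whether the data is transferred or cloned:

[4] When a user agent is to run a worker for a script with URL url, an environment settings object settings object, and a URL referrer it must run the following steps: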

Create a separate parallel execution environment (i.e. a separate thread or process or equivalent construct), and run the rest of these steps in that context.

So now the question is: why? Why must the user agent create a separate parallel execution environment for every type of worker?

Security? No. Efficiency? (Since when was JS efficient?) Neither.

The reason is to be able to comply with, or rather honor, the spec as a whole. If you follow link [4], you will at least notice:

When a user agent is to terminate a worker it must run the following steps in parallel with the worker's main loop (the "run a worker" processing model defined above):

1) Set the worker's WorkerGlobalScope object's closing flag to true.

2) If there are any tasks queued in the WorkerGlobalScope object's event loop's task queues, discard them without processing them.

3) Abort the script currently running in the worker.

4) If the worker's WorkerGlobalScope object is actually a DedicatedWorkerGlobalScope object (i.e. the worker is a dedicated worker), then empty the port message queue of the port that the worker's implicit port is entangled with.

And this is just one part of the spec.

So again, why? To be able to manage everything that happens in the worker's space. Implementers must run workers in parallel or go completely crazy. :)