Qt QOpenGLWidget: how to modify individual vertex values in a VBO without using a data block copy?
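I don't know if this is possible:
- I have an array of QVector3D vertices that is copied into a VBO
- Sometimes I want to modify only the z value of a range of vertices between the values (x1, y1) and (x2, y2); the concerned vertices strictly follow each other
- My "good" idea is to modify only the z values by accessing the VBO directly.
I have searched a lot, but all the solutions I have seen use memcpy, like this: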
m_vboPos.bind();
GLfloat* PosBuffer = (GLfloat*) (m_vboPos.map(QOpenGLBuffer::WriteOnly));
if (PosBuffer != (GLfloat*) NULL) {
    memcpy(PosBuffer, m_Vertices.constData(), m_Vertices.size() * sizeof(QVector3D));
    m_vboPos.unmap();
}
m_vboPos.release();
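But that copies a whole block of data.
I don't think using memcpy to change only 1 float value in every concerned vertex would be very efficient (I have several millions of vertices in the VBO).
I just want to optimize, because copying millions of vertices takes (too) long: is there a way to achieve my goal (without memcpy?), writing only one float per vertex? (I have already tried but could not get it to work; I must be missing something.)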
This call here
GLfloat* PosBuffer = (GLfloat*) (m_vboPos.map(QOpenGLBuffer::WriteOnly));
will internally call glMapBuffer, which means that it just maps the buffer contents into the address space of your process (see also the OpenGL Wiki on Buffer Object Mapping).
Since you map it write-only, you can simply overwrite each and every bit of the buffer as you see fit. There is no need to use memcpy; you can use any means to write to memory, e.g. you can directly do
PosBuffer[3*vertex_id + 2] = 42.0f; // assuming 3 floats per vertex
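As a minimal sketch of that direct write, the helper below overwrites only the z component for a contiguous run of vertices through a float pointer. It assumes tightly packed x, y, z floats (3 per vertex, as in the line above). The names setZRange, firstVertex and newZ are hypothetical and not part of Qt or OpenGL; in real code the pointer would be the one returned by m_vboPos.map(), while any plain float array works for trying it out without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: overwrite only the z component of newZ.size()
// consecutive vertices, starting at vertex index firstVertex.
// Assumption: `mapped` points at tightly packed x, y, z floats, e.g. the
// pointer returned by m_vboPos.map(QOpenGLBuffer::WriteOnly).
void setZRange(float* mapped, std::size_t firstVertex,
               const std::vector<float>& newZ)
{
    assert(mapped != nullptr);
    for (std::size_t i = 0; i < newZ.size(); ++i) {
        // 3 floats per vertex; z is the third one. x and y are not touched.
        mapped[3 * (firstVertex + i) + 2] = newZ[i];
    }
}
```

The caller is still responsible for checking that map() did not return NULL and for calling unmap() afterwards.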
"I don't think using memcpy to change only 1 float value in every concerned vertex would be very efficient (I have several millions of vertices in the VBO)."
Yes, doing a million separate memcpy() calls of 4 bytes each is not a good idea. A modern compiler might actually inline them, so they might end up equivalent to individual assignments. But you can also do the assignments directly, since memcpy is not gaining you anything here.
However, it is not clear what the performance impact of all this is. glMapBuffer might return a pointer to
- some local copy of the VBO in system memory, which later has to be copied to the GPU. Since the GL does not know which values you changed and which you did not, it might have to re-transmit the whole buffer.
- some system memory inside the GART area, which is mapped on the GPU, so the GPU will directly access this memory when reading from the buffer.
- some I/O-mapped region in VRAM. In this case, the caching behavior of the memory region might be significantly different, and changing 4 bytes in every 12-byte block might not be the ideal approach. Just re-copying the whole sub-block as one big chunk might yield better performance.
The mapping itself is also not free: it involves changing the page tables, and the GL driver might have to synchronize its threads or, in the worst case, synchronize with the GPU (to prevent you from overwriting data the GPU is still using for a previous draw call that is still in flight).
"sometimes I want to modify only the z value of a range of vertices between the values (x1, y1) and (x2, y2) - the concerned vertices strictly follow each other"
So you have a contiguous sub-region of the buffer which you want to modify. I would recommend looking at two alternatives:
- Use glMapBufferRange (if available in your OpenGL version) to map only the region you care about.
- Forget about buffer mapping completely and try glBufferSubData(). Not individually on each z component of each vertex, but as one big chunk for the whole range of modified vertices. This implies that you keep a local copy of the buffer contents somewhere in your memory, update it, and send the result to the GL.
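Both alternatives need the same bookkeeping: the byte offset and byte length of the modified vertex run. The sketch below computes that range and updates a CPU-side copy of the positions. It assumes 3 tightly packed floats per vertex; ByteRange, vertexByteRange and updateLocalZ are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte range occupied by a contiguous run of vertices (3 floats each).
struct ByteRange {
    std::size_t offset;  // bytes from the start of the buffer
    std::size_t length;  // bytes covered by the run
};

ByteRange vertexByteRange(std::size_t firstVertex, std::size_t vertexCount)
{
    const std::size_t stride = 3 * sizeof(float);  // x, y, z
    return { firstVertex * stride, vertexCount * stride };
}

// Update z in the CPU-side copy and report which bytes need uploading.
ByteRange updateLocalZ(std::vector<float>& localCopy, std::size_t firstVertex,
                       const std::vector<float>& newZ)
{
    // The run must lie entirely inside the local copy.
    assert(3 * (firstVertex + newZ.size()) <= localCopy.size());
    for (std::size_t i = 0; i < newZ.size(); ++i)
        localCopy[3 * (firstVertex + i) + 2] = newZ[i];
    return vertexByteRange(firstVertex, newZ.size());
}
```

With the buffer bound to GL_ARRAY_BUFFER, the returned range could then be uploaded with glBufferSubData(GL_ARRAY_BUFFER, range.offset, range.length, data), where data points range.offset bytes into localCopy, or passed as the offset and length arguments of glMapBufferRange. Note that a pointer returned by glMapBufferRange addresses the start of the mapped range, so indices into it start at 0 rather than at firstVertex.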
Which option is better will depend on a lot of different factors, and I would not rule either of them out without benchmarking in the actual scenario, on the actual implementations you care about. Also have a look at the general strategies for Buffer Object Streaming in OpenGL. A persistently mapped buffer might or might not also be a good option for your use case.
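The glMap approach works great, and it is really fast!
Many thanks to genpfault: the speed-up is so large that the 3D rendering is no longer choppy.
Here is my new code, simplified to give an easy-to-understand answer: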
vertexbuffer.bind();
GLfloat* posBuffer = (GLfloat*) (vertexbuffer.map(QOpenGLBuffer::WriteOnly));
if (posBuffer != (GLfloat*) NULL) {
    int index = NumberOfVertices(area.y + 1, image.cols); // index of first vertex on line area.y
    for (row = ...) for (col = ...) {
        if (mask.at<uchar>(row, col) != 0)
            posBuffer[3 * index + 2] = depthmap.at<uchar>(row, col) * depth; // overwrite z only
        index++;
    }
    vertexbuffer.unmap(); // unmap only if the mapping succeeded
}
vertexbuffer.release();
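For readers who want to try the masked update without OpenCV or a GL context, here is a rough stand-alone sketch of the same idea. It is not the code above: the loop bounds there are elided, so this version simply assumes that the mask and depth map are flat 8-bit arrays of equal size whose elements correspond one-to-one to consecutive vertices starting at firstVertex. The names applyMaskedDepth, firstVertex and depthScale are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: for each element i, if mask[i] is non-zero, set the
// z component of vertex (firstVertex + i) to depthmap[i] * depthScale.
// Assumption: `mapped` points at tightly packed x, y, z floats, and element
// i of mask/depthmap corresponds to vertex firstVertex + i.
void applyMaskedDepth(float* mapped, std::size_t firstVertex,
                      const std::vector<std::uint8_t>& mask,
                      const std::vector<std::uint8_t>& depthmap,
                      float depthScale)
{
    assert(mapped != nullptr);
    assert(mask.size() == depthmap.size());
    std::size_t index = firstVertex;
    for (std::size_t i = 0; i < mask.size(); ++i, ++index) {
        if (mask[i] != 0)
            mapped[3 * index + 2] = depthmap[i] * depthScale;
        // Masked-out vertices are skipped, so their z is left as it was.
    }
}
```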