Vulkan Mouse Picking using Storage Buffers

An experiment in implementing mouse picking using Vulkan Storage Buffers.

Mouse picking is the act of using your mouse to select an object rendered on the screen. Two of the more common ways of achieving this are performing ray-cast queries on the host, or writing object IDs to a separate framebuffer and then reading back the value at the mouse's location.

Ray casting involves casting a ray into your scene and checking for collisions between the ray and your objects. There are some drawbacks to this: if you have multiple complex meshes with many triangles, calculating collisions can be costly. Additionally, you need to write the algorithms, or pull in third-party libraries, to traverse your scene and calculate the collisions. This is an unnecessary amount of work for small graphics-only projects.

The other option is to give each object a unique identifier, passed in via Push Constants when it is rendered, which is written to an additional framebuffer target at the same time the color buffer is being filled. After rendering is complete, the texture is read back to the host, and the mouse coordinates are used to look up the object identifier.

OpenGL provides the glReadPixels function, which can read data back from the framebuffer. Vulkan, on the other hand, is much more verbose and requires a lot more work to achieve the same result.

Using a Storage buffer

I thought of a way to achieve mouse picking without using either of the above methods: no need for an additional render target, which could take up quite a bit of memory, and no need for complex algorithms to traverse your scene and perform ray-intersections.

I searched around for others who had done something similar, but the closest I found was this blog post by OurMachinery. Their method seems quite a bit more complicated than the one I am proposing.

The method works as follows: we take the entire depth range along the mouse ray and divide it into a fixed number of buckets. I chose 32 for demonstration purposes, but in practice it should be in the 1000s. In the fragment shader, we write the entity ID into the appropriate bucket.
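The bucketing itself is just a scale-and-truncate. Here is a minimal host-side sketch of the mapping (the `depthToBucket` helper name is mine, not from the engine):

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t DEPTH_ARRAY_SCALE = 32;

// Map a normalized depth value in [0, 1] to a bucket index.
// Clamping guards against a depth of exactly 1.0 indexing
// one past the end of the array.
uint32_t depthToBucket(float depth)
{
    uint32_t index = static_cast<uint32_t>(depth * DEPTH_ARRAY_SCALE);
    return std::min(index, DEPTH_ARRAY_SCALE - 1);
}
```

With 32 buckets, a depth of 0.5 lands in bucket 16, and depths 0.0 and 1.0 land in buckets 0 and 31 respectively.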

To create the buckets, we use a writable Storage Buffer which is bound to the fragment shader.

#define DEPTH_ARRAY_SCALE 32

layout(set = 0, binding = 3) writeonly buffer s_Write_t
{
    uint data[DEPTH_ARRAY_SCALE];
} s_Write;

Additionally, we pass a Unique Identifier, in the form of an unsigned int, along with the mouse coordinates, to the fragment shader via Push Constants:

layout(push_constant) uniform PushConsts
{
    vec2 MOUSE_POS;
    uint UNIQUE_ID;
} pushC;
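On the host, the matching data can be packed into a plain struct; with a vec2 followed by a uint, the members line up at offsets 0 and 8 with no padding under the usual push-constant layout rules. A sketch (the command-buffer and pipeline-layout handles in the comment are assumed to exist elsewhere in the engine):

```cpp
#include <cstddef>
#include <cstdint>

// Host-side mirror of the shader's push-constant block:
// a vec2 (two floats) followed by a uint.
struct PushConsts
{
    float    MOUSE_POS[2]; // offset 0
    uint32_t UNIQUE_ID;    // offset 8
};

static_assert(offsetof(PushConsts, MOUSE_POS) == 0, "vec2 must sit at offset 0");
static_assert(offsetof(PushConsts, UNIQUE_ID) == 8, "uint must sit at offset 8");

// At draw time the whole struct is uploaded in one call, e.g.:
//   vkCmdPushConstants(cmd, pipelineLayout,
//                      VK_SHADER_STAGE_FRAGMENT_BIT,
//                      0, sizeof(PushConsts), &pc);
```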

Then in the fragment shader, we take the depth value calculated by the vertex shader, gl_FragCoord.z, which lies between 0 and 1. We scale it up by multiplying it by the length of the array ( DEPTH_ARRAY_SCALE ), which gives us the index of the depth bucket. If the pixel we are currently shading is close to the current mouse coordinates, we write the Unique ID to that index.

// get the depth and scale it up by the total number of
// buckets in the depth array; clamp so that a depth of
// exactly 1.0 does not index past the end of the array
uint zIndex = min( uint(gl_FragCoord.z * DEPTH_ARRAY_SCALE),
                   uint(DEPTH_ARRAY_SCALE - 1) );

if( length( pushC.MOUSE_POS - gl_FragCoord.xy) < 1)
{
    s_Write.data[zIndex] = pushC.UNIQUE_ID;
}

That’s it for the fragment shader!

On the host side, you can do one of two things: either use a HOST_VISIBLE storage buffer and keep it persistently mapped, or use a DEVICE_LOCAL storage buffer and execute a buffer copy after the fragments have been written. I chose the former since it is easier. (If the memory is not also HOST_COHERENT, remember to invalidate the mapped range before reading it.)

What you now have on the host is an array where each index represents a certain depth along the mouse ray. We loop through the array and take the first non-zero value, which corresponds to the closest object under the mouse.

auto * v = ... get mapped memory ...
auto * u = static_cast<uint32_t*>(v);

uint32_t SELECTED_OBJECT_ID = 0;

for(size_t i = 0; i < DEPTH_ARRAY_SCALE; i++)
{
    if( u[i] != 0 )
    {
        SELECTED_OBJECT_ID = u[i];
        break;
    }
}

// we have to zero out the memory each frame
std::memset(v, 0, DEPTH_ARRAY_SCALE * sizeof(uint32_t));

Below is a small snippet of code from my engine. I am drawing four spheres at incremental locations; each object has an ID: 111, 222, 333, and 444.

    m_renderer.setModelMatrix( Transform({0,0,0}).getMatrix());
    m_renderer.setEntityID(111);
    m_renderer.setMaterial(m_materials.red,0);
    m_renderer.draw();

    m_renderer.setModelMatrix( Transform({0,0,-1}).getMatrix());
    m_renderer.setEntityID(222);
    m_renderer.setMaterial(m_materials.green,0);
    m_renderer.draw();

    m_renderer.setModelMatrix( Transform({0,0,-2}).getMatrix());
    m_renderer.setEntityID(333);
    m_renderer.setMaterial(m_materials.blue,0);
    m_renderer.draw();

    m_renderer.setModelMatrix( Transform({0,0,-3}).getMatrix());
    m_renderer.setEntityID(444);
    m_renderer.setMaterial(m_materials.red,0);
    m_renderer.draw();

Here is a video showing the outcome along with a print-out of the array to the console. The object that the mouse is over is the non-zero ID that is closest to the beginning of the array (left).

After experimenting with different array lengths, I discovered that a length of 4096 yielded relatively good results. Here is a video where the object ID is read back from the storage buffer and then sent back into the shader on the next frame to highlight the sphere.

And here is a video using 2500 spheres and a depth array length of 4096.

Conclusion

In my experiments, each sphere had a radius of 1 unit and so was at least 2 units away from its nearest neighbour. This distance was large enough that no two pixels on different spheres fell into the same depth bucket. If the spheres were much smaller, there is a chance that two pixels from different objects would fall into the same bucket. In that case, you would have to increase the size of your depth array to account for the finer resolution you would need.
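To get a feel for that resolution, you can check whether two depths land in the same bucket. A small sketch (the `sameBucket` helper and the depth values below are mine, chosen for illustration):

```cpp
#include <algorithm>
#include <cstdint>

// Two fragments collide when their normalized depths map to the
// same bucket; a longer array makes each bucket cover a thinner
// slice of the depth range.
bool sameBucket(float depthA, float depthB, uint32_t bucketCount)
{
    uint32_t a = std::min(static_cast<uint32_t>(depthA * bucketCount), bucketCount - 1);
    uint32_t b = std::min(static_cast<uint32_t>(depthB * bucketCount), bucketCount - 1);
    return a == b;
}
```

With 32 buckets, depths 0.50 and 0.52 collide (both fall into bucket 16), while with 4096 buckets they are cleanly separated.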

In general, I was pleasantly surprised by how well this worked. I am looking forward to using this implementation in a more sophisticated setting.

Written by Gavin Wolf

Gavin is a C++ programmer with interests in computer graphics, numerical analysis, and swing dancing.