Is your feature request related to a problem? Please describe.
See below.
Describe the solution you'd like
I recently decided, mostly for fun, to challenge myself to re-implement a concurrent hashing table based on Robin Hood hashing that I designed and built in 2016 as part of a project on GPU-accelerated Latent Dirichlet Allocation, but never actually got to work.
This time, I did get it to work - and according to my benchmarks, the resulting table has higher get throughput than cuCollections' static_map with either linear probing or double hashing, by quite a bit at sufficiently high load factors.

Repo for this comparison: https://github.com/aterenin/GPURobinHoodHashing. I've also got a blog post around some of the ideas and why I found this exercise interesting: https://avt.im/blog/sculpting-fragile-glass/.
I don't know what Nvidia's policy on external contributions to this library is, but if there is interest, I'd love to contribute the core algorithmic ideas to cuCollections so that other people can use them. I'd image this would look like a third option like cuco::double_hashing or cuco::linear_probing but which would implement the Robin Hood strategy. It'd probably improve performance in static_map and possibly other data structures depending on how advanced your templating logic is.
Would a PR on this be welcome? If yes, I'll put one together - I'd appreciate pointers on the cleanest way to add this structure-wise in case you have any: I value high-quality code and will work to do everything the right way. If not, no worries, feel free to just close the issue. Please let me know either way.
Describe alternatives you've considered
No response
Additional context
No response
Is your feature request related to a problem? Please describe.
See below.
Describe the solution you'd like
I recently decided, mostly for fun, to challenge myself to re-implement a concurrent hashing table based on Robin Hood hashing that I designed and built in 2016 as part of a project on GPU-accelerated Latent Dirichlet Allocation, but never actually got to work.
This time, I did get it to work - and according to my benchmarks, the resulting table has higher get throughput than cuCollections'
static_mapwith either linear probing or double hashing, by quite a bit at sufficiently high load factors.Repo for this comparison: https://github.com/aterenin/GPURobinHoodHashing. I've also got a blog post around some of the ideas and why I found this exercise interesting: https://avt.im/blog/sculpting-fragile-glass/.
I don't know what Nvidia's policy on external contributions to this library is, but if there is interest, I'd love to contribute the core algorithmic ideas to cuCollections so that other people can use them. I'd image this would look like a third option like
cuco::double_hashingorcuco::linear_probingbut which would implement the Robin Hood strategy. It'd probably improve performance instatic_mapand possibly other data structures depending on how advanced your templating logic is.Would a PR on this be welcome? If yes, I'll put one together - I'd appreciate pointers on the cleanest way to add this structure-wise in case you have any: I value high-quality code and will work to do everything the right way. If not, no worries, feel free to just close the issue. Please let me know either way.
Describe alternatives you've considered
No response
Additional context
No response