arsenm wrote:

> Correct me if I am wrong, but I think we can't do lookup tables efficiently on AMDGPU due to high memory latencies.

If we can use scalar loads it's probably not that bad, but requires benchmarking.

https://github.com/llvm/llvm-project/pull/149937