[llvm] [NVPTX] Lower 16xi8 and 8xi8 stores efficiently (PR #73646)

Tue Dec 5 11:07:36 PST 2023

steven-johnson wrote:

This seems to have introduced failures in Halide codegen; we are now getting runtime errors of the form `CUDA_ERROR_MISALIGNED_ADDRESS` for `cuMemcpyDtoH()` where we didn't before. It appears we are now emitting an aligned store instruction where we previously emitted an unaligned one. Can we get a revert of this pending further investigation, please?

https://github.com/llvm/llvm-project/pull/73646