[llvm] [AMDGPUInstCombineIntrinsic] Do not narrow 8,16-bit amdgcn_s_buffer_load instrinsics (PR #117997)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 2 04:00:58 PST 2024
jayfoad wrote:
> Do not narrow 8,16-bit amdgcn_s_buffer_load instrinsics
The wording is a bit strange since it would not make sense to narrow an 8-bit load anyway.
Why is this only for s_buffer_load, not VMEM buffer_load?
Typo "instrinsic".
> We can still narrow this:
> ```llvm
> %data = call <4 x half> @llvm.amdgcn.s.buffer.load.v4f16(<4 x i32> %rsrc, i32 %ofs, i32 0)
> %elt1 = extractelement <4 x half> %data, i32 0
> ret half %elt1
> ```
> Into this (narrowing the load from <4 x half> to <2 x half> and keeping the extractelement):
> ```llvm
> %data = call <2 x half> @llvm.amdgcn.s.buffer.load.v2f16(<4 x i32> %rsrc, i32 %ofs, i32 0)
> %elt1 = extractelement <2 x half> %data, i32 0
> ret half %elt1
> ```
Are you saying that narrowing is OK if the offset does not need to be updated? Does your patch implement that, or is it a future improvement?
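For reference, the "offset unchanged" case in the quoted example amounts to dropping only trailing unused elements. Below is a minimal C++ sketch of that calculation. It assumes that scalar buffer loads are sized in whole dwords, which would explain why the example narrows to <2 x half> rather than a single half. `narrowedEltCountKeepingOffset` is a hypothetical helper. It is not the code in the patch and not an LLVM API.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's API). Bit i of DemandedElts is set if
// element i of the loaded vector is used. Returns the element count of a
// narrowed load that keeps the original offset, so only trailing unused
// elements are dropped. The size is rounded up to whole dwords, on the
// assumption that scalar buffer loads fetch whole dwords.
constexpr unsigned narrowedEltCountKeepingOffset(uint64_t DemandedElts,
                                                 unsigned NumElts,
                                                 unsigned EltBytes) {
  // Find the highest demanded element.
  unsigned Last = 0;
  bool Any = false;
  for (unsigned I = 0; I < NumElts; ++I) {
    if (DemandedElts & (uint64_t{1} << I)) {
      Last = I;
      Any = true;
    }
  }
  if (!Any)
    return NumElts; // nothing demanded: leave the load alone

  // Bytes needed to cover elements [0, Last], rounded up to a dword.
  unsigned Bytes = (Last + 1) * EltBytes;
  unsigned Dwords = (Bytes + 3) / 4;
  unsigned NewElts = Dwords * 4 / EltBytes;
  return NewElts < NumElts ? NewElts : NumElts;
}
```

With <4 x half> and only element 0 demanded, this gives 2 elements, which matches the quoted <2 x half> result. If element 2 were demanded, the result would stay at 4, because dropping the leading dword would require changing the offset.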
https://github.com/llvm/llvm-project/pull/117997