[llvm] [AMDGPUInstCombineIntrinsic] Do not narrow 8,16-bit amdgcn_s_buffer_load instrinsics (PR #117997)

Mon Dec 2 04:00:58 PST 2024

jayfoad wrote:

> Do not narrow 8,16-bit amdgcn_s_buffer_load instrinsics

The wording is a bit strange since it would not make sense to narrow an 8-bit load anyway.

Why is this only for s_buffer_load, not VMEM buffer_load?

Typo: "instrinsics" should be "intrinsics".

> We can still narrow this:
> ```llvm
>   %data = call <4 x half> @llvm.amdgcn.s.buffer.load.v4f16(<4 x i32> %rsrc, i32 %ofs, i32 0)
>   %elt1 = extractelement <4 x half> %data, i32 0
>   ret half %elt1
> ```
> Into this (narrowing the load from <4 x half> to <2 x half> and keeping the extractelement):
> ```llvm
>   %data = call <2 x half> @llvm.amdgcn.s.buffer.load.v2f16(<4 x i32> %rsrc, i32 %ofs, i32 0)
>   %elt1 = extractelement <2 x half> %data, i32 0
>   ret half %elt1
> ```

Are you saying that narrowing is OK if the offset does not need to be updated? Does your patch implement that, or is it a future improvement?

https://github.com/llvm/llvm-project/pull/117997