[PATCH] D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases

Thu May 26 07:35:14 PDT 2022

jpages added a comment.

In D126389#3539881 <https://reviews.llvm.org/D126389#3539881>, @foad wrote:

> In D126389#3538346 <https://reviews.llvm.org/D126389#3538346>, @rampitec wrote:
>
>> Any performance numbers? The 8-element case was driven by a specific customer program, and the performance of the cmp/select was better than movrel.
>
> I don't know why that would be. Maybe the performance characteristics are different on GFX10+ compared to GFX9.
>
> Also, on GFX10+ SGPR usage does not affect occupancy, so perhaps the heuristic could be tweaked to make it more likely to use s_movrel (not v_movrel) on GFX10+.

I will try to get some performance numbers on specific games. Do you know whether this performance problem was specific to one architecture?
As Jay said, I could tweak the heuristic so that the s_movrel lowering is only generated for GFX10+.


https://reviews.llvm.org/D126389