[PATCH] D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 26 06:51:43 PDT 2022
foad added a comment.
In D126389#3538346 <https://reviews.llvm.org/D126389#3538346>, @rampitec wrote:
> Any performance numbers? The 8 element case was driven by a specific customer program and the performance of the cmp/select was better than movrel.
I don't know why that would be. Maybe the performance characteristics are different on GFX10+ compared to GFX9.
Also, on GFX10+ SGPR usage does not affect occupancy, so perhaps the heuristic could be tweaked to make it more likely to use s_movrel (not v_movrel) on those targets.
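To make the suggestion concrete, here is a rough sketch of the kind of tweak I have in mind. It is not the code in this patch or anywhere in LLVM: the names (`Generation`, `IndexingChoice`, `Query`, `chooseIndexing`) and the threshold `CmpSelectLimit` are hypothetical, the threshold is a placeholder rather than a measured value, and the precondition for s_movrel is simplified to a single flag.

```cpp
// Hypothetical model of the lowering choice; none of these names exist in LLVM.
enum class Generation { GFX9, GFX10Plus };
enum class IndexingChoice { CmpSelect, SMovRel, VMovRel };

struct Query {
  Generation Gen;
  unsigned NumElts;  // vector length in 32-bit elements
  bool Uniform;      // assumption: vector and index are both uniform (SGPR)
};

// Placeholder: up to this many elements, expand to a compare/select chain.
// The value 8 only echoes the 8 element case mentioned in the quoted comment.
constexpr unsigned CmpSelectLimit = 8;

IndexingChoice chooseIndexing(const Query &Q) {
  if (Q.Uniform) {
    // On GFX10+ SGPR usage does not affect occupancy, so keeping the whole
    // vector in SGPRs is cheap: pick s_movrel regardless of vector size.
    if (Q.Gen == Generation::GFX10Plus)
      return IndexingChoice::SMovRel;
    // Elsewhere keep the size-based choice.
    return Q.NumElts <= CmpSelectLimit ? IndexingChoice::CmpSelect
                                       : IndexingChoice::SMovRel;
  }
  // Divergent case: this sketch leaves the existing size-based choice alone.
  return Q.NumElts <= CmpSelectLimit ? IndexingChoice::CmpSelect
                                     : IndexingChoice::VMovRel;
}
```

With these placeholder numbers, a uniform 8 element access would pick s_movrel on GFX10+ but still pick cmp/select on GFX9. Whether that is actually a win would need performance numbers.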
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D126389/new/
https://reviews.llvm.org/D126389