[PATCH] D126389: [AMDGPU] Improve codegen of extractelement/insertelement in some cases
Julien Pagès via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 30 16:26:16 PDT 2022
jpages added a comment.
In D126389#3540486 <https://reviews.llvm.org/D126389#3540486>, @rampitec wrote:
> In D126389#3540098 <https://reviews.llvm.org/D126389#3540098>, @jpages wrote:
>
>> In D126389#3539881 <https://reviews.llvm.org/D126389#3539881>, @foad wrote:
>>
>>> In D126389#3538346 <https://reviews.llvm.org/D126389#3538346>, @rampitec wrote:
>>>
>>>> Any performance numbers? The 8 element case was driven by a specific customer program and the performance of the cmp/select was better than movrel.
>>>
>>> I don't know why that would be. Maybe the performance characteristics are different on GFX10+ compared to GFX9.
>>>
>>> Also on GFX10+ sgpr usage does not affect occupancy, so perhaps the heuristic could be tweaked to make it more likely to use s_movrel (not v_movrel) on GFX10+.
>>
>> I will try to get some performance numbers on specific games. Do you know if this performance problem was specific to an architecture?
>> Like Jay said, I could tweak the heuristic for this and only generate it for GFX10+.
>
> All the tests were done on gfx9 flavors. It is also worth noting gfx9 does not have movrel, but uses s_set_gpr_idx_on instead, which may be the difference.
>
> I.e. it may be linked to FeatureMovrel.
I measured the performance with/without the patch on the original game in which we spotted this issue.
It's a small improvement of about 0.5% in the current version compared to without the patch. This was measured on GFX1030.
It seems to be a little bit faster with the patch, I can't really test on gfx9 with this game.
If it's better to keep the initial version for gfx9, I could:
- Only do this patch for GFX10+
- Some flag?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D126389/new/
https://reviews.llvm.org/D126389
More information about the llvm-commits
mailing list