[PATCH] D134423: [AMDGPU] Fix vgpr2sgpr copy analysis to check scalar operands of buffer instructions use scalar registers.

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 7 09:08:15 PST 2022


alex-t added a comment.

In D134423#3911947 <https://reviews.llvm.org/D134423#3911947>, @foad wrote:

> In D134423#3911726 <https://reviews.llvm.org/D134423#3911726>, @skc7 wrote:
>
>> %8:sreg_32 = COPY %5:vgpr_32 
>> %7:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %4:vgpr_32, killed %6:sgpr_128, **%8:sreg_32**, 0, 0, 0, 0, implicit $exec ::
>
> I need more context. Is %5 uniform?

I think that I've got an idea behind this patch. Let's say %5 is uniform. Then we've got to try to promote all the %8 descendants to SALU if possible.
In some cases, it appears that such a copy has few or even no SALU descendants, and according to the common algorithm should be converted to VALU.
When the conversion is done, legalizeOperands creates the waterfall loop which is obviously much worse than inserting the v_readfirstlane_b32.
As far as I understand, @skc7 addresses this scenario and aims to avoid an unnecessary waterfall loop.
BTW, if %5 is divergent we have a bug in ISel. We now should not have any V2S copy with the divergent source.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134423/new/

https://reviews.llvm.org/D134423



More information about the llvm-commits mailing list