[llvm-dev] Problems with subreg-liveness and Greedy RA

Nemanja Ivanovic via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 6 16:58:02 PDT 2021


Hi Quentin,
I am really sorry for the delay. I am working on getting a reproducer for
this and once I have something manageable, I'll open a bug.

Thanks for all your help,
Nemanja

On Thu, Jul 1, 2021 at 1:55 PM Quentin Colombet <qcolombet at apple.com> wrote:

> Hi Nemanja,
>
> Do you have something I could run on my side?
>
> I’d like to see how/when we create %296 and the dump, doesn’t have that
> information:
> ```
> selectOrSplit VSRpRC:%39 [536r,6168B:0)  0 at 536r L0000000000000002
> [536r,6168B:0)  0 at 536r L0000000000000040 [536r,536d:0)  0 at 536r
> weight:7.122666e+05 w=7.122666e+05
> RS_Split Cascade 11
> Analyze counted 3 instrs in 2 blocks, through 0 blocks.
> Cost of isolating all blocks = 2147483648.8
> $vsrp0  no positive bundles
> $vsrp1  no positive bundles
> $vsrp2  no positive bundles
> $vsrp3  no positive bundles
> $vsrp4  no positive bundles
> $vsrp5  no positive bundles
> $vsrp6  no positive bundles
> $vsrp17 no positive bundles
> $vsrp18 no positive bundles
> $vsrp16 no positive bundles
> $vsrp19 no positive bundles
> $vsrp20 no positive bundles
> $vsrp21 no positive bundles
> $vsrp22 no positive bundles
> $vsrp23 no positive bundles
> $vsrp24 no positive bundles
> $vsrp25 no positive bundles
> $vsrp15 no positive bundles
> $vsrp14 no positive bundles
> $vsrp13 no positive bundles
> $vsrp12 no positive bundles
> $vsrp11 no positive bundles
> $vsrp10 no positive bundles
> $vsrp9  no positive bundles
> $vsrp8  no positive bundles
> $vsrp7  no positive bundles
> $vsrp31 no positive bundles
> $vsrp30 no positive bundles
> $vsrp29 no positive bundles
> $vsrp28 no positive bundles
> $vsrp27 no positive bundles
> $vsrp26 no positive bundles
>     enterIntvBefore 5296r: valno 0
>     leaveIntvAfter 5320r: valno 0
>     useIntv [5292r;5320r): [5292r;5320r):1
> Multi-mapped complement 0 at 5316r for parent 0 at 536r hoist to %bb.2 5316r
> Direct complement def at 536r
> Removing 1 back-copies.
> Removing 5316r  undef %296.sub_64:vsrprc = COPY %39.sub_64:vsrprc.  <————
> THIS COPY
>   blit [536r,6168B:0): [536r;5292r)=0(%296)(recalc)
> [5292r;5320r)=1(%297)(recalc) [5320r;6168B)=0(%296)(recalc)
>   rewr %bb.1    536r:0  %296:vsrprc = LXVP 0, %33:g8rc_and_g8rc_nox0 ::
> (load 32 from constant-pool)
>   rewr %bb.2    5320B:1 %290:vsrc = contract nofpexcept XVMADDADP
> %290:vsrc(tied-def 0), %297.sub_vsx0:vsrprc, %31:vsrc, implicit $rm
>   rewr %bb.2    5296B:1 %289:vsrc = contract nofpexcept XVMADDADP
> %289:vsrc(tied-def 0), %297.sub_vsx0:vsrprc, %62:vsrc, implicit $rm
>   rewr %bb.2    5292B:0 undef %297.sub_64:vsrprc = COPY %296.sub_64:vsrprc
> queuing new interval: %296 [536r,6168B:0)  0 at 536r L0000000000000040
> [536r,536d:0)  0 at 536r L0000000000000002 [536r,6168B:1)  0 at x 1 at 536r
> weight:3.596946e+05
> queuing new interval: %297 [5292r,5320r:0)  0 at 5292r L0000000000000002
> [5292r,5320r:0)  0 at 5292r weight:1.520298e+07
> ```
>
> Ideally, if you can file a bug, that would make the tracking simpler.
>
> Cheers,
> -Quentin
>
> On Jun 23, 2021, at 12:18 PM, Nemanja Ivanovic <nemanja.i.ibm at gmail.com>
> wrote:
>
> Sorry, it would appear that the dev list stripped my attachment. I have
> reduced the file using bugpoint and produced the output from that.
> Attaching it here.
>
> Nemanja
>
> On Wed, Jun 23, 2021 at 2:23 PM Nemanja Ivanovic <nemanja.i.ibm at gmail.com>
> wrote:
>
>> Thank you so much for taking the time to answer Quentin.
>>
>> The bad copies are definitely added by live range splitting. The issue
>> seems to be the LaneBitmasks for the various subregisters. Honestly, I
>> don't really know what the bits of LaneBitmask produced by TblGen are meant
>> to mean, but I can't make any sense of them. And those seem to lead the
>> register allocator astray.
>> Here are the LaneBitmasks from the register include file:
>> static const LaneBitmask SubRegIndexLaneMaskTable[] = {
>>   LaneBitmask::getAll(),
>>   LaneBitmask(0x0000000000000001), // sub_32
>>   LaneBitmask(0x0000000000000002), // sub_64
>>   LaneBitmask(0x0000000000000004), // sub_eq
>>   LaneBitmask(0x0000000000000001), // sub_gp8_x0
>>   LaneBitmask(0x0000000000000200), // sub_gp8_x1
>>   LaneBitmask(0x0000000000000008), // sub_gt
>>   LaneBitmask(0x0000000000000010), // sub_lt
>>   LaneBitmask(0x0000000000000042), // sub_pair0
>>   LaneBitmask(0x0000000000000180), // sub_pair1
>>   LaneBitmask(0x0000000000000020), // sub_un
>>   LaneBitmask(0x0000000000000002), // sub_vsx0
>>   LaneBitmask(0x0000000000000040), // sub_vsx1
>>   LaneBitmask(0x0000000000000040), // sub_vsx1_then_sub_64
>>   LaneBitmask(0x0000000000000080), // sub_pair1_then_sub_64
>>   LaneBitmask(0x0000000000000080), // sub_pair1_then_sub_vsx0
>>   LaneBitmask(0x0000000000000100), // sub_pair1_then_sub_vsx1
>>   LaneBitmask(0x0000000000000100), // sub_pair1_then_sub_vsx1_then_sub_64
>>   LaneBitmask(0x0000000000000200), // sub_gp8_x1_then_sub_32
>>  };
>>
>> For example, what does it mean that the mask for sub_64 and sub_vsx0 are
>> the same? The two subregisters certainly do not represent the same lanes in
>> their respective registers. The sub_vsx0 subregister is the first VSX
>> register in a VSX register pair. And each of the two subregisters of a VSX
>> register pair (sub_vsx0, sub_vsx1) have their own scalar subregister (
>> sub_64).
>>
>> I have also attached the output of RA, but it is huge :(
>> It is the result of specifying options -debug-only=regalloc
>> -print-before=greedy -print-after=greedy on the command line.
>>
>> On Tue, Jun 22, 2021 at 3:21 PM Quentin Colombet <qcolombet at apple.com>
>> wrote:
>>
>>>
>>>
>>> On Jun 21, 2021, at 10:05 AM, Nemanja Ivanovic via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> I am having a really difficult time with subregister related issues when
>>> I turn
>>> on subregister liveness tracking.
>>>
>>> Before RA:
>>> 79760B    %2216:vsrc = LXVDSX %5551:g8rc_and_g8rc_nox0, %2215:g8rc ::
>>> (load 8 from %ir.scevgep1857.cast, !alias.scope !92, !noalias !93)
>>> 79872B    %2225:vsrprc = LXVP 352, %661:g8rc_and_g8rc_nox0
>>> 84328B    %5540:vsrc = contract nofpexcept XVMADDADP %5540:vsrc(tied-def
>>> 0), %2225.sub_vsx0:vsrprc, %2216:vsrc, implicit $rm
>>>
>>> After RA (greedy):
>>> 79744B    %2214:vsrc = LXVDSX %5551:g8rc_and_g8rc_nox0, %6477:g8rc ::
>>> (load 8 from %ir.scevgep1860.cast, !alias.scope !92, !noalias !93)
>>> 79872B    %7503:vsrprc = LXVP 352, %661:g8rc_and_g8rc_nox0
>>> 80248B    %7527:vsrprc = COPY %7503:vsrprc
>>> 80988B    undef %7526.sub_64:vsrprc = COPY %7527.sub_64:vsrprc
>>> 84324B    undef %7501.sub_64:vsrprc = COPY %7526.sub_64:vsrprc
>>> 84328B    %5546:vsrc = contract nofpexcept XVMADDADP %5546:vsrc(tied-def
>>> 0), %7501.sub_vsx0:vsrprc, %2214:vsrc, implicit $rm
>>>
>>> Subregister definitions for PPC:
>>> def sub_64 : SubRegIndex<64>;
>>> def sub_vsx0 : SubRegIndex<128>;
>>> def sub_vsx1 : SubRegIndex<128, 128>;
>>> def sub_pair0 : SubRegIndex<256>;
>>> def sub_pair1 : SubRegIndex<256, 256>;
>>>
>>> So the instruction at 84328B uses the full register %2216 and the high
>>> order
>>> 128 bits of (256-bit) register %2225. However, the register allocator
>>> splits
>>> the live range and introduces a copy of the high order 64 bits of that
>>> 256-bit
>>> register, then another copy of that copy and rewrites the use in
>>> instruction
>>> 84328B to that copy. The copy is marked undef so the register allocator
>>> assigns just some random register to the use of that copy in 84328B.
>>>
>>> Or maybe I am completely misinterpreting the meaning of the debug dumps
>>> from the register allocator.
>>>
>>> This appears to be related to lane masks and dead lane detection although
>>> I don't see dead lane detection marking anything unexpected as undef
>>> (seems
>>> to just be INSERT_SUBREG and PHI).
>>>
>>>
>>> Are the copies added by dead lane detection or by live-range splitting?
>>>
>>> The undef flag on the definition of %7501 is suspicious and depending on
>>> how you look at it, so is the one on %7526. Essentially, we are losing the
>>> full copy in this chain of copies and I wonder what is at fault here.
>>>
>>> Could you share the debug output of regalloc?
>>>
>>>
>>> If anyone has suggestions on what might be the issue and/or how to go
>>> about figuring this out and fixing it, I would really appreciate it.
>>>
>>> Nemanja
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>> <ra-before-after-debug.txt>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210706/8720aada/attachment.html>


More information about the llvm-dev mailing list