<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jun 23, 2021, at 11:23 AM, Nemanja Ivanovic <<a href="mailto:nemanja.i.ibm@gmail.com" class="">nemanja.i.ibm@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta charset="UTF-8" class=""><div dir="ltr" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div class="">Thank you so much for taking the time to answer, Quentin.</div><div class=""><br class=""></div><div class="">The bad copies are definitely added by live range splitting. The issue seems to be the LaneBitmasks for the various subregisters. Honestly, I don't know what the bits of the LaneBitmask values produced by TableGen are meant to represent, and I can't make any sense of them. 
And those seem to lead the register allocator astray.</div><div class="">Here are the LaneBitmasks from the register include file:<br class=""></div><div class=""><span style="font-family: monospace;" class="">static const LaneBitmask SubRegIndexLaneMaskTable[] = {<br class=""> LaneBitmask::getAll(),<br class=""> LaneBitmask(0x0000000000000001), // sub_32<br class=""> LaneBitmask(0x0000000000000002), // sub_64<br class=""> LaneBitmask(0x0000000000000004), // sub_eq<br class=""> LaneBitmask(0x0000000000000001), // sub_gp8_x0<br class=""> LaneBitmask(0x0000000000000200), // sub_gp8_x1<br class=""> LaneBitmask(0x0000000000000008), // sub_gt<br class=""> LaneBitmask(0x0000000000000010), // sub_lt<br class=""> LaneBitmask(0x0000000000000042), // sub_pair0<br class=""> LaneBitmask(0x0000000000000180), // sub_pair1<br class=""> LaneBitmask(0x0000000000000020), // sub_un<br class=""> LaneBitmask(0x0000000000000002), // sub_vsx0<br class=""> LaneBitmask(0x0000000000000040), // sub_vsx1<br class=""> LaneBitmask(0x0000000000000040), // sub_vsx1_then_sub_64<br class=""> LaneBitmask(0x0000000000000080), // sub_pair1_then_sub_64<br class=""> LaneBitmask(0x0000000000000080), // sub_pair1_then_sub_vsx0<br class=""> LaneBitmask(0x0000000000000100), // sub_pair1_then_sub_vsx1<br class=""> LaneBitmask(0x0000000000000100), // sub_pair1_then_sub_vsx1_then_sub_64<br class=""> LaneBitmask(0x0000000000000200), // sub_gp8_x1_then_sub_32<br class=""> };</span></div><div class=""><br class=""></div><div class="">For example, what does it mean that the mask for<span class="Apple-converted-space"> </span><span style="font-family: monospace;" class="">sub_64</span><span class="Apple-converted-space"> </span>and<span class="Apple-converted-space"> </span><span style="font-family: monospace;" class="">sub_vsx0</span><span class="Apple-converted-space"> </span>are the same?</div></div></div></blockquote><div><br class=""></div><div>That just means they overlap. 
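</div><div>To make the overlap reading concrete, here is a minimal sketch that treats each mask as a plain 64-bit integer rather than the real LaneBitmask class. Mask and mayOverlap are hypothetical names for illustration, not LLVM API; the values are copied from your table:</div>
<pre class="">
```cpp
// Hypothetical stand-in for llvm::LaneBitmask: a plain 64-bit integer.
using Mask = unsigned long long;

// Values copied from the SubRegIndexLaneMaskTable quoted above.
constexpr Mask kSub64    = 0x2;
constexpr Mask kSubVsx0  = 0x2;
constexpr Mask kSubVsx1  = 0x40;
constexpr Mask kSubPair0 = 0x42;
constexpr Mask kSubPair1 = 0x180;

// Two subregister indices are treated as overlapping when their
// lane masks share at least one bit.
constexpr bool mayOverlap(Mask A, Mask B) { return (A & B) != 0; }

static_assert(mayOverlap(kSub64, kSubVsx0), "sub_64 and sub_vsx0 share a lane");
static_assert(!mayOverlap(kSubVsx0, kSubVsx1), "the two VSX halves are disjoint");
static_assert((kSubVsx0 | kSubVsx1) == kSubPair0, "sub_pair0 is the union of both halves");
static_assert(!mayOverlap(kSubPair0, kSubPair1), "the two pairs are disjoint");
```
</pre>
<div>Under that reading, sub_64 and sub_vsx0 sharing bit 0x2 only says that they cover a common lane, while sub_vsx0 and sub_vsx1 stay disjoint.</div><div>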
That’s fine (I think!)</div><div><br class=""></div><div>From LaneBitmask.h</div><div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">/// Iff the target has a register such that:</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">///</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">/// getSubReg(Reg, A) overlaps getSubReg(Reg, B)</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">///</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">/// then:</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">///</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">/// (getSubRegIndexLaneMask(A) & getSubRegIndexLaneMask(B)) != 0</span></div></div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; 
-webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div class=""> The two subregisters certainly do not represent the same lanes in their respective registers. The<span class="Apple-converted-space"> </span><span style="font-family: monospace;" class="">sub_vsx0</span><span class="Apple-converted-space"> </span>subregister is the first VSX register in a VSX register pair. And each of the two subregisters of a VSX register pair (<span style="font-family: monospace;" class="">sub_vsx0</span>,<span class="Apple-converted-space"> </span><span style="font-family: monospace;" class="">sub_vsx1</span>) has its own scalar subregister (<span style="font-family: monospace;" class="">sub_64</span>).<br class=""></div><div class=""><br class=""></div><div class="">I have also attached the output of RA, but it is huge :(</div><div class="">It is the result of specifying options<span class="Apple-converted-space"> </span><span style="font-family: monospace;" class="">-debug-only=regalloc -print-before=greedy -print-after=greedy</span><span class="Apple-converted-space"> </span>on the command line.<br class=""></div></div></div></blockquote><div><br class=""></div><div>Thanks, I’ll try to take a look this week.</div><div>Looking at these lines, I wonder whether the issue is simply that we didn’t use the right subregister index. 
I.e., the following code would have been fine with sub_vsx0 instead of sub_64.</div><div><br class=""></div><div><span style="font-family: monospace;" class="">80988B undef %7526.sub_64:vsrprc = COPY %7527.sub_64:vsrprc</span><br style="font-family: monospace;" class=""><span style="font-family: monospace;" class="">84324B undef %7501.sub_64:vsrprc = COPY %7526.sub_64:vsrprc</span><br style="font-family: monospace;" class=""><span style="font-family: monospace;" class="">84328B %5546:vsrc = contract nofpexcept XVMADDADP %5546:vsrc(tied-def 0), %7501.sub_vsx0:vsrprc</span></div><br class=""><blockquote type="cite" class=""><div class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div class="gmail_quote" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div dir="ltr" class="gmail_attr">On Tue, Jun 22, 2021 at 3:21 PM Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="overflow-wrap: break-word;" class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jun 21, 2021, at 10:05 AM, Nemanja Ivanovic via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br 
class=""><div class=""><div dir="ltr" class="">I am having a really difficult time with subregister related issues when I turn<br class="">on subregister liveness tracking.<br class=""><br class="">Before RA:<br class=""><span style="font-family: monospace;" class="">79760B %2216:vsrc = LXVDSX %5551:g8rc_and_g8rc_nox0, %2215:g8rc :: (load 8 from %ir.scevgep1857.cast, !alias.scope !92, !noalias !93)<br class="">79872B %2225:vsrprc = LXVP 352, %661:g8rc_and_g8rc_nox0<br class="">84328B %5540:vsrc = contract nofpexcept XVMADDADP %5540:vsrc(tied-def 0), %2225.sub_vsx0:vsrprc, %2216:vsrc, implicit $rm<br class=""></span><br class="">After RA (greedy):<br class=""><span style="font-family: monospace;" class="">79744B %2214:vsrc = LXVDSX %5551:g8rc_and_g8rc_nox0, %6477:g8rc :: (load 8 from %ir.scevgep1860.cast, !alias.scope !92, !noalias !93)<br class="">79872B %7503:vsrprc = LXVP 352, %661:g8rc_and_g8rc_nox0<br class="">80248B %7527:vsrprc = COPY %7503:vsrprc<br class="">80988B undef %7526.sub_64:vsrprc = COPY %7527.sub_64:vsrprc<br class="">84324B undef %7501.sub_64:vsrprc = COPY %7526.sub_64:vsrprc<br class="">84328B %5546:vsrc = contract nofpexcept XVMADDADP %5546:vsrc(tied-def 0), %7501.sub_vsx0:vsrprc, %2214:vsrc, implicit $rm<br class=""></span><br class="">Subregister definitions for PPC:<br class=""><span style="font-family: monospace;" class="">def sub_64 : SubRegIndex<64>;<br class="">def sub_vsx0 : SubRegIndex<128>;<br class="">def sub_vsx1 : SubRegIndex<128, 128>;<br class="">def sub_pair0 : SubRegIndex<256>;<br class="">def sub_pair1 : SubRegIndex<256, 256>;<br class=""></span><br class="">So the instruction at 84328B uses the full register %2216 and the high order<br class="">128 bits of (256-bit) register %2225. 
However, the register allocator splits<br class="">the live range and introduces a copy of the high order 64 bits of that 256-bit<br class="">register, then another copy of that copy and rewrites the use in instruction<br class="">84328B to that copy. The copy is marked undef so the register allocator<br class="">assigns just some random register to the use of that copy in 84328B.<br class=""><br class="">Or maybe I am completely misinterpreting the meaning of the debug dumps<br class="">from the register allocator.<br class=""><br class="">This appears to be related to lane masks and dead lane detection although<br class="">I don't see dead lane detection marking anything unexpected as undef (seems<br class=""><div class="">to just be INSERT_SUBREG and PHI).</div></div></div></blockquote><div class=""><br class=""></div><div class="">Are the copies added by dead lane detection or by live-range splitting?</div><div class=""><br class=""></div><div class="">The undef flag on the definition of %7501 is suspicious and depending on how you look at it, so is the one on %7526. 
Essentially, we are losing the full copy in this chain of copies and I wonder what is at fault here.</div><div class=""><br class=""></div>Could you share the debug output of regalloc?</div><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">If anyone has suggestions on what might be the issue and/or how to go about figuring this out and fixing it, I would really appreciate it.</div><div class=""><br class=""></div><div class="">Nemanja<br class=""></div></div>_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class=""><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank" class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br class=""></div></blockquote></div><br class=""></div></blockquote></div><span id="cid:f_kq9t1ajd0"><ra-before-after-debug.txt></span></div></blockquote></div><br class=""></body></html>