<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin: 0;">Hi,</div><div style="margin: 0;"><br></div><div style="margin: 0;">After I lowered "store vpr" into "move vpr to vr, store vr" in the SelectionDAG Legalize step, I ran into another problem.</div><div style="margin: 0;">I expanded the pseudo instruction "move vpr to vr" in XXXTargetLowering::EmitInstrWithCustomInserter().</div><div style="margin: 0;">But the instructions after expansion don't meet the SSA requirement.</div><div style="margin: 0;"><font style="background-color: rgb(255, 255, 255);">Before expansion:</font></div><div style="margin: 0;"><font style="background-color: rgb(255, 255, 255);">  %4:vprregs = VSLT_u16 killed %2:vregs, killed %3:vregs   // vector set when less than, element type is 16-bit unsigned integer, %2 and %3 are vector registers, %4 is a vector predicate register<br>  %5:vregs = PseudoMoveVPR2VR_e32 killed %4:vprregs  // move vpr to vr, 32 elements total<br>  VSTORE killed %5:vregs, %stack.3.v, 0 :: (store 64 into %ir.v, align 128) // store vr into stack<br><div>After expansion:</div><div>112B      %4:vprregs = VSLT_u16 killed %2:vregs, killed 
%3:vregs<br>128B      %14:vregs = VCLR_VR    // clear vr<br>144B      %19:gregs = MOVi32 65537 // move 0x10001 (compare bit mask) to scalar register<br>160B      %15:vregs = MOVR2VR_DUP %19:gregs  // duplicate content of scalar into vector register<br>176B      %16:vregs, %4:vprregs = V_ADD_t_u16 %14:vregs, %15:vregs, %16:vregs(tied-def 0), %4:vprregs(tied-def 1) // conditional vector add, adds an element if its compare bit is set, %16 = %14 + %15; it reads the vpr compare bits and updates the vpr carry bits<br>192B      %20:gregs = MOVi32 131074   // move imm 0x20002 (carry bit mask) to scalar register<br>208B      %17:vregs = MOVR2VR_DUP %20:gregs  // duplicate content of scalar into vector register<br>224B      %18:vregs, %4:vprregs = V_ADD_c_u16 %14:vregs, %17:vregs, %18:vregs(tied-def 0), %4:vprregs(tied-def 1)  // conditional vector add, adds an element if its carry bit is set, %18 = %14 + %17; it reads the vpr carry bits and updates the vpr carry bits<br>240B      %5:vregs = V_OR_a_u16 %16:vregs, %18:vregs</div><div><br></div><div>The definition of V_ADD_t_u16 in the td file has the vpr register in both ins and outs, with a constraint that the two vpr registers must be the same.</div><div>llc crashes after the expansion.</div><div><br></div><div><span style="display: inline !important; float: none; background-color: rgb(255, 255, 255); color: rgb(0, 0, 0); font-family: Arial; font-size: 14px; font-style: normal; font-variant: normal; font-weight: 400; letter-spacing: normal; orphans: 2; 
text-align: left; text-decoration: none; text-indent: 0px; text-transform: none; -webkit-text-stroke-width: 0px; white-space: normal; word-spacing: 0px;"> ********** PROCESS IMPLICIT DEFS **********<br>********** Function: test<br>llc: /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr* llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.<br>Stack dump:<br>0.      Program arguments: llc -march=dtu -mcpu=x -debug dtu-vcc-u16.ll<br>1.      Running pass 'Function Pass Manager' on module 'dtu-vcc-u16.ll'.<br>2.      Running pass 'Live Variable Analysis' on function '@test'<br> #0 0x00007efcc508c4e1 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:0<br> #1 0x00007efcc508c574 PrintStackTraceSignalHandler(void*) /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:625:0<br> #2 0x00007efcc508a2fc llvm::sys::RunSignalHandlers() /home/jerry/Develop/llvm-project/llvm/lib/Support/Signals.cpp:68:0<br> #3 0x00007efcc508be5b SignalHandler(int) /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:406:0<br> #4 0x00007efcc37484b0 (/lib/x86_64-linux-gnu/libc.so.6+0x354b0)<br> #5 0x00007efcc3748428 raise /build/glibc-Cl5G7W/glibc-2.23/signal/../sysdeps/unix/sysv/linux/raise.c:54:0<br> #6 0x00007efcc374a02a abort /build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:91:0<br> #7 0x00007efcc3740bd7 __assert_fail_base /build/glibc-Cl5G7W/glibc-2.23/assert/assert.c:92:0<br> #8 0x00007efcc3740c82 (/lib/x86_64-linux-gnu/libc.so.6+0x2dc82)<br> #9 0x00007efcc88e04b0 llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403:0<br>#10 0x00007efcc8747235 llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&) 
/home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:133:0<br>#11 0x00007efcc87498b4 llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:544:0<br>#12 0x00007efcc8749d53 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:581:0<br>#13 0x00007efcc874a3fe llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:649:0<br>#14 0x00007efcc8817b8c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:0<br>#15 0x00007efcc78cca01 llvm::FPPassManager::runOnFunction(llvm::Function&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1482:0<br>#16 0x00007efcc78ccc9b llvm::FPPassManager::runOnModule(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1518:0<br>#17 0x00007efcc78cd0cf (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1583:0<br>#18 0x00007efcc78cd88b llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1695:0<br>#19 0x00007efcc78cda9b llvm::legacy::PassManager::run(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1727:0<br>#20 0x0000000000445ba9 compileModule(char**, llvm::LLVMContext&) /home/jerry/Develop/llvm-project/llvm/tools/llc/llc.cpp:620:0<br>#21 0x0000000000444064 main /home/jerry/Develop/llvm-project/llvm/tools/llc/llc.cpp:356:0<br>#22 0x00007efcc3733830 __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:325:0<br>#23 0x0000000000441bf9 _start (/home/jerry/Develop/llvm-project/build/bin/llc+0x441bf9)<br><div>Aborted (core 
dumped)</div><div><br></div><div>I think the reason is that %4 now has three definitions.</div><div>Is there a way to work around this? What should I do?</div><div><br></div><div><br></div><div><br></div><div>Thanks,</div><div>Jerry</div></span><br><br></div></font></div><p style="margin: 0;"><br></p><div style="position:relative;zoom:1"></div><div id="divNeteaseMailCard"></div><p style="margin: 0;"><br></p><p>At 2020-06-27 01:11:07, "Hal Finkel" &lt;hfinkel@anl.gov&gt; wrote:</p><blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">


    <p><br>

    </p>

    <div class="moz-cite-prefix">On 6/26/20 1:58 AM, Jerry wrote:<br>

    </div>

    <blockquote cite="mid:8aa594c.13e9.172ef6c1b4c.Coremail.jackie_linzz@126.com" type="cite">

      
      <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

        <div style="margin: 0;">Hi,</div>

        <div style="margin: 0;"><br>

        </div>

        <div style="margin: 0;">I am planning to expand the pseudo

          instructions in

          XXXTargetLowering::EmitInstrWithCustomInserter(), and use

          temporary virtual registers as operands.</div>

        <div style="margin: 0;">If I use virtual registers, do I need to

          mark them as "early clobber"?</div>

      </div>

    </blockquote>

    <p><br>

    </p>

    <p>If I have an instruction XYZ, and it takes an input register VI,

      and an output register VO, such that the instruction:</p>

    <p>  VO = XYZ VI</p>

    <p>reads VI and computes VO, and if the value in VI is no longer

      needed after this instruction (or was undef in the first place),

      then the register allocator might assign the same physical

      register to both VI and VO. You might end up with:</p>

    <p>RA = XYZ RA.</p>

    <p>If XYZ is really a pseudo instruction, this might not be

      acceptable. You might need two distinct registers just because of

      how the expansion works. For example, maybe this expands to:</p>

      VO = OP1 VI<br>

      VO = OP2 VO, VI<br>

    <p>Note that, in this case, the expansion needs VI in two different

      places. If VO and VI are assigned to be the same register, the

      expansion just won't work correctly. In this case, you need

      earlyclobber on your pseudo-instruction.<br>

    </p>
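    <p>A minimal sketch of this failure mode, written as a small Python
      model rather than LLVM code. The register file is a plain dict, and
      OP1 and OP2 are hypothetical stand-ins (negate and add) chosen only
      for illustration; they are not real target instructions.</p>

```python
def expand_xyz(regs, vo, vi):
    """Expand the pseudo 'vo = XYZ vi' into 'vo = OP1 vi; vo = OP2 vo, vi'.

    OP1 (negate) and OP2 (add) are made-up operations for illustration.
    """
    regs[vo] = -regs[vi]             # VO = OP1 VI
    regs[vo] = regs[vo] + regs[vi]   # VO = OP2 VO, VI (reads VI a second time)
    return regs[vo]


# Distinct physical registers: VI survives the first step, so the
# second step sees the original input.
distinct = expand_xyz({"R1": 5, "R2": 0}, vo="R2", vi="R1")

# Same physical register for VO and VI: the first step overwrites VI,
# so the second step reads the clobbered value.
shared = expand_xyz({"R1": 5}, vo="R1", vi="R1")

print(distinct, shared)
```

    <p>The two calls give different results for the same input, which is
      the situation that earlyclobber on the pseudo-instruction's output
      is meant to rule out.</p>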

    <p><br>

    </p>

    <blockquote cite="mid:8aa594c.13e9.172ef6c1b4c.Coremail.jackie_linzz@126.com" type="cite">

      <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

        <div style="margin: 0;">I saw that the MIPS backend sometimes marks
          virtual registers as "early clobber" in
          EmitInstrWithCustomInserter().</div>

        <div style="margin: 0;">What is the effect of marking a virtual

          register as "early clobber" before RA?</div>

      </div>

    </blockquote>

    <p><br>

    </p>

    <p>I don't recall any effect.</p>

    <p> -Hal<br>

    </p>

    <p><br>

    </p>

    <blockquote cite="mid:8aa594c.13e9.172ef6c1b4c.Coremail.jackie_linzz@126.com" type="cite">

      <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

        <p style="margin: 0;"><br>

        </p>

        <div style="margin: 0;">Thanks,</div>

        <div style="margin: 0;">Jerry</div>


        <p>At 2020-06-25 20:29:30, "Hal Finkel" <a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov">&lt;hfinkel@anl.gov&gt;</a>

          wrote:</p>

        <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex;

          MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">

          <div class="moz-cite-prefix">On 6/25/20 1:11 AM, Jerry via

            llvm-dev wrote:<br>

          </div>

          <blockquote cite="mid:28a315d6.11eb.172ea1a6f4f.Coremail.jackie_linzz@126.com" type="cite">

            <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

              <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

                <div style="margin:0;">Hi, there</div>

                <div style="margin:0;">I am writing a backend, and I
                  have run into a problem.</div>

                <div style="margin:0;">We don't have load/store
                  instructions for vector predicate registers (vpr for
                  short).</div>

                <div style="margin:0;">The hardware has 64 vector
                  registers (vr for short) and 8 vector predicate
                  registers, and there are no move instructions between
                  vr and vpr.</div>

                <div style="margin:0;">vr supports many operations, and

                  vpr supports vpror, vprxor, vprand and vprinv

                  operations.</div>

                <div style="margin:0;"> A vr has 512 bits, and a vpr has

                  128 bits. vr is used for v16i32, v32i16, v64i8. And a

                  scalar register has 32 bits.</div>

                <div style="margin:0;">Each element in a vpr holds a
                  carry flag and a compare flag. If we compare or add two
                  v16i32 vectors, an element in the vpr has 8 bits. If we
                  compare or add two v64i8 vectors, an element in the vpr
                  has 2 bits (one bit for the compare flag and one bit
                  for the carry flag).</div>
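                <div style="margin:0;">The per-element widths follow from
                  dividing the vpr evenly among the vector elements. A
                  small Python sketch of that arithmetic, assuming the
                  same even split also applies to v32i16 (the 4-bit
                  figure for v32i16 is derived here, not stated above):</div>

```python
VPR_BITS = 128  # width of one vector predicate register
VR_BITS = 512   # width of one vector register


def vpr_bits_per_element(element_bits):
    """Return how many vpr bits belong to one element of the given width."""
    num_elements = VR_BITS // element_bits
    return VPR_BITS // num_elements


layout = {
    name: vpr_bits_per_element(bits)
    for name, bits in (("v16i32", 32), ("v32i16", 16), ("v64i8", 8))
}
print(layout)
```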

                <div style="margin:0;"> We have defined registers and a

                  new type(vpr) for vector predicate registers in

                  backend.</div>

                <div style="margin:0;">Although there is no direct

                  instruction to move vpr to vr or to move vr to vpr,

                  there is a method to work around this. And we have

                  load/store instructions for vr.</div>

                <div style="margin:0;">move vpr to vr for v32i16 (from

                  vpr0 to vr1):</div>

                <div style="margin:0;">1    vclr    vr0   // clear vr0</div>

                <div style="margin:0;">2    ldi    r5, 0x00010001  //

                  load immediate (compare bit mask for v32i16) to scalar

                  register r5</div>

                <div style="margin:0;">3    movr2vr.dup    vr2, r5  //

                  duplicate content in r5 into vr2, </div>

                <div style="margin:0;">4    vadd.t.s16    vr1, vr0, vr2,

                  vpr0  //vector add if element compare bit is set,

                  element type is 16 bit signed integer, now we have

                  moved compare bits from vpr0 to vr1</div>

                <div style="margin:0;">5    ldi    r5, 0x00020002  //

                  load immediate (carry bit mask for v32i16) to scalar

                  register r5</div>

                <div style="margin:0;">6    movr2vr.dup   vr2, r5  //

                  duplicate content in r5 into vr2</div>

                <div style="margin:0;">7    vadd.c.s16    vr1, vr1, vr2,

                  vpr0 // vr1 = vr1 + vr2, vector add if element carry

                  bit is set, element type is 16 bit signed integer, now

                  we moved carry bits from vpr0 to vr1 too.</div>
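                <div style="margin:0;">A rough Python model of steps 1 to
                  7, to show what ends up in vr1. It assumes that the
                  predicated adds pass the first source through unchanged
                  in elements whose flag is clear, and it represents vpr0
                  as two hypothetical lists of flags (compare bits and
                  carry bits) rather than the real bit layout:</div>

```python
def splat16(imm32, lanes):
    """Model movr2vr.dup: replicate a 32-bit scalar across 16-bit lanes."""
    halves = (imm32 & 0xFFFF, (imm32 >> 16) & 0xFFFF)
    return [halves[i % 2] for i in range(lanes)]


def predicated_add(a, b, flags):
    """Model vadd.t / vadd.c: add where the flag is set.

    Passing `a` through where the flag is clear is an assumption.
    """
    return [(x + y) & 0xFFFF if f else x for x, y, f in zip(a, b, flags)]


def move_vpr_to_vr(compare_bits, carry_bits):
    lanes = len(compare_bits)
    vr0 = [0] * lanes                              # 1: vclr vr0
    vr2 = splat16(0x00010001, lanes)               # 2-3: ldi, movr2vr.dup
    vr1 = predicated_add(vr0, vr2, compare_bits)   # 4: vadd.t.s16
    vr2 = splat16(0x00020002, lanes)               # 5-6: ldi, movr2vr.dup
    vr1 = predicated_add(vr1, vr2, carry_bits)     # 7: vadd.c.s16
    return vr1


# Four lanes are enough to cover every flag combination.
result = move_vpr_to_vr(compare_bits=[1, 0, 1, 0], carry_bits=[0, 0, 1, 1])
print(result)
```

                <div style="margin:0;">Under these assumptions each
                  element of vr1 holds the compare flag in bit 0 and the
                  carry flag in bit 1.</div>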

                <div style="margin:0;"><br>

                </div>

                <div style="margin:0;">move vr to vpr for v32i16 (from

                  vr1 to vpr0):</div>

                <div style="margin:0;">8    vclr    vr0  // clear vr0</div>

                <div style="margin:0;">9    ldi    r5, 0x00010001 //

                  load immediate (compare bit mask for v32i16) to r5</div>

                <div style="margin:0;">10  movr2vr.dup    vr2, r5 //

                  duplicate content of r5 into vr2</div>

                <div style="margin:0;">11  vand.u16    vr2, vr1, vr2  //
                  vector and, element type is 16 bit unsigned integer,
                  vr2 = vr1 &amp; vr2, now we have moved compare bits
                  from vr1 to vr2</div>

                <div style="margin:0;">12  vslt.s16    vpr0, vr0, vr2 

                  // vector set when less than, element type is 16 bit

                  signed integer, now we have moved compare bits from

                  vr1 to vpr0</div>

                <div style="margin:0;">13  ldi    r5, 0x00020002 // load

                  immediate (carry bit mask for v32i16) to r5</div>

                <div style="margin:0;">14  movr2vr.dup    vr2, r5  //

                  duplicate content of r5 into vr2</div>

                <div style="margin:0;">15  vand.u16    vr2, vr1, vr2  //

                  vector and for element type 16 bit unsigned integer,

                  vr2 has carry bits now</div>

                <div style="margin:0;">16  ldi    r5, 0x7FFF7FFF  // max

                  number for 16 bit signed integer</div>

                <div style="margin:0;">17  movr2vr.dup    vr3, r5  //

                  duplicate r5 into vr3</div>

                <div style="margin:0;">18  vadd.s16  vr1, vr2, vr3,

                  vpr0  // vpr0 has carry bits set now</div>

                <div style="margin:0;"><br>

                </div>

                <div style="margin:0;">Each vector type has a different
                  instruction sequence, because the bit mask and element
                  type are different.</div>

                <div style="margin:0;">I have tried to lower load/store
                  for vpr in XXXISelLowering.cpp, but there is no
                  guarantee that line 12 and line 18 will be assigned the
                  same register for vpr0, because vpr0 in line 18 is an
                  output and not an input.</div>

                <div style="margin:0;">The vpr0 results of line 12 and
                  line 18 are parallel in the SelectionDAG graph; both
                  are outputs.</div>

                <div style="margin:0;">As a next step, I plan to define
                  three pseudo instructions, one for each vector type,
                  and expand each pseudo instruction into its
                  instruction sequence before register allocation. But
                  I'm not sure it will work.</div>

                <div style="margin:0;">What should I do? <br>

                </div>

              </div>

            </div>

          </blockquote>

          <p><br>

          </p>

          <p>This somewhat depends on how you're modeling things, but a
            late-expanded pseudo-instruction seems like a workable
            approach. If the pseudo-instruction needs temporary
            registers (and it looks like it does), then the
            pseudo-instruction should take them as register operands (so
            that RA will allocate them for you and you don't need to
            worry about scavenging them later). You might, however, need
            to mark such operands as "early clobber" to prevent RA from
            assigning the same register to an input and an output
            (sometimes, depending on how the expanded code uses the
            registers, this is necessary).</p>

          <p> -Hal<br>

          </p>

          <p><br>

          </p>

          <blockquote cite="mid:28a315d6.11eb.172ea1a6f4f.Coremail.jackie_linzz@126.com" type="cite">

            <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

              <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">

                <div style="margin:0;"><br>

                </div>

                <div style="margin:0;">Thanks and best regards,</div>

                <div style="margin:0;">Jerry</div>


              </div>

            </div>


            <fieldset class="mimeAttachmentHeader"></fieldset>

            <pre class="moz-quote-pre" wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

          </blockquote>

          <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

        </blockquote>

      </div>


    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  
</blockquote></div><br><br><span title="neteasefooter"><p> </p></span>
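<p>A note on the assertion reported in the first message of this thread:
  in machine SSA form each virtual register may have only one definition,
  so a tied output normally gets its own fresh virtual register and the
  tied input refers to the previous value. The following Python sketch
  models the expanded sequence as (opcode, defs, uses) tuples, finds the
  multiply-defined register, and renames later definitions. The names %21
  and %22 are hypothetical; in a real backend they would presumably come
  from MachineRegisterInfo::createVirtualRegister. This is a model of the
  check, not LLVM code.</p>

```python
from collections import Counter

# The expanded sequence, reduced to (opcode, defs, uses).
expanded = [
    ("VSLT_u16",    ["%4"],        ["%2", "%3"]),
    ("VCLR_VR",     ["%14"],       []),
    ("MOVi32",      ["%19"],       []),
    ("MOVR2VR_DUP", ["%15"],       ["%19"]),
    ("V_ADD_t_u16", ["%16", "%4"], ["%14", "%15", "%16", "%4"]),
    ("MOVi32",      ["%20"],       []),
    ("MOVR2VR_DUP", ["%17"],       ["%20"]),
    ("V_ADD_c_u16", ["%18", "%4"], ["%14", "%17", "%18", "%4"]),
    ("V_OR_a_u16",  ["%5"],        ["%16", "%18"]),
]


def multiply_defined(instrs):
    """Return the registers that have more than one definition."""
    counts = Counter(d for _, defs, _ in instrs for d in defs)
    return sorted(r for r, n in counts.items() if n > 1)


def rename_to_single_def(instrs, first_fresh):
    """Give every repeated definition a fresh name and rewire later uses."""
    latest = {}        # original name -> name holding its newest value
    seen = set()
    next_id = first_fresh
    out = []
    for opcode, defs, uses in instrs:
        new_uses = [latest.get(u, u) for u in uses]  # uses see the old value
        new_defs = []
        for d in defs:
            if d in seen:
                new_defs.append(f"%{next_id}")
                next_id += 1
            else:
                seen.add(d)
                new_defs.append(d)
        latest.update(zip(defs, new_defs))
        out.append((opcode, new_defs, new_uses))
    return out


fixed = rename_to_single_def(expanded, first_fresh=21)
print(multiply_defined(expanded), multiply_defined(fixed))
```

<p>In the renamed sequence the first conditional add reads %4 and defines
  a new predicate register, and the second conditional add reads that new
  register, so the chain of carry-bit updates is kept while every
  register has a single definition.</p>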