<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin: 0;">Hi,</div><div style="margin: 0;"><br></div><div style="margin: 0;">After I lowered "store vpr" into "move vpr to vr, store vr" at the SelectionDAG Legalize step, I ran into another problem.</div><div style="margin: 0;">I expanded the pseudo instruction "move vpr to vr" in XXXTargetLowering::EmitInstrWithCustomInserter().</div><div style="margin: 0;">But the instructions after expansion are no longer in SSA form.</div><div style="margin: 0;"><font style="background-color: rgb(255, 255, 255);">Before expansion:</font></div><div style="margin: 0;"><font style="background-color: rgb(255, 255, 255);">  %4:vprregs = VSLT_u16 killed %2:vregs, killed %3:vregs   // vector set when less than, element type is 16-bit unsigned integer; %2 and %3 are vector registers, %4 is a vector predicate register<br>  %5:vregs = PseudoMoveVPR2VR_e32 killed %4:vprregs  // move vpr to vr, 32 elements total<br>  VSTORE killed %5:vregs, %stack.3.v, 0 :: (store 64 into %ir.v, align 128) // store vr into stack<br><div>After expansion:</div><div>112B      %4:vprregs = VSLT_u16 killed %2:vregs, killed 
%3:vregs<br>128B      %14:vregs = VCLR_VR    // clear vr<br>144B      %19:gregs = MOVi32 65537 // move 0x10001 (compare bit mask) to a scalar register<br>160B      %15:vregs = MOVR2VR_DUP %19:gregs  // duplicate the scalar register into a vector register<br>176B      %16:vregs, %4:vprregs = V_ADD_t_u16 %14:vregs, %15:vregs, %16:vregs(tied-def 0), %4:vprregs(tied-def 1) // conditional vector add: an element is added only if its compare bit is set, %16 = %14 + %15; it reads the vpr compare bits and updates the vpr carry bits<br>192B      %20:gregs = MOVi32 131074   // move 0x20002 (carry bit mask) to a scalar register<br>208B      %17:vregs = MOVR2VR_DUP %20:gregs  // duplicate the scalar register into a vector register<br>224B      %18:vregs, %4:vprregs = V_ADD_c_u16 %14:vregs, %17:vregs, %18:vregs(tied-def 0), %4:vprregs(tied-def 1)  // conditional vector add: an element is added only if its carry bit is set, %18 = %14 + %17; it reads the vpr carry bits and updates the vpr carry bits<br>240B      %5:vregs = V_OR_a_u16 %16:vregs, %18:vregs</div><div><br></div><div>In the .td file, V_ADD_t_u16 lists the vpr register in both ins and outs, with a constraint that the two vpr operands must be the same register.</div><div>llc crashes after the expansion:</div><div><br></div><div><span style="display: inline !important; float: none; background-color: rgb(255, 255, 255); color: rgb(0, 0, 0); font-family: Arial; font-size: 14px; font-style: normal; font-variant: normal; font-weight: 400; letter-spacing: normal; orphans: 2; 
text-align: left; text-decoration: none; text-indent: 0px; text-transform: none; -webkit-text-stroke-width: 0px; white-space: normal; word-spacing: 0px;"> ********** PROCESS IMPLICIT DEFS **********<br>********** Function: test<br>llc: /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr* llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.<br>Stack dump:<br>0.      Program arguments: llc -march=dtu -mcpu=x -debug dtu-vcc-u16.ll<br>1.      Running pass 'Function Pass Manager' on module 'dtu-vcc-u16.ll'.<br>2.      Running pass 'Live Variable Analysis' on function '@test'<br> #0 0x00007efcc508c4e1 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:0<br> #1 0x00007efcc508c574 PrintStackTraceSignalHandler(void*) /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:625:0<br> #2 0x00007efcc508a2fc llvm::sys::RunSignalHandlers() /home/jerry/Develop/llvm-project/llvm/lib/Support/Signals.cpp:68:0<br> #3 0x00007efcc508be5b SignalHandler(int) /home/jerry/Develop/llvm-project/llvm/lib/Support/Unix/Signals.inc:406:0<br> #4 0x00007efcc37484b0 (/lib/x86_64-linux-gnu/libc.so.6+0x354b0)<br> #5 0x00007efcc3748428 raise /build/glibc-Cl5G7W/glibc-2.23/signal/../sysdeps/unix/sysv/linux/raise.c:54:0<br> #6 0x00007efcc374a02a abort /build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:91:0<br> #7 0x00007efcc3740bd7 __assert_fail_base /build/glibc-Cl5G7W/glibc-2.23/assert/assert.c:92:0<br> #8 0x00007efcc3740c82 (/lib/x86_64-linux-gnu/libc.so.6+0x2dc82)<br> #9 0x00007efcc88e04b0 llvm::MachineRegisterInfo::getVRegDef(llvm::Register) const /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403:0<br>#10 0x00007efcc8747235 llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&) 
/home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:133:0<br>#11 0x00007efcc87498b4 llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:544:0<br>#12 0x00007efcc8749d53 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:581:0<br>#13 0x00007efcc874a3fe llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:649:0<br>#14 0x00007efcc8817b8c llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/jerry/Develop/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:0<br>#15 0x00007efcc78cca01 llvm::FPPassManager::runOnFunction(llvm::Function&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1482:0<br>#16 0x00007efcc78ccc9b llvm::FPPassManager::runOnModule(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1518:0<br>#17 0x00007efcc78cd0cf (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1583:0<br>#18 0x00007efcc78cd88b llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1695:0<br>#19 0x00007efcc78cda9b llvm::legacy::PassManager::run(llvm::Module&) /home/jerry/Develop/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1727:0<br>#20 0x0000000000445ba9 compileModule(char**, llvm::LLVMContext&) /home/jerry/Develop/llvm-project/llvm/tools/llc/llc.cpp:620:0<br>#21 0x0000000000444064 main /home/jerry/Develop/llvm-project/llvm/tools/llc/llc.cpp:356:0<br>#22 0x00007efcc3733830 __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:325:0<br>#23 0x0000000000441bf9 _start (/home/jerry/Develop/llvm-project/build/bin/llc+0x441bf9)<br><div>Aborted (core 
dumped)</div><div><br></div><div>I think the reason is that %4 now has three definitions (VSLT_u16, V_ADD_t_u16, and V_ADD_c_u16 each define it).</div><div>Is there a way to work around this? What should I do?</div><div><br></div><div>Thanks,</div><div>Jerry</div></span><br><br></div></font></div><div style="position:relative;zoom:1"></div><div id="divNeteaseMailCard"></div><p style="margin: 0;"><br></p><p>At 2020-06-27 01:11:07, "Hal Finkel" &lt;hfinkel@anl.gov&gt; wrote:</p><blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">

  
  
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 6/26/20 1:58 AM, 林政宗 wrote:<br>
    </div>
    <blockquote cite="mid:8aa594c.13e9.172ef6c1b4c.Coremail.jackie_linzz@126.com" type="cite">
      
      <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
        <div style="margin: 0;">Hi,</div>
        <div style="margin: 0;"><br>
        </div>
        <div style="margin: 0;">I am planning to expand the pseudo
          instructions in
          XXXTargetLowering::EmitInstrWithCustomInserter(), and use
          temporary virtual registers as operands.</div>
        <div style="margin: 0;">If I use virtual registers, do I need to
          mark them as "early clobber"?</div>
      </div>
    </blockquote>
    <p><br>
    </p>
    <p>If I have an instruction XYZ, and it takes an input register VI,
      and an output register VO, such that the instruction:</p>
    <p>  VO = XYZ VI</p>
    <p>reads VI and computes VO, and if the value in VI is no longer
      needed after this instruction (or was undef in the first place),
      then the register allocator might assign the same physical
      register to both VI and VO. You might end up with:</p>
    <p>RA = XYZ RA.</p>
    <p>If XYZ is really a pseudo instruction, this might not be
      acceptable. You might need two distinct registers just because of
      how the expansion works. For example, maybe this expands to:</p>
      VO = OP1 VI<br>
      VO = OP2 VO, VI<br>
    <p>Note that, in this case, the expansion reads VI in two different
      places. If VO and VI are assigned the same register, the first
      instruction overwrites VI before the second one reads it, so the
      expansion won't work correctly. In this case, you need
      earlyclobber on your pseudo-instruction.<br>
    </p>
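    <p>A minimal sketch of the hazard described above, written as a toy
      Python register machine rather than LLVM code. The operations
      <code>op1</code> and <code>op2</code> and the helper
      <code>expand_xyz</code> are hypothetical stand-ins chosen only for
      illustration; they are not part of any real backend.</p>

```python
# Toy model of the two-instruction expansion
#   VO = OP1 VI
#   VO = OP2 VO, VI
# OP1/OP2 are hypothetical stand-ins, not real target instructions.

def op1(a):
    return a + 1

def op2(a, b):
    return a * b

def expand_xyz(regs, vo, vi):
    """Run the expansion on a dict of physical registers and return VO."""
    regs[vo] = op1(regs[vi])            # VO = OP1 VI
    regs[vo] = op2(regs[vo], regs[vi])  # VO = OP2 VO, VI (reads VI again)
    return regs[vo]

# VO and VI in distinct registers: (5 + 1) * 5
distinct = expand_xyz({"R0": 5, "R1": 0}, vo="R1", vi="R0")

# VO and VI share one register: the first step clobbers VI, giving 6 * 6
aliased = expand_xyz({"R0": 5}, vo="R0", vi="R0")

print(distinct, aliased)
```

    <p>With distinct registers the second step still sees the original
      input; with a shared register it sees the value already overwritten
      by the first step, which is the situation earlyclobber rules out.</p>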
    <p><br>
    </p>
    <blockquote cite="mid:8aa594c.13e9.172ef6c1b4c.Coremail.jackie_linzz@126.com" type="cite">
      <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
        <div style="margin: 0;">I saw that the MIPS backend sometimes
          marks virtual registers as "early clobber" in
          EmitInstrWithCustomInserter().</div>
        <div style="margin: 0;">What is the effect of marking a virtual
          register as "early clobber" before RA?</div>
      </div>
    </blockquote>
    <p><br>
    </p>
    <p>I don't recall any effect.</p>
    <p> -Hal<br>
    </p>
    <p><br>
    </p>
    <blockquote cite="mid:8aa594c.13e9.172ef6c1b4c.Coremail.jackie_linzz@126.com" type="cite">
      <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
        <p style="margin: 0;"><br>
        </p>
        <div style="margin: 0;">Thanks,</div>
        <div style="margin: 0;">Jerry</div>
        <p style="margin: 0;"><br>
        </p>
        <p style="margin: 0;"><br>
        </p>
        <p style="margin: 0;"><br>
        </p>
        <p>At 2020-06-25 20:29:30, "Hal Finkel" <a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov">&lt;hfinkel@anl.gov&gt;</a>
          wrote:</p>
        <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex;
          MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
          <div class="moz-cite-prefix">On 6/25/20 1:11 AM, 林政宗 via
            llvm-dev wrote:<br>
          </div>
          <blockquote cite="mid:28a315d6.11eb.172ea1a6f4f.Coremail.jackie_linzz@126.com" type="cite">
            <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
              <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
                <div style="margin:0;">Hi, there</div>
                <div style="margin:0;">I am writing a backend, and I
                  have run into a problem.</div>
                <div style="margin:0;">We don't have load/store
                  instructions for vector predicate registers (vpr for
                  short). </div>
                <div style="margin:0;">The hardware has 64 vector
                  registers (vr for short) and 8 vector predicate
                  registers, and there are no move instructions between
                  vr and vpr.</div>
                <div style="margin:0;">vr supports many operations, and
                  vpr supports vpror, vprxor, vprand and vprinv
                  operations.</div>
                <div style="margin:0;"> A vr has 512 bits, and a vpr has
                  128 bits. vr is used for v16i32, v32i16, v64i8. And a
                  scalar register has 32 bits.</div>
                <div style="margin:0;">If we compare or add two v16i32,
                  an element in vpr has 8 bits. If we compare or add two
                  v64i8, then an element in vpr has 2 bits (one bit for
                  the compare flag and one bit for the carry flag). </div>
                <div style="margin:0;">An element in vpr contains the
                  carry flag and the compare flag.</div>
                <div style="margin:0;"> We have defined registers and a
                  new type(vpr) for vector predicate registers in
                  backend.</div>
                <div style="margin:0;">Although there is no direct
                  instruction to move vpr to vr or to move vr to vpr,
                  there is a method to work around this. And we have
                  load/store instructions for vr.</div>
                <div style="margin:0;">move vpr to vr for v32i16 (from
                  vpr0 to vr1):</div>
                <div style="margin:0;">1    vclr    vr0   // clear vr0</div>
                <div style="margin:0;">2    ldi    r5, 0x00010001  //
                  load immediate (compare bit mask for v32i16) to scalar
                  register r5</div>
                <div style="margin:0;">3    movr2vr.dup    vr2, r5  //
                  duplicate content in r5 into vr2, </div>
                <div style="margin:0;">4    vadd.t.s16    vr1, vr0, vr2,
                  vpr0  //vector add if element compare bit is set,
                  element type is 16 bit signed integer, now we have
                  moved compare bits from vpr0 to vr1</div>
                <div style="margin:0;">5    ldi    r5, 0x00020002  //
                  load immediate (carry bit mask for v32i16) to scalar
                  register r5</div>
                <div style="margin:0;">6    movr2vr.dup   vr2, r5  //
                  duplicate content in r5 into vr2</div>
                <div style="margin:0;">7    vadd.c.s16    vr1, vr1, vr2,
                  vpr0 // vr1 = vr1 + vr2, vector add if element carry
                  bit is set, element type is 16 bit signed integer, now
                  we moved carry bits from vpr0 to vr1 too.</div>
                <div style="margin:0;"><br>
                </div>
                <div style="margin:0;">move vr to vpr for v32i16 (from
                  vr1 to vpr0):</div>
                <div style="margin:0;">8    vclr    vr0  // clear vr0</div>
                <div style="margin:0;">9    ldi    r5, 0x00010001 //
                  load immediate (compare bit mask for v32i16) to r5</div>
                <div style="margin:0;">10  movr2vr.dup    vr2, r5 //
                  duplicate content of r5 into vr2</div>
                <div style="margin:0;">11  vand.u16    vr2, vr1, vr2  //
                  vector and, element type is 16 bit unsigned integer,
                  vr2 = vr1 &amp; vr2, now we have moved the compare bits
                  from vr1 to vr2</div>
                <div style="margin:0;">12  vslt.s16    vpr0, vr0, vr2 
                  // vector set when less than, element type is 16 bit
                  signed integer, now we have moved compare bits from
                  vr1 to vpr0</div>
                <div style="margin:0;">13  ldi    r5, 0x00020002 // load
                  immediate (carry bit mask for v32i16) to r5</div>
                <div style="margin:0;">14  movr2vr.dup    vr2, r5  //
                  duplicate content of r5 into vr2</div>
                <div style="margin:0;">15  vand.u16    vr2, vr1, vr2  //
                  vector and for element type 16 bit unsigned integer,
                  vr2 has carry bits now</div>
                <div style="margin:0;">16  ldi    r5, 0x7FFF7FFF  // max
                  number for 16 bit signed integer</div>
                <div style="margin:0;">17  movr2vr.dup    vr3, r5  //
                  duplicate r5 into vr3</div>
                <div style="margin:0;">18  vadd.s16  vr1, vr2, vr3,
                  vpr0  // vpr0 has carry bits set now</div>
                <div style="margin:0;"><br>
                </div>
                <div style="margin:0;">Each vector type has a different
                  instruction sequence, because the bit mask and element
                  type are different.</div>
                <div style="margin:0;">I have tried to lower load/store
                  for vpr in XXXISelLowering.cpp. But there is no
                  guarantee that line 12 and line 18 will be assigned the
                  same register for vpr0. vpr0 in line 18 is an output,
                  not an input.</div>
                <div style="margin:0;">And the vpr0 values in line 12 and
                  line 18 are parallel in the SelectionDAG graph. They are
                  both outputs.</div>
                <div style="margin:0;">As a next step, I think I will try
                  to define three pseudo instructions, one for each vector
                  type, and expand each pseudo instruction into its
                  instruction sequence before register allocation. But
                  I'm not sure it will work.</div>
                <div style="margin:0;">What should I do? <br>
                </div>
              </div>
            </div>
          </blockquote>
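          <p>A small arithmetic sketch of the layout described in the
            quoted message, in Python. It assumes the 128 predicate bits
            are divided evenly among the lanes (which matches the stated 8
            bits for v16i32 and 2 bits for v64i8) and that
            <code>movr2vr.dup</code> replicates the 32-bit scalar so that
            each 16-bit lane sees one half of it. The helper names are
            hypothetical.</p>

```python
VPR_BITS = 128  # width of one vector predicate register, per the message

def vpr_bits_per_element(lanes):
    """Predicate bits per element, assuming an even split across lanes."""
    return VPR_BITS // lanes

def split_16bit_lanes(value32):
    """Split a 32-bit scalar into its low and high 16-bit halves."""
    return [(value32 >> shift) & 0xFFFF for shift in (0, 16)]

layout = {
    name: vpr_bits_per_element(lanes)
    for name, lanes in (("v16i32", 16), ("v32i16", 32), ("v64i8", 64))
}

# Masks used in the v32i16 sequences (ldi r5, 0x00010001 / 0x00020002)
compare_mask_lanes = split_16bit_lanes(0x00010001)
carry_mask_lanes = split_16bit_lanes(0x00020002)

print(layout, compare_mask_lanes, carry_mask_lanes)
```

          <p>Under these assumptions, each 16-bit lane of the duplicated
            compare mask holds 1 and each lane of the carry mask holds 2,
            so the two conditional adds deposit the flags in different
            bits of each element.</p>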
          <p><br>
          </p>
          <p>This somewhat depends on how you're modeling things, but a
            late-expanded pseudo-instruction seems like a workable
            approach. If the pseudo-instruction needs temporary
            registers (and it looks like it does), then the
            pseudo-instruction should take them as register operands (so
            that RA will allocate them for you and you don't need to
            worry about scavenging them later). You might, however, need
            to mark such operands as "early clobber" to prevent RA from
            assigning the same register to an input and an output
            (sometimes, depending on how the expanded code uses the
            registers, this is necessary).</p>
          <p> -Hal<br>
          </p>
          <p><br>
          </p>
          <blockquote cite="mid:28a315d6.11eb.172ea1a6f4f.Coremail.jackie_linzz@126.com" type="cite">
            <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
              <div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">
                <div style="margin:0;"><br>
                </div>
                <div style="margin:0;">Thanks and best regards,</div>
                <div style="margin:0;">Jerry</div>
              </div>
            </div>
            <br>
            <br>
            <span title="neteasefooter">
              <p> </p>
            </span><br>
            <fieldset class="mimeAttachmentHeader"></fieldset>
            <pre class="moz-quote-pre" wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
          </blockquote>
          <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
        </blockquote>
      </div>
      <br>
      <br>
      <span title="neteasefooter">
        <p> </p>
      </span>
    </blockquote>
  

</blockquote></div><br><br><span title="neteasefooter"><p> </p></span>