<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><span class="Apple-style-span" style="font-family: monospace; ">Please test if r115571 has fixed it.</span><div><font class="Apple-style-span" face="monospace"><br></font></div><div><font class="Apple-style-span" face="monospace">Evan</font></div><div><font class="Apple-style-span" face="monospace"><br></font><div><div>On Oct 4, 2010, at 5:00 AM, Heikki Kultala wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Bill Wendling wrote:<br><blockquote type="cite">On Sep 30, 2010, at 2:13 AM, Heikki Kultala wrote:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">Bill Wendling wrote:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote:<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">On 29 Sep 2010, at 06:25, Heikki Kultala wrote:<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Our architecture has 1-bit boolean predicate registers.<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite">I've defined comparison<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">def NErrb : InstTCE<(outs I1Regs:$op3), (ins I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (setne I32Regs:$op1, I32Regs:$op2))]>;<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">But then I end up having the following bug:<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Code<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">%0 = zext i8 %data to i32<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">%1 = zext i16 %crc to 
i32<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">%2 = xor i32 %1, %0<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">%3 = and i32 %2, 1<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">%4 = icmp eq i32 %3, 0<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">which compares the lowest bits of the 2 variables<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">ends up being compiled as<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">     %reg16384<def> = LDWi <fi#-2>, 0; mem:LD4[FixedStack-2] I32Regs:%reg16384<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">     %reg16385<def> = LDWi <fi#-1>, 0; mem:LD4[FixedStack-1] 
I32Regs:%reg16385<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">     %reg16386<def> = COPY %reg16384; I32Regs:%reg16386,16384<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">     %reg16390<def> = NErrb %reg16384, %reg16385; I1Regs:%reg16390 I32Regs:%reg16384,16385<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">which just compares ALL BITS of the variables.<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">I also have a pattern:<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">def XORrrb : InstTCE<(outs I1Regs:$op3), (ins I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (trunc (xor I32Regs:$op1, I32Regs:$op2)))]>;<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Which can do the whole 3-operation code sequence correctly with one 
operation.<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">With LLVM 2.7 this correct operation is selected, with LLVM 2.8 the wrong operation(which compares all bits) is chosen<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">So this looks like a bug in LLVM 2.8 isel?<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Hi Heikki,<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">We need a better example of what's going on. What's the original code? Also, I don't have access to your back-end's code so it's hard to tell just from these snippets what's going on. 
For instance, it's not clear whether it's the instruction selector that's at fault or if your .td files have a bug in them somewhere.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The original code is:<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">[snip]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">where the interesting lines are lines 12-13:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">                x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">                if (x16 == 1)<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The code which goes into isel is:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">bb.nph:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">  %0 = zext i8 %data to i32<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">  %1 = zext i16 %crc to i32<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">  %2 = xor i32 %1, %0<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">  %3 = and i32 %2, 1<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">  %4 = icmp eq i32 %3, 0<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">  br i1 %4, label %bb.nph._crit_edge, label %5<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">inside selectiondag this 
becomes:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Legalized selection DAG:<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">[snip]<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">        0x248d280: <multiple use><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">        0x248d980: <multiple use><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">      0x25bb7f0: i32 = xor 0x248d280, 0x248d980 [ORD=3] [ID=15]<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">    0x25bbbf0: i1 = truncate 0x25bb7f0 [ID=18]<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">This truncate is weird to me. If anything, it should be an "and" instruction. 
I have a feeling that your back-end is telling instruction selection and the type legalizer that it's okay to replace the normal "and" with this truncate call, which leads to your troubles later on.<br></blockquote><br>It would seem that the truncate is created by:<br><br>TargetLowering::SimplifySetCC<br><br>...<br><br><br>      if (N0.getOpcode() == ISD::SETCC &&<br>           isTypeLegal(VT) && VT.bitsLE(N0.getValueType())) {<br>         bool TrueWhenTrue = (Cond == ISD::SETEQ) ^ (N1C->getAPIntValue() != 1);<br>         if (TrueWhenTrue)<br>           return DAG.getNode(ISD::TRUNCATE, dl, VT, N0);<br><br>         // Invert the condition.<br>         ISD::CondCode CC = cast&lt;CondCodeSDNode&gt;(N0.getOperand(2))->get();<br>         CC = ISD::getSetCCInverse(CC,<br>                                   N0.getOperand(0).getValueType().isInteger());<br>         return DAG.getSetCC(dl, VT, N0.getOperand(0), N0.getOperand(1), CC);<br>       }<br><br><br>and the AND is then dropped by<br><br>TargetLowering::SimplifyDemandedBits<br><br>...<br><br><br>   switch (Op.getOpcode()) {<br>...<br>   case ISD::AND:<br>     // If the RHS is a constant, check to see if the LHS would be zero without<br>     // using the bits from the RHS.  
Below, we use knowledge about the RHS to<br>     // simplify the LHS, here we're using information from the LHS to simplify<br>     // the RHS.<br>     if (ConstantSDNode *RHSC = dyn_cast&lt;ConstantSDNode&gt;(Op.getOperand(1))) {<br>       APInt LHSZero, LHSOne;<br>       TLO.DAG.ComputeMaskedBits(Op.getOperand(0), NewMask,<br>                                 LHSZero, LHSOne, Depth+1);<br>       // If the LHS already has zeros where RHSC does, this and is dead.<br>       if ((LHSZero & NewMask) == (~RHSC->getAPIntValue() & NewMask))<br><span class="Apple-tab-span" style="white-space:pre"> </span>  return TLO.CombineTo(Op, Op.getOperand(0));<br><br><br><br><br><br>As neither of these is a virtual function, we cannot create a <br>workaround hack for our backend to easily circumvent this bug.<br><br><br><br><br>It would now seem that TCE users cannot use the default LLVM 2.8 but <br>we'll have to distribute our own patch to disable the invalid dropping <br>of the trunc and make all our users compile LLVM themselves with the <br>patch :(<br>_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></div></blockquote></div><br></div></body></html>