<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Sun, Dec 24, 2017 at 10:48 PM Craig Topper via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: ctopper<br>

Date: Sun Dec 24 22:47:10 2017<br>

New Revision: 321437<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=321437&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=321437&view=rev</a><br>

Log:<br>

[X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.<br>

<br>

Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.<br>

<br>

This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While pmuldq/pmuludq is only one 4 cycle uop.<br></blockquote><div><br></div><div>We're seeing ISEL crashes as a consequence of this patch... I think I see the issue here. See below:</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+static SDValue combineVMUL(SDNode *N, SelectionDAG &DAG,<br>

+                           const X86Subtarget &Subtarget) {<br>

+  EVT VT = N->getValueType(0);<br>

+  SDLoc dl(N);<br>

+<br>

+  if (VT.getScalarType() != MVT::i64)<br>

+    return SDValue();<br>

+<br>

+  MVT MulVT = MVT::getVectorVT(MVT::i32, VT.getVectorNumElements() * 2);<br>

+<br>

+  SDValue LHS = N->getOperand(0);<br>

+  SDValue RHS = N->getOperand(1);<br>

+<br>

+  // MULDQ returns the 64-bit result of the signed multiplication of the lower<br>

+  // 32-bits. We can lower with this if the sign bits stretch that far.<br>

+  if (Subtarget.hasSSE41() && DAG.ComputeNumSignBits(LHS) > 32 &&<br>

+      DAG.ComputeNumSignBits(RHS) > 32) {<br>

+    return DAG.getNode(X86ISD::PMULDQ, dl, VT, DAG.getBitcast(MulVT, LHS),<br>

+                       DAG.getBitcast(MulVT, RHS));<br>

+  }<br>

+<br>

+  // If the upper bits are zero we can use a single pmuludq.<br>

+  APInt Mask = APInt::getHighBitsSet(64, 32);<br>

+  if (DAG.MaskedValueIsZero(LHS, Mask) && DAG.MaskedValueIsZero(RHS, Mask)) {<br>

+    return DAG.getNode(X86ISD::PMULUDQ, dl, VT, DAG.getBitcast(MulVT, LHS),<br>

+                       DAG.getBitcast(MulVT, RHS));<br></blockquote><div><br></div><div>Does this need a predicate to check that PMULUDQ is legal here? Specifically, I'm worried about stuff like a v4i64 vector which is a legal type on AVX1 but this instruction wouldn't select correctly until AVX2...</div><div><br></div><div><br></div><div>The crash happens in some code using Halide. I think the Halide tests are green (but you can check -- they're in the test suite) so its something particular. We'll try to extract a test case if my theory above doesn't point to something obvious you see here in the code.</div><div><br></div><div>The crash definitely comes due to a failure to select PMULUDQ:</div><div><br></div><div><div>LLVM ERROR: Cannot select: 0x7f08ff9621a0: v4i64 = X86ISD::PMULUDQ 0x7f08ff8029c0, 0x7f08fee4da28</div><div>  0x7f08ff8029c0: v8i32 = bitcast 0x7f08ff571a28</div><div>    0x7f08ff571a28: v4i64 = insert_subvector 0x7f08ff571d68, 0x7f08fedc6a28, Constant:i64<2></div><div>      0x7f08ff571d68: v4i64 = insert_subvector undef:v4i64, 0x7f08fedc6958, Constant:i64<0></div><div>        0x7f08ff571e38: v4i64 = undef</div><div>        0x7f08fedc6958: v2i64 = zero_extend_vector_inreg 0x7f08ff311208</div><div>          0x7f08ff311208: v4i32 = bitcast 0x7f08ff29ae38</div><div>            0x7f08ff29ae38: v2i64 = xor 0x7f08ff29af70, 0x7f08ff802958</div><div>              0x7f08ff29af70: v2i64 = bitcast 0x7f08ffde7b60</div><div>                0x7f08ffde7b60: v4i32 = X86ISD::VSRAI 0x7f08ffde7270, Constant:i8<31></div><div>                  0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68</div><div><br></div><div><br></div><div>                  0x7f08ffde7208: i8 = Constant<31></div><div>              0x7f08ff802958: v2i64 = bitcast 0x7f08ffde7270</div><div>                0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68</div><div>                  0x7f08ff56e618: v4i32 = X86ISD::PSHUFD 0x7f08fee4de38, Constant:i8<0></div><div><br></div><div><br></div><div>                  0x7f08fedc8d68: v4i32 = bitcast 0x7f08fee4d888</div><div><br></div><div>        0x7f08ffbef5b0: i64 = Constant<0></div><div>      0x7f08fedc6a28: v2i64 = zero_extend_vector_inreg 0x7f08ff8022d8</div><div>        0x7f08ff8022d8: v4i32 = X86ISD::PSHUFD 0x7f08ff311208, Constant:i8<78></div><div>          0x7f08ff311208: v4i32 = bitcast 0x7f08ff29ae38</div><div>            0x7f08ff29ae38: v2i64 = xor 0x7f08ff29af70, 0x7f08ff802958</div><div>              0x7f08ff29af70: v2i64 = bitcast 0x7f08ffde7b60</div><div>                0x7f08ffde7b60: v4i32 = X86ISD::VSRAI 0x7f08ffde7270, Constant:i8<31></div><div>                  0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68</div><div><br></div><div><br></div><div>                  0x7f08ffde7208: i8 = Constant<31></div><div>              0x7f08ff802958: v2i64 = bitcast 0x7f08ffde7270</div><div>                0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68</div><div>                  0x7f08ff56e618: v4i32 = X86ISD::PSHUFD 0x7f08fee4de38, Constant:i8<0></div><div><br></div><div><br></div><div>                  0x7f08fedc8d68: v4i32 = bitcast 0x7f08fee4d888</div><div><br></div><div>          0x7f08ff56ed68: i8 = Constant<78></div><div>      0x7f08ffde7f70: i64 = Constant<2></div><div>  0x7f08fee4da28: v8i32 = bitcast 0x7f08ff962ea0</div><div>    0x7f08ff962ea0: v4i64,ch = load<LD32[ConstantPool]> 0x7f08ff79c840, 0x7f08ff56f6e8, undef:i64</div><div>      0x7f08ff56f6e8: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<4 x i64> <...>> 0</div><div>        0x7f08ff739888: i64 = TargetConstantPool<<4 x i64> <...>> 0</div><div>      0x7f08ff571af8: i64 = undef</div></div><div><br></div></div></div>