[llvm] r321437 - [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.

Wed Dec 27 05:32:58 PST 2017

r321490.

On Wed, Dec 27, 2017 at 1:10 PM, Chandler Carruth via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
> On Sun, Dec 24, 2017 at 10:48 PM Craig Topper via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>>
>> Author: ctopper
>> Date: Sun Dec 24 22:47:10 2017
>> New Revision: 321437
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=321437&view=rev
>> Log:
>> [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the
>> upper bits are all sign bits or zeros.
>>
>> Normally we catch this during lowering, but vXi64 mul is considered legal
>> when we have AVX512DQ.
>>
>> This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove
>> its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While
>> pmuldq/pmuludq is only one 4 cycle uop.
>
>
> We're seeing ISEL crashes as a consequence of this patch... I think I see
> the issue here. See below:
>
>>
>> +static SDValue combineVMUL(SDNode *N, SelectionDAG &DAG,
>> +                           const X86Subtarget &Subtarget) {
>> +  EVT VT = N->getValueType(0);
>> +  SDLoc dl(N);
>> +
>> +  if (VT.getScalarType() != MVT::i64)
>> +    return SDValue();
>> +
>> +  MVT MulVT = MVT::getVectorVT(MVT::i32, VT.getVectorNumElements() * 2);
>> +
>> +  SDValue LHS = N->getOperand(0);
>> +  SDValue RHS = N->getOperand(1);
>> +
>> +  // MULDQ returns the 64-bit result of the signed multiplication of the
>> lower
>> +  // 32-bits. We can lower with this if the sign bits stretch that far.
>> +  if (Subtarget.hasSSE41() && DAG.ComputeNumSignBits(LHS) > 32 &&
>> +      DAG.ComputeNumSignBits(RHS) > 32) {
>> +    return DAG.getNode(X86ISD::PMULDQ, dl, VT, DAG.getBitcast(MulVT,
>> LHS),
>> +                       DAG.getBitcast(MulVT, RHS));
>> +  }
>> +
>> +  // If the upper bits are zero we can use a single pmuludq.
>> +  APInt Mask = APInt::getHighBitsSet(64, 32);
>> +  if (DAG.MaskedValueIsZero(LHS, Mask) && DAG.MaskedValueIsZero(RHS,
>> Mask)) {
>> +    return DAG.getNode(X86ISD::PMULUDQ, dl, VT, DAG.getBitcast(MulVT,
>> LHS),
>> +                       DAG.getBitcast(MulVT, RHS));
>
>
> Does this need a predicate to check that PMULUDQ is legal here?
> Specifically, I'm worried about stuff like a v4i64 vector which is a legal
> type on AVX1 but this instruction wouldn't select correctly until AVX2...
>
>
> The crash happens in some code using Halide. I think the Halide tests are
> green (but you can check -- they're in the test suite) so its something
> particular. We'll try to extract a test case if my theory above doesn't
> point to something obvious you see here in the code.
>
> The crash definitely comes due to a failure to select PMULUDQ:
>
> LLVM ERROR: Cannot select: 0x7f08ff9621a0: v4i64 = X86ISD::PMULUDQ
> 0x7f08ff8029c0, 0x7f08fee4da28
>   0x7f08ff8029c0: v8i32 = bitcast 0x7f08ff571a28
>     0x7f08ff571a28: v4i64 = insert_subvector 0x7f08ff571d68, 0x7f08fedc6a28,
> Constant:i64<2>
>       0x7f08ff571d68: v4i64 = insert_subvector undef:v4i64, 0x7f08fedc6958,
> Constant:i64<0>
>         0x7f08ff571e38: v4i64 = undef
>         0x7f08fedc6958: v2i64 = zero_extend_vector_inreg 0x7f08ff311208
>           0x7f08ff311208: v4i32 = bitcast 0x7f08ff29ae38
>             0x7f08ff29ae38: v2i64 = xor 0x7f08ff29af70, 0x7f08ff802958
>               0x7f08ff29af70: v2i64 = bitcast 0x7f08ffde7b60
>                 0x7f08ffde7b60: v4i32 = X86ISD::VSRAI 0x7f08ffde7270,
> Constant:i8<31>
>                   0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68
>
>
>                   0x7f08ffde7208: i8 = Constant<31>
>               0x7f08ff802958: v2i64 = bitcast 0x7f08ffde7270
>                 0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68
>                   0x7f08ff56e618: v4i32 = X86ISD::PSHUFD 0x7f08fee4de38,
> Constant:i8<0>
>
>
>                   0x7f08fedc8d68: v4i32 = bitcast 0x7f08fee4d888
>
>         0x7f08ffbef5b0: i64 = Constant<0>
>       0x7f08fedc6a28: v2i64 = zero_extend_vector_inreg 0x7f08ff8022d8
>         0x7f08ff8022d8: v4i32 = X86ISD::PSHUFD 0x7f08ff311208,
> Constant:i8<78>
>           0x7f08ff311208: v4i32 = bitcast 0x7f08ff29ae38
>             0x7f08ff29ae38: v2i64 = xor 0x7f08ff29af70, 0x7f08ff802958
>               0x7f08ff29af70: v2i64 = bitcast 0x7f08ffde7b60
>                 0x7f08ffde7b60: v4i32 = X86ISD::VSRAI 0x7f08ffde7270,
> Constant:i8<31>
>                   0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68
>
>
>                   0x7f08ffde7208: i8 = Constant<31>
>               0x7f08ff802958: v2i64 = bitcast 0x7f08ffde7270
>                 0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68
>                   0x7f08ff56e618: v4i32 = X86ISD::PSHUFD 0x7f08fee4de38,
> Constant:i8<0>
>
>
>                   0x7f08fedc8d68: v4i32 = bitcast 0x7f08fee4d888
>
>           0x7f08ff56ed68: i8 = Constant<78>
>       0x7f08ffde7f70: i64 = Constant<2>
>   0x7f08fee4da28: v8i32 = bitcast 0x7f08ff962ea0
>     0x7f08ff962ea0: v4i64,ch = load<LD32[ConstantPool]> 0x7f08ff79c840,
> 0x7f08ff56f6e8, undef:i64
>       0x7f08ff56f6e8: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<4 x
> i64> <...>> 0
>         0x7f08ff739888: i64 = TargetConstantPool<<4 x i64> <...>> 0
>       0x7f08ff571af8: i64 = undef
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>