[llvm] r321437 - [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 27 04:10:17 PST 2017


On Sun, Dec 24, 2017 at 10:48 PM Craig Topper via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Author: ctopper
> Date: Sun Dec 24 22:47:10 2017
> New Revision: 321437
>
> URL: http://llvm.org/viewvc/llvm-project?rev=321437&view=rev
> Log:
> [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the
> upper bits are all sign bits or zeros.
>
> Normally we catch this during lowering, but vXi64 mul is considered legal
> when we have AVX512DQ.
>
> This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove
> its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While
> pmuldq/pmuludq is only one 4 cycle uop.
>

We're seeing ISEL crashes as a consequence of this patch... I think I see
the issue here. See below:


> +static SDValue combineVMUL(SDNode *N, SelectionDAG &DAG,
> +                           const X86Subtarget &Subtarget) {
> +  EVT VT = N->getValueType(0);
> +  SDLoc dl(N);
> +
> +  if (VT.getScalarType() != MVT::i64)
> +    return SDValue();
> +
> +  MVT MulVT = MVT::getVectorVT(MVT::i32, VT.getVectorNumElements() * 2);
> +
> +  SDValue LHS = N->getOperand(0);
> +  SDValue RHS = N->getOperand(1);
> +
> +  // MULDQ returns the 64-bit result of the signed multiplication of the
> lower
> +  // 32-bits. We can lower with this if the sign bits stretch that far.
> +  if (Subtarget.hasSSE41() && DAG.ComputeNumSignBits(LHS) > 32 &&
> +      DAG.ComputeNumSignBits(RHS) > 32) {
> +    return DAG.getNode(X86ISD::PMULDQ, dl, VT, DAG.getBitcast(MulVT, LHS),
> +                       DAG.getBitcast(MulVT, RHS));
> +  }
> +
> +  // If the upper bits are zero we can use a single pmuludq.
> +  APInt Mask = APInt::getHighBitsSet(64, 32);
> +  if (DAG.MaskedValueIsZero(LHS, Mask) && DAG.MaskedValueIsZero(RHS,
> Mask)) {
> +    return DAG.getNode(X86ISD::PMULUDQ, dl, VT, DAG.getBitcast(MulVT,
> LHS),
> +                       DAG.getBitcast(MulVT, RHS));
>

Does this need a predicate to check that PMULUDQ is legal here?
Specifically, I'm worried about stuff like a v4i64 vector which is a legal
type on AVX1 but this instruction wouldn't select correctly until AVX2...


The crash happens in some code using Halide. I think the Halide tests are
green (but you can check -- they're in the test suite) so its something
particular. We'll try to extract a test case if my theory above doesn't
point to something obvious you see here in the code.

The crash definitely comes due to a failure to select PMULUDQ:

LLVM ERROR: Cannot select: 0x7f08ff9621a0: v4i64 = X86ISD::PMULUDQ
0x7f08ff8029c0, 0x7f08fee4da28
  0x7f08ff8029c0: v8i32 = bitcast 0x7f08ff571a28
    0x7f08ff571a28: v4i64 = insert_subvector 0x7f08ff571d68,
0x7f08fedc6a28, Constant:i64<2>
      0x7f08ff571d68: v4i64 = insert_subvector undef:v4i64, 0x7f08fedc6958,
Constant:i64<0>
        0x7f08ff571e38: v4i64 = undef
        0x7f08fedc6958: v2i64 = zero_extend_vector_inreg 0x7f08ff311208
          0x7f08ff311208: v4i32 = bitcast 0x7f08ff29ae38
            0x7f08ff29ae38: v2i64 = xor 0x7f08ff29af70, 0x7f08ff802958
              0x7f08ff29af70: v2i64 = bitcast 0x7f08ffde7b60
                0x7f08ffde7b60: v4i32 = X86ISD::VSRAI 0x7f08ffde7270,
Constant:i8<31>
                  0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68


                  0x7f08ffde7208: i8 = Constant<31>
              0x7f08ff802958: v2i64 = bitcast 0x7f08ffde7270
                0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68
                  0x7f08ff56e618: v4i32 = X86ISD::PSHUFD 0x7f08fee4de38,
Constant:i8<0>


                  0x7f08fedc8d68: v4i32 = bitcast 0x7f08fee4d888

        0x7f08ffbef5b0: i64 = Constant<0>
      0x7f08fedc6a28: v2i64 = zero_extend_vector_inreg 0x7f08ff8022d8
        0x7f08ff8022d8: v4i32 = X86ISD::PSHUFD 0x7f08ff311208,
Constant:i8<78>
          0x7f08ff311208: v4i32 = bitcast 0x7f08ff29ae38
            0x7f08ff29ae38: v2i64 = xor 0x7f08ff29af70, 0x7f08ff802958
              0x7f08ff29af70: v2i64 = bitcast 0x7f08ffde7b60
                0x7f08ffde7b60: v4i32 = X86ISD::VSRAI 0x7f08ffde7270,
Constant:i8<31>
                  0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68


                  0x7f08ffde7208: i8 = Constant<31>
              0x7f08ff802958: v2i64 = bitcast 0x7f08ffde7270
                0x7f08ffde7270: v4i32 = add 0x7f08ff56e618, 0x7f08fedc8d68
                  0x7f08ff56e618: v4i32 = X86ISD::PSHUFD 0x7f08fee4de38,
Constant:i8<0>


                  0x7f08fedc8d68: v4i32 = bitcast 0x7f08fee4d888

          0x7f08ff56ed68: i8 = Constant<78>
      0x7f08ffde7f70: i64 = Constant<2>
  0x7f08fee4da28: v8i32 = bitcast 0x7f08ff962ea0
    0x7f08ff962ea0: v4i64,ch = load<LD32[ConstantPool]> 0x7f08ff79c840,
0x7f08ff56f6e8, undef:i64
      0x7f08ff56f6e8: i64 = X86ISD::WrapperRIP TargetConstantPool:i64<<4 x
i64> <...>> 0
        0x7f08ff739888: i64 = TargetConstantPool<<4 x i64> <...>> 0
      0x7f08ff571af8: i64 = undef
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171227/39e09685/attachment.html>


More information about the llvm-commits mailing list