[llvm] r313054 - [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
Yung, Douglas via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 12 22:52:06 PDT 2017
Hi Craig,
One of our internal tests seems to have found an issue with your commit. I have filed PR34589 with the details. Can you take a look?
Douglas Yung
> -----Original Message-----
> From: llvm-commits [mailto:llvm-commits-bounces at lists.llvm.org] On Behalf Of
> Craig Topper via llvm-commits
> Sent: Tuesday, September 12, 2017 10:40
> To: llvm-commits at lists.llvm.org
> Subject: [llvm] r313054 - [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
>
> Author: ctopper
> Date: Tue Sep 12 10:40:25 2017
> New Revision: 313054
>
> URL: http://llvm.org/viewvc/llvm-project?rev=313054&view=rev
> Log:
> [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
>
> Recognizing this pattern during DAG combine hides information about the 'and'
> and the shift from other combines. I think it should be recognized at isel so
> it's as late as possible. But it can't be done with table-based isel because
> you need to be able to look at both immediates. This patch moves it to custom
> isel in X86ISelDAGToDAG.cpp.
>
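> [Editorial aside, not part of the original message: a minimal C++ source
> shape that produces the (and (srl x, C1), (1<<C2)-1) pattern in question.
> The function name is hypothetical; compiled at -O2 with BMI or TBM enabled,
> isel should now select a BEXTR for it.]
>
> // Hypothetical example: extract bits 11:4 of x (start = 4, length = 8).
> // This shift-then-mask shape is what matchBEXTRFromAnd looks for.
> unsigned extract_bits_11_4(unsigned x) {
>   return (x >> 4) & 0xFF;   // 0xFF == (1 << 8) - 1
> }
>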
> This does break a couple of tests in tbm_patterns because we are now emitting
> an and_flag node or (cmp and, 0) that we don't recognize yet. We already had
> this problem for several other TBM patterns, so I think this is fine and we
> can address all of them together.
>
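> [Editorial aside: a sketch of the regressed shape, reconstructed from the
> tbm_patterns.ll changes below and not part of the patch. The extracted
> field only feeds a compare against zero, so isel now sees (cmp (and ...), 0)
> rather than a flag-producing BEXTR.]
>
> // Hypothetical C++ equivalent of test_x86_tbm_bextri_u32_z2: the masked
> // value is only tested against zero and then discarded by the select.
> unsigned select_on_field(unsigned a, unsigned b, unsigned c) {
>   return ((a >> 4) & 0xFFF) ? b : c;   // 12-bit field starting at bit 4
> }
>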
> I've also fixed a bug where the combine to BEXTR was preventing us from using
> the trick of zero-extending AH to handle extracts of bits 15:8. We might still
> want to use BEXTR if it enables load folding, but honestly I hope we would
> have narrowed the load instead before getting to isel.
>
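> [Editorial aside, a hypothetical illustration of that trick: when the shift
> is 8 and the mask is 8 bits wide, the field is exactly bits 15:8, which the
> AH register already aliases, so a zero-extending move of AH beats a BEXTR.]
>
> // Hypothetical example: bits 15:8 of x. With this patch the backend emits
> // "movl %edi, %eax; movzbl %ah, %eax" instead of a BEXTR (see the
> // bextr32_subreg test below).
> unsigned extract_bits_15_8(unsigned x) {
>   return (x >> 8) & 0xFF;
> }
>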
> I think we should probably also support matching BEXTR from (srl/sra (and mask
> << C), C), but that should be a different patch.
>
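> [Editorial aside: a hypothetical C++ shape for that not-yet-matched variant,
> where the mask is applied before the shift instead of after.]
>
> // Hypothetical example: the same bits 11:4 extract as before, but written
> // as (x & (mask << C)) >> C, which this patch does not yet turn into BEXTR.
> unsigned extract_bits_pre_masked(unsigned x) {
>   return (x & 0xFF0) >> 4;   // 0xFF0 == 0xFF << 4
> }
>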
> Differential Revision: https://reviews.llvm.org/D37592
>
> Modified:
> llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> llvm/trunk/lib/Target/X86/X86ISelLowering.h
> llvm/trunk/lib/Target/X86/X86InstrInfo.td
> llvm/trunk/test/CodeGen/X86/bmi.ll
> llvm/trunk/test/CodeGen/X86/tbm_patterns.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp?rev=313054&r1=313053&r2=313054&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp Tue Sep 12 10:40:25 2017
> @@ -422,6 +422,8 @@ namespace {
> }
>
> bool foldLoadStoreIntoMemOperand(SDNode *Node);
> +
> + bool matchBEXTRFromAnd(SDNode *Node);
> };
> }
>
> @@ -2280,6 +2282,90 @@ bool X86DAGToDAGISel::foldLoadStoreIntoM
> return true;
> }
>
> +// See if this is an (X >> C1) & C2 that we can match to BEXTR/BEXTRI.
> +bool X86DAGToDAGISel::matchBEXTRFromAnd(SDNode *Node) {
> + MVT NVT = Node->getSimpleValueType(0);
> + SDLoc dl(Node);
> +
> + SDValue N0 = Node->getOperand(0);
> + SDValue N1 = Node->getOperand(1);
> +
> + if (!Subtarget->hasBMI() && !Subtarget->hasTBM())
> + return false;
> +
> + // Must have a shift right.
> + if (N0->getOpcode() != ISD::SRL && N0->getOpcode() != ISD::SRA)
> + return false;
> +
> + // Shift can't have additional users.
> + if (!N0->hasOneUse())
> + return false;
> +
> + // Only supported for 32 and 64 bits.
> + if (NVT != MVT::i32 && NVT != MVT::i64)
> + return false;
> +
> + // Shift amount and RHS of and must be constant.
> + ConstantSDNode *MaskCst = dyn_cast<ConstantSDNode>(N1);
> + ConstantSDNode *ShiftCst = dyn_cast<ConstantSDNode>(N0->getOperand(1));
> + if (!MaskCst || !ShiftCst)
> + return false;
> +
> + // And RHS must be a mask.
> + uint64_t Mask = MaskCst->getZExtValue();
> + if (!isMask_64(Mask))
> + return false;
> +
> + uint64_t Shift = ShiftCst->getZExtValue();
> + uint64_t MaskSize = countPopulation(Mask);
> +
> + // Don't interfere with something that can be handled by extracting AH.
> + // TODO: If we are able to fold a load, BEXTR might still be better than AH.
> + if (Shift == 8 && MaskSize == 8)
> + return false;
> +
> + // Make sure we are only using bits that were in the original value, not
> + // shifted in.
> + if (Shift + MaskSize > NVT.getSizeInBits())
> + return false;
> +
> + SDValue New = CurDAG->getTargetConstant(Shift | (MaskSize << 8), dl, NVT);
> + unsigned ROpc = NVT == MVT::i64 ? X86::BEXTRI64ri : X86::BEXTRI32ri;
> + unsigned MOpc = NVT == MVT::i64 ? X86::BEXTRI64mi : X86::BEXTRI32mi;
> +
> + // BMI requires the immediate to be placed in a register.
> + if (!Subtarget->hasTBM()) {
> + ROpc = NVT == MVT::i64 ? X86::BEXTR64rr : X86::BEXTR32rr;
> + MOpc = NVT == MVT::i64 ? X86::BEXTR64rm : X86::BEXTR32rm;
> + SDNode *Move = CurDAG->getMachineNode(X86::MOV32ri, dl, NVT, New);
> + New = SDValue(Move, 0);
> + }
> +
> + MachineSDNode *NewNode;
> + SDValue Input = N0->getOperand(0);
> + SDValue Tmp0, Tmp1, Tmp2, Tmp3, Tmp4;
> + if (tryFoldLoad(Node, Input, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4)) {
> + SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, New, Input.getOperand(0) };
> + SDVTList VTs = CurDAG->getVTList(NVT, MVT::Other);
> + NewNode = CurDAG->getMachineNode(MOpc, dl, VTs, Ops);
> + // Update the chain.
> + ReplaceUses(N1.getValue(1), SDValue(NewNode, 1));
> + // Record the mem-refs
> + LoadSDNode *LoadNode = cast<LoadSDNode>(Input);
> + if (LoadNode) {
> + MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
> + MemOp[0] = LoadNode->getMemOperand();
> + NewNode->setMemRefs(MemOp, MemOp + 1);
> + }
> + } else {
> + NewNode = CurDAG->getMachineNode(ROpc, dl, NVT, Input, New);
> + }
> +
> + ReplaceUses(SDValue(Node, 0), SDValue(NewNode, 0));
> + CurDAG->RemoveDeadNode(Node);
> + return true;
> +}
> +
> void X86DAGToDAGISel::Select(SDNode *Node) {
> MVT NVT = Node->getSimpleValueType(0);
> unsigned Opc, MOpc;
> @@ -2333,8 +2419,14 @@ void X86DAGToDAGISel::Select(SDNode *Nod
> }
>
> case ISD::AND:
> + // Try to match BEXTR/BEXTRI instruction.
> + if (matchBEXTRFromAnd(Node))
> + return;
> +
> + LLVM_FALLTHROUGH;
> case ISD::OR:
> case ISD::XOR: {
> +
> // For operations of the form (x << C1) op C2, check if we can use a smaller
> // encoding for C2 by transforming it into (x op (C2>>C1)) << C1.
> SDValue N0 = Node->getOperand(0);
>
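> [Editorial aside on the control value built above, Shift | (MaskSize << 8):
> a minimal reference model of BEXTR's semantics, written for illustration
> and not part of the patch, assuming the usual encoding of the start bit in
> bits 7:0 and the length in bits 15:8 of the control operand.]
>
> #include <cstdint>
>
> // Hypothetical reference model: BEXTR(src, ctrl) extracts "len" bits of
> // "src" starting at bit "start". Out-of-range inputs are clamped the way
> // the instruction behaves: a start past the operand yields 0, a length
> // covering the whole operand keeps every remaining bit.
> uint64_t bextr64(uint64_t Src, uint32_t Ctrl) {
>   uint32_t Start = Ctrl & 0xFF;        // bits 7:0 of the control word
>   uint32_t Len = (Ctrl >> 8) & 0xFF;   // bits 15:8 of the control word
>   if (Start >= 64 || Len == 0)
>     return 0;                          // nothing left to extract
>   Src >>= Start;
>   if (Len >= 64)
>     return Src;                        // length spans the whole operand
>   return Src & ((uint64_t(1) << Len) - 1);
> }
>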
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=313054&r1=313053&r2=313054&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Sep 12 10:40:25 2017
> @@ -24586,7 +24586,6 @@ const char *X86TargetLowering::getTarget
> case X86ISD::OR: return "X86ISD::OR";
> case X86ISD::XOR: return "X86ISD::XOR";
> case X86ISD::AND: return "X86ISD::AND";
> - case X86ISD::BEXTR: return "X86ISD::BEXTR";
> case X86ISD::MUL_IMM: return "X86ISD::MUL_IMM";
> case X86ISD::MOVMSK: return "X86ISD::MOVMSK";
> case X86ISD::PTEST: return "X86ISD::PTEST";
> @@ -32162,9 +32161,6 @@ static SDValue combineAnd(SDNode *N, Sel
> return ShiftRight;
>
> EVT VT = N->getValueType(0);
> - SDValue N0 = N->getOperand(0);
> - SDValue N1 = N->getOperand(1);
> - SDLoc DL(N);
>
> // Attempt to recursively combine a bitmask AND with shuffles.
> if (VT.isVector() && (VT.getScalarSizeInBits() % 8) == 0) {
> @@ -32177,29 +32173,6 @@ static SDValue combineAnd(SDNode *N, Sel
> return SDValue(); // This routine will use CombineTo to replace N.
> }
>
> - // Create BEXTR instructions
> - // BEXTR is ((X >> imm) & (2**size-1))
> - if (VT != MVT::i32 && VT != MVT::i64)
> - return SDValue();
> -
> - if (!Subtarget.hasBMI() && !Subtarget.hasTBM())
> - return SDValue();
> - if (N0.getOpcode() != ISD::SRA && N0.getOpcode() != ISD::SRL)
> - return SDValue();
> -
> - ConstantSDNode *MaskNode = dyn_cast<ConstantSDNode>(N1);
> - ConstantSDNode *ShiftNode = dyn_cast<ConstantSDNode>(N0.getOperand(1));
> - if (MaskNode && ShiftNode) {
> - uint64_t Mask = MaskNode->getZExtValue();
> - uint64_t Shift = ShiftNode->getZExtValue();
> - if (isMask_64(Mask)) {
> - uint64_t MaskSize = countPopulation(Mask);
> - if (Shift + MaskSize <= VT.getSizeInBits())
> - return DAG.getNode(X86ISD::BEXTR, DL, VT, N0.getOperand(0),
> - DAG.getConstant(Shift | (MaskSize << 8), DL, VT));
> - }
> - }
> return SDValue();
> }
>
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=313054&r1=313053&r2=313054&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Tue Sep 12 10:40:25 2017
> @@ -346,9 +346,6 @@ namespace llvm {
> ADD, SUB, ADC, SBB, SMUL,
> INC, DEC, OR, XOR, AND,
>
> - // Bit field extract.
> - BEXTR,
> -
> // LOW, HI, FLAGS = umul LHS, RHS.
> UMUL,
>
>
> Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.td
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.td?rev=313054&r1=313053&r2=313054&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86InstrInfo.td (original)
> +++ llvm/trunk/lib/Target/X86/X86InstrInfo.td Tue Sep 12 10:40:25 2017
> @@ -271,8 +271,6 @@ def X86lock_and : SDNode<"X86ISD::LAND"
> [SDNPHasChain, SDNPMayStore, SDNPMayLoad,
> SDNPMemOperand]>;
>
> -def X86bextr : SDNode<"X86ISD::BEXTR", SDTIntBinOp>;
> -
> def X86mul_imm : SDNode<"X86ISD::MUL_IMM", SDTIntBinOp>;
>
> def X86WinAlloca : SDNode<"X86ISD::WIN_ALLOCA", SDT_X86WIN_ALLOCA,
> @@ -2439,17 +2437,6 @@ let Predicates = [HasBMI2] in {
> (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$lz, sub_32bit))>;
> } // HasBMI2
>
> -let Predicates = [HasBMI] in {
> - def : Pat<(X86bextr GR32:$src1, GR32:$src2),
> - (BEXTR32rr GR32:$src1, GR32:$src2)>;
> - def : Pat<(X86bextr (loadi32 addr:$src1), GR32:$src2),
> - (BEXTR32rm addr:$src1, GR32:$src2)>;
> - def : Pat<(X86bextr GR64:$src1, GR64:$src2),
> - (BEXTR64rr GR64:$src1, GR64:$src2)>;
> - def : Pat<(X86bextr (loadi64 addr:$src1), GR64:$src2),
> - (BEXTR64rm addr:$src1, GR64:$src2)>;
> -} // HasBMI
> -
> multiclass bmi_pdep_pext<string mnemonic, RegisterClass RC,
> X86MemOperand x86memop, Intrinsic Int,
> PatFrag ld_frag> {
> @@ -2651,15 +2638,6 @@ def : InstAlias<"clzero\t{%rax|rax}", (C
> //===----------------------------------------------------------------------===//
>
> let Predicates = [HasTBM] in {
> - def : Pat<(X86bextr GR32:$src1, (i32 imm:$src2)),
> - (BEXTRI32ri GR32:$src1, imm:$src2)>;
> - def : Pat<(X86bextr (loadi32 addr:$src1), (i32 imm:$src2)),
> - (BEXTRI32mi addr:$src1, imm:$src2)>;
> - def : Pat<(X86bextr GR64:$src1, i64immSExt32:$src2),
> - (BEXTRI64ri GR64:$src1, i64immSExt32:$src2)>;
> - def : Pat<(X86bextr (loadi64 addr:$src1), i64immSExt32:$src2),
> - (BEXTRI64mi addr:$src1, i64immSExt32:$src2)>;
> -
> // FIXME: patterns for the load versions are not implemented
> def : Pat<(and GR32:$src, (add GR32:$src, 1)),
> (BLCFILL32rr GR32:$src)>;
>
> Modified: llvm/trunk/test/CodeGen/X86/bmi.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/bmi.ll?rev=313054&r1=313053&r2=313054&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/bmi.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/bmi.ll Tue Sep 12 10:40:25 2017
> @@ -311,6 +311,18 @@ define i32 @bextr32b(i32 %x) uwtable s
> ret i32 %2
> }
>
> +; Make sure we still use AH subreg trick to extract 15:8
> +define i32 @bextr32_subreg(i32 %x) uwtable ssp {
> +; CHECK-LABEL: bextr32_subreg:
> +; CHECK: # BB#0:
> +; CHECK-NEXT: movl %edi, %eax
> +; CHECK-NEXT: movzbl %ah, %eax # NOREX
> +; CHECK-NEXT: retq
> + %1 = lshr i32 %x, 8
> + %2 = and i32 %1, 255
> + ret i32 %2
> +}
> +
> define i32 @bextr32b_load(i32* %x) uwtable ssp {
> ; CHECK-LABEL: bextr32b_load:
> ; CHECK: # BB#0:
> @@ -357,6 +369,18 @@ define i64 @bextr64b(i64 %x) uwtable s
> ret i64 %2
> }
>
> +; Make sure we still use the AH subreg trick to extract 15:8
> +define i64 @bextr64_subreg(i64 %x) uwtable ssp {
> +; CHECK-LABEL: bextr64_subreg:
> +; CHECK: # BB#0:
> +; CHECK-NEXT: movq %rdi, %rax
> +; CHECK-NEXT: movzbl %ah, %eax # NOREX
> +; CHECK-NEXT: retq
> + %1 = lshr i64 %x, 8
> + %2 = and i64 %1, 255
> + ret i64 %2
> +}
> +
> define i64 @bextr64b_load(i64* %x) {
> ; CHECK-LABEL: bextr64b_load:
> ; CHECK: # BB#0:
>
> Modified: llvm/trunk/test/CodeGen/X86/tbm_patterns.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/tbm_patterns.ll?rev=313054&r1=313053&r2=313054&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/tbm_patterns.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/tbm_patterns.ll Tue Sep 12 10:40:25 2017
> @@ -13,6 +13,18 @@ define i32 @test_x86_tbm_bextri_u32(i32
> ret i32 %t1
> }
>
> +; Make sure we still use AH subreg trick for extracting bits 15:8
> +define i32 @test_x86_tbm_bextri_u32_subreg(i32 %a) nounwind {
> +; CHECK-LABEL: test_x86_tbm_bextri_u32_subreg:
> +; CHECK: # BB#0:
> +; CHECK-NEXT: movl %edi, %eax
> +; CHECK-NEXT: movzbl %ah, %eax # NOREX
> +; CHECK-NEXT: retq
> + %t0 = lshr i32 %a, 8
> + %t1 = and i32 %t0, 255
> + ret i32 %t1
> +}
> +
> define i32 @test_x86_tbm_bextri_u32_m(i32* nocapture %a) nounwind {
> ; CHECK-LABEL: test_x86_tbm_bextri_u32_m:
> ; CHECK: # BB#0:
> @@ -40,7 +52,8 @@ define i32 @test_x86_tbm_bextri_u32_z(i3
> define i32 @test_x86_tbm_bextri_u32_z2(i32 %a, i32 %b, i32 %c) nounwind {
> ; CHECK-LABEL: test_x86_tbm_bextri_u32_z2:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04
> +; CHECK-NEXT: shrl $4, %edi
> +; CHECK-NEXT: testw $4095, %di # imm = 0xFFF
> ; CHECK-NEXT: cmovnel %edx, %esi
> ; CHECK-NEXT: movl %esi, %eax
> ; CHECK-NEXT: retq
> @@ -61,6 +74,18 @@ define i64 @test_x86_tbm_bextri_u64(i64
> ret i64 %t1
> }
>
> +; Make sure we still use AH subreg trick for extracting bits 15:8
> +define i64 @test_x86_tbm_bextri_u64_subreg(i64 %a) nounwind {
> +; CHECK-LABEL: test_x86_tbm_bextri_u64_subreg:
> +; CHECK: # BB#0:
> +; CHECK-NEXT: movq %rdi, %rax
> +; CHECK-NEXT: movzbl %ah, %eax # NOREX
> +; CHECK-NEXT: retq
> + %t0 = lshr i64 %a, 8
> + %t1 = and i64 %t0, 255
> + ret i64 %t1
> +}
> +
> define i64 @test_x86_tbm_bextri_u64_m(i64* nocapture %a) nounwind {
> ; CHECK-LABEL: test_x86_tbm_bextri_u64_m:
> ; CHECK: # BB#0:
> @@ -88,7 +113,8 @@ define i64 @test_x86_tbm_bextri_u64_z(i6
> define i64 @test_x86_tbm_bextri_u64_z2(i64 %a, i64 %b, i64 %c) nounwind {
> ; CHECK-LABEL: test_x86_tbm_bextri_u64_z2:
> ; CHECK: # BB#0:
> -; CHECK-NEXT: bextr $3076, %edi, %eax # imm = 0xC04
> +; CHECK-NEXT: shrl $4, %edi
> +; CHECK-NEXT: testw $4095, %di # imm = 0xFFF
> ; CHECK-NEXT: cmovneq %rdx, %rsi
> ; CHECK-NEXT: movq %rsi, %rax
> ; CHECK-NEXT: retq
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits