[PATCH] ARM64: Combine shifts and uses to bit-extract instruction from different basic block

Mon Apr 21 12:42:10 PDT 2014

committed:r206774.
On Apr 21, 2014, at 10:51 AM, Quentin Colombet <qcolombet at apple.com> wrote:

> Hi Yi,
> 
> LGTM.
> 
> Please commit directly after you've fixed these typos:
> +          // If the type of the truncate is legal, no trucates will be
> +          // introduce in other basic blocks.
> 
> =>
> s/trucates/truncates/
> s/introduce/introduced/
> 
> Thanks,
> -Quentin
> 
> 
> On Apr 21, 2014, at 10:45 AM, yijiang <yjiang at apple.com> wrote:
> 
>> Hi Quentin, 
>> 
>> Thank you for the comments. The new patch: 
>> 
>> 
>> <extractbit-v4-3.patch>
>> 
>> On Apr 21, 2014, at 9:53 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>>> Hi Yi,
>>> 
>>> This LGTM with some minor nitpicks:
>>> 
>>> #1
>>> +// isExtractBitsCandidateUse - Check if the candidates could
>>> +// be combined with shift instruction, which includes:
>>> +// 1. Truncate instruction
>>> +// 2. And instruction and the imm is a mask of the low bits:
>>> +// imm & (imm+1) == 0
>>> Make sure your comments use the doxygen style, i.e., replace // by ///.
>>> At least isExtractBitsCandidateUse and SinkShiftAndTruncate are not using the right format.
>>> 
>>> 
>>> #2
>>> +    // If the use is actually a legal node, there will not be a implicit
>>> +    // truncate.
>>> s/a implicit/an implicit/
>>> 
>>> 
>>> #3
>>> +          // The type of User is not legal does not necessary introduce
>>> +          // truncate. We just want to filter most of the cases here for
>>> +          // compile time sake.
>>> =>
>>> Something along these lines I guess:
>>> If the type of the truncate is legal, this will not introduce truncates in other basic blocks. Thus, do not traverse its users.
>>> 
>>> 
>>> #4
>>> +; OPT: if.end
>>> OPT-LABEL instead?
>>> 
>>> +; OPT: if.then17
>>> Ditto.
>>> 
>>> -Quentin
>>> 
>>> On Apr 18, 2014, at 5:42 PM, yijiang <yjiang at apple.com> wrote:
>>> 
>>>> Hi Quentin,
>>>> 
>>>> Thank you for your detailed comments! I attached the new patch based on our discussion. Could you help to take a look at it? Thank you!
>>>> 
>>>> <extractbit-v4-2.patch>
>>>> 
>>>> On Apr 16, 2014, at 5:23 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>> 
>>>>> Hi Yi,
>>>>> 
>>>>> I hope the format of the review will not be too difficult to digest. For the next iterations you may want to create a phabricator (http://reviews.llvm.org).
>>>>> 
>>>>> I have a couple of concerns that I split into two categories:
>>>>> - Thing to fix: Are fundamental things that would produce correctness issue or hard to maintain problems.
>>>>> - Nice to have: Are nitpick on the code.
>>>>> 
>>>>> ** Things to Fix **
>>>>> 
>>>>> * Logic *
>>>>> 
>>>>> #1
>>>>> +          (!TLI.isTypeLegal(TLI.getValueType(User->getType())
>>>>> This does not seem a sufficient check to guess whether or not a truncate will be issued at legalization.
>>>>> We should instead check if the operation is legal for this type.
>>>>> Indeed, i16 may not be a legal type, but for instance store i16 may and no truncate will be issued.
>>>>> 
>>>>> #2
>>>>> +            TheUse = InsertedShift;     /// <———— This does not look right to me.
>>>>> +            TruncTheUse = InsertedTrunc;
>>>>> +          }
>>>>> Unless I am mistaken, you are transferring the use, in the original truncate, of the original shift into a use of the new shift in the original truncate.
>>>>> This use must be left untouched.
>>>>> 
>>>>> You may want to check if the original truncate still have uses and nuke it if it does not, though.
>>>>> 
>>>>>  #3
>>>>> +  assert((Op0->getOpcode() == ISD::TRUNCATE || BiggerPattern ||
>>>>> +          (Srl_imm > 0 && Srl_imm < VT.getSizeInBits())) &&
>>>>> 
>>>>> Based on the code this patch change, you shouldn’t have to modify this assert. If you do have to modify it, it seems to me that there is something wrong.
>>>>> Please double check.
>>>>> 
>>>>> #4
>>>>>    // we're looking for a shift of a shift
>>>>>    uint64_t Shl_imm = 0;
>>>>>    if (isOpcWithIntImmediate(N->getOperand(0).getNode(), ISD::SHL, Shl_imm)) {
>>>>>      Opd0 = N->getOperand(0).getOperand(0);
>>>>> +  } else if (VT == MVT::i32 && N->getOpcode() == ISD::SRL &&
>>>>> +             N->getOperand(0).getNode()->getOpcode() == ISD::TRUNCATE) {
>>>>> +    // In this case, 64bit UBFM and 32bit UBFM have the same semantics. Our
>>>>> +    // strategy here is to always generate 64bit UBFM. This consistency will
>>>>> +    // help the CSE pass later find more redundancy.
>>>>> +    Opd0 = N->getOperand(0).getOperand(0);
>>>>> +    DstVT = MVT::i64;
>>>>> 
>>>>> There are several problems here.
>>>>> A - This part of the code is looking for SHL, not SRL. The part looking for the SRL is later in the same function:
>>>>>   if (!isIntImmediate(N->getOperand(1), Srl_imm))
>>>>>     return false;
>>>>> Your logic should be added somewhere here.
>>>>> B - You did not update the Srl_imm information. Thus the resulting code may be wrong.
>>>>> 
>>>>> #5
>>>>> +    DstVT = MVT::i64;
>>>>> You should reuse VT here (assuming you moved the related code at the right place) like you did in the AND case.
>>>>> More importantly, please use the type from the node (and return false on VT != MVT::i64 or at least assert that MVT::i64), not a hard coded one.
>>>>> 
>>>>> Remove this change:
>>>>> -  if (VT == MVT::i32)
>>>>> +  if (DstVT == MVT::i32)
>>>>> 
>>>>> #6
>>>>> +  // If the bit extract operation is 64bit but the original type is 32bit, we
>>>>> +  // need to add one EXTRACT_SUBREG.
>>>>> +  if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
>>>>> +    if (VT == MVT::i32) {
>>>>> +      SDNode *BFM = CurDAG->getMachineNode(Opc, SDLoc(N), MVT::i64, Ops);
>>>>> 
>>>>> Technically speaking you should create different Ops when promoting to the i64 type.
>>>>> Indeed, current Ops were created with:
>>>>>   SDValue Ops[] = { Opd0, CurDAG->getTargetConstant(LSB, VT),
>>>>>                     CurDAG->getTargetConstant(MSB, VT) };
>>>>> Thus, the VTs do not match for the constants.
>>>>> 
>>>>> #7
>>>>> +  // If the bit extract operation is 64bit but the original type is 32bit, we
>>>>> +  // need to add one EXTRACT_SUBREG.
>>>>> +  if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
>>>>> +    if (VT == MVT::i32) {
>>>>> Sink the second if into the first one. That will reduce the indentation as a bonus :).
>>>>> 
>>>>> #8
>>>>> +  if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
>>>>> +    if (VT == MVT::i32) {
>>>>> I am not a big fan of the way you detect the fact that the type has been promoted.
>>>>> I think it would be cleaner to add (yet) another reference argument to know whether or not we have to do that.
>>>>> Ultimately, it looks like we could benefit from a helper class here, but the refactoring is out of the scope of this patch.
>>>>> 
>>>>> * Tests *
>>>>> 
>>>>> #9
>>>>> I’d like you add an additional RUN line with opt and check the actual transformation done on the IR.
>>>>> 
>>>>> #10
>>>>> +; CHECK-LABEL: LBB22_2:
>>>>> Please do not rely on basic blocks labels. That looks fragile.
>>>>> +; CHECK-LABEL: Lloh4:
>>>>> This is even more true for these that are added by the ARM64CollectLOH pass.
>>>>>  
>>>>>  
>>>>> * Comments *
>>>>> 
>>>>> #11
>>>>> +/// OptimizeExtractBits - sink the given shift instruction into user blocks if
>>>>> +/// the uses could potentially be combined with the shift instruction and
>>>>> +/// generate BitExtract instruction. It will only be applied if the architecture
>>>>> +/// supports BitExtract instruction.
>>>>> +///
>>>>> +/// Return true if any changes are made.
>>>>> You should be more specific here. Like specifying the pattern you are recognizing.
>>>>> Indeed the shifts that are supported in this patch are logical and arithmetic right shift.
>>>>> In other words, this optimization should help for pattern like this:
>>>>> Input:
>>>>> XXXXXXPPPXXX
>>>>> Output:
>>>>> 000000000PPP
>>>>> (shift right + and of low bits)
>>>>> 
>>>>> But not for pattern like this:
>>>>> Input:
>>>>> XXXXXXPPPXXX
>>>>> Output:
>>>>> 0000PPP00000
>>>>> (shift left + and of contiguous bits)
>>>>> 
>>>>> #12
>>>>> +    if (UserBB == DefBB) {
>>>>> +      // The type of the use is not legal in this architecture so there is a
>>>>> +      // implicit truncation.
>>>>> +      // for example:
>>>>> +      // BB1:
>>>>> +      // i64 shift.result = shlr i64 opnd, imm
>>>>> +      // trunc.result = trunc shift.result to i16
>>>>> +      //
>>>>> +      // BB2:
>>>>> +      //   ----> We will have a implicit truncate here if the architecture does
>>>>> +      //   not have i16 register.
>>>>> +      // use i16 trunc.result
>>>>> 
>>>>> This comment is confusing because:
>>>>> A - You mention two basic blocks after a check that both the defBB and the useBB are equal.
>>>>> B - The first sentence is missing a ‘if’ I guess.
>>>>> 
>>>>> What about something like (do not copy/paste, this is not a wonderful wording either :)):
>>>>> If UseBB and DefBB are equal and use is a truncate, we may still want to sink both the shift and the truncate within the block of the use of the truncate.
>>>>> Indeed, if the use of the truncate is not legal it may produce a truncate at legalization.
>>>>> 
>>>>> 
>>>>> ** Nice to Have **
>>>>> 
>>>>> * lib/CodeGen/CodeGenPrepare.cpp *
>>>>> 
>>>>> #13
>>>>> The body of OptimizeExtractBits gets a bit complex, I think some refactoring (essentially helper functions) would help.
>>>>> 
>>>>> +    // The candidates that could be combined with shift instruction.
>>>>> +    // 1. Truncate instruction
>>>>> +    // 2. And instruction and the imm is a mask of the low bits: 
>>>>> +    // imm & (imm+1) == 0
>>>>> +    if (!isa<TruncInst>(User)) {
>>>>> +      if (User->getOpcode() != Instruction::And ||
>>>>> +          !isa<ConstantInt>(User->getOperand(1)))
>>>>> +        continue;
>>>>> +
>>>>> +      unsigned Cimm =
>>>>> +          dyn_cast<ConstantInt>(User->getOperand(1))->getZExtValue();
>>>>> +
>>>>> +      if (Cimm & (Cimm + 1))
>>>>> +        continue;
>>>>> +    }
>>>>> =>
>>>>> move that into a helper function like 'bool isExtractBitsCandidateUse’.
>>>>> Then, adapt the loop accordingly.
>>>>> 
>>>>> #14
>>>>> Same for this:
>>>>> +      // The type of the use is not legal in this architecture so there is a
>>>>> +      // implicit truncation.
>>>>> +      // for example:
>>>>> +      // BB1:
>>>>> +      // i64 shift.result = shlr i64 opnd, imm
>>>>> +      // trunc.result = trunc shift.result to i16
>>>>> +      //
>>>>> +      // BB2:
>>>>> +      //   ----> We will have a implicit truncate here if the architecture does
>>>>> +      //   not have i16 register.
>>>>> +      // use i16 trunc.result
>>>>> +      //
>>>>> +      if (isa<TruncInst>(User) &&
>>>>> +          (TLI.isTypeLegal(TLI.getValueType(ShiftI->getType()))) &&
>>>>> +          (!TLI.isTypeLegal(TLI.getValueType(User->getType())))) {
>>>>> +        TruncInst *TruncI = dyn_cast<TruncInst>(User);
>>>>> +
>>>>> +        for (Value::user_iterator TruncUI = TruncI->user_begin(),
>>>>> +                                  TruncE = TruncI->user_end();
>>>>> +             TruncUI != TruncE;) {
>>>>> +
>>>>> +          Use &TruncTheUse = TruncUI.getUse();
>>>>> +          Instruction *TruncUser = cast<Instruction>(*TruncUI);
>>>>> +          // Preincrement use iterator so we don't invalidate it.
>>>>> +
>>>>> +          ++TruncUI;
>>>>> +
>>>>> +          // Don't bother for PHI nodes.
>>>>> +          if (isa<PHINode>(TruncUser))
>>>>> +            continue;
>>>>> +
>>>>> +          BasicBlock *TruncUserBB = TruncUser->getParent();
>>>>> +
>>>>> +          if (UserBB == TruncUserBB)
>>>>> +            continue;
>>>>> +
>>>>> +          BinaryOperator *&InsertedShift = InsertedShifts[TruncUserBB];
>>>>> +          CastInst *&InsertedTrunc = InsertedTruncs[TruncUserBB];
>>>>> +
>>>>> +          if (!InsertedShift && !InsertedTrunc) {
>>>>> +            BasicBlock::iterator InsertPt = TruncUserBB->getFirstInsertionPt();
>>>>> +            // Sink the shift
>>>>> +            if (ShiftI->getOpcode() == Instruction::AShr)
>>>>> +              InsertedShift = BinaryOperator::CreateAShr(ShiftI->getOperand(0),
>>>>> +                                                         CI, "", InsertPt);
>>>>> +            else
>>>>> +              InsertedShift = BinaryOperator::CreateLShr(ShiftI->getOperand(0),
>>>>> +                                                         CI, "", InsertPt);
>>>>> +
>>>>> +            // Sink the trunc
>>>>> +            BasicBlock::iterator TruncInsertPt =
>>>>> +                TruncUserBB->getFirstInsertionPt();
>>>>> +            TruncInsertPt++;
>>>>> +
>>>>> +            InsertedTrunc =
>>>>> +                CastInst::Create(TruncI->getOpcode(), InsertedShift,
>>>>> +                                 TruncI->getType(), "", TruncInsertPt);
>>>>> +
>>>>> +            MadeChange = true;
>>>>> +
>>>>> +            TheUse = InsertedShift;
>>>>> +            TruncTheUse = InsertedTrunc;
>>>>> +          }
>>>>> +        }
>>>>> +      }
>>>>> 
>>>>> You have a big chunk of code to handle a special case. Move that into a helper function.
>>>>> 
>>>>> * lib/Target/ARM64/ARM64ISelDAGToDAG.cpp *
>>>>> 
>>>>> #15
>>>>> +    // If the shift result was truncated, we can still combine them.
>>>>> +    Opd0 = Op0->getOperand(0).getNode()->getOperand(0);
>>>>> You could simply use
>>>>> Op0->getOperand(0).getOperand(0)
>>>>> 
>>>>> Thanks,
>>>>> -Quentin
>>>>> 
>>>>> On Apr 16, 2014, at 9:36 AM, yijiang <yjiang at apple.com> wrote:
>>>>> 
>>>>>> Here is the new patch. More comments are appreciated!
>>>>>> <extractbit-v2.patch>
>>>>>> On Apr 16, 2014, at 9:29 AM, Yi Jiang <yjiang at apple.com> wrote:
>>>>>> 
>>>>>>> You are right. Thank you. 
>>>>>>> On Apr 14, 2014, at 6:59 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
>>>>>>> 
>>>>>>>> On 04/14/2014 05:25 PM, yijiang wrote:
>>>>>>>>> +  void setHasExtractBitsInsn(bool hasExtractInsn = false) {
>>>>>>>>> +    HasExtractBitsInsn = hasExtractInsn;
>>>>>>>>> +  }
>>>>>>>> The default argument should be true, since that's what all the others do and since the default value is already false
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>> 
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>> 
>>>> 
>>> 
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140421/ef09cf79/attachment.html>