[PATCH] ARM64: Combine shifts and uses to bit-extract instruction from different basic block
yijiang
yjiang at apple.com
Fri Apr 18 17:42:24 PDT 2014
Hi Quentin,
Thank you for your detailed comments! I attached the new patch based on our discussion. Could you help to take a look at it? Thank you!
On Apr 16, 2014, at 5:23 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Yi,
>
> I hope the format of the review will not be too difficult to digest. For the next iterations you may want to create a phabricator (http://reviews.llvm.org).
>
> I have a couple of concerns that I split into two categories:
> - Thing to fix: Are fundamental things that would produce correctness issue or hard to maintain problems.
> - Nice to have: Are nitpick on the code.
>
> ** Things to Fix **
>
> * Logic *
>
> #1
> + (!TLI.isTypeLegal(TLI.getValueType(User->getType())
> This does not seem a sufficient check to guess whether or not a truncate will be issued at legalization.
> We should instead check if the operation is legal for this type.
> Indeed, i16 may not be a legal type, but for instance store i16 may and no truncate will be issued.
>
> #2
> + TheUse = InsertedShift; /// <———— This does not look right to me.
> + TruncTheUse = InsertedTrunc;
> + }
> Unless I am mistaken, you are transferring the use, in the original truncate, of the original shift into a use of the new shift in the original truncate.
> This use must be left untouched.
>
> You may want to check if the original truncate still have uses and nuke it if it does not, though.
>
> #3
> + assert((Op0->getOpcode() == ISD::TRUNCATE || BiggerPattern ||
> + (Srl_imm > 0 && Srl_imm < VT.getSizeInBits())) &&
>
> Based on the code this patch change, you shouldn’t have to modify this assert. If you do have to modify it, it seems to me that there is something wrong.
> Please double check.
>
> #4
> // we're looking for a shift of a shift
> uint64_t Shl_imm = 0;
> if (isOpcWithIntImmediate(N->getOperand(0).getNode(), ISD::SHL, Shl_imm)) {
> Opd0 = N->getOperand(0).getOperand(0);
> + } else if (VT == MVT::i32 && N->getOpcode() == ISD::SRL &&
> + N->getOperand(0).getNode()->getOpcode() == ISD::TRUNCATE) {
> + // In this case, 64bit UBFM and 32bit UBFM have the same semantics. Our
> + // strategy here is to always generate 64bit UBFM. This consistency will
> + // help the CSE pass later find more redundancy.
> + Opd0 = N->getOperand(0).getOperand(0);
> + DstVT = MVT::i64;
>
> There are several problems here.
> A - This part of the code is looking for SHL, not SRL. The part looking for the SRL is later in the same function:
> if (!isIntImmediate(N->getOperand(1), Srl_imm))
> return false;
> Your logic should be added somewhere here.
> B - You did not update the Srl_imm information. Thus the resulting code may be wrong.
>
> #5
> + DstVT = MVT::i64;
> You should reuse VT here (assuming you moved the related code at the right place) like you did in the AND case.
> More importantly, please use the type from the node (and return false on VT != MVT::i64 or at least assert that MVT::i64), not a hard coded one.
>
> Remove this change:
> - if (VT == MVT::i32)
> + if (DstVT == MVT::i32)
>
> #6
> + // If the bit extract operation is 64bit but the original type is 32bit, we
> + // need to add one EXTRACT_SUBREG.
> + if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
> + if (VT == MVT::i32) {
> + SDNode *BFM = CurDAG->getMachineNode(Opc, SDLoc(N), MVT::i64, Ops);
>
> Technically speaking you should create different Ops when promoting to the i64 type.
> Indeed, current Ops were created with:
> SDValue Ops[] = { Opd0, CurDAG->getTargetConstant(LSB, VT),
> CurDAG->getTargetConstant(MSB, VT) };
> Thus, the VTs do not match for the constants.
>
> #7
> + // If the bit extract operation is 64bit but the original type is 32bit, we
> + // need to add one EXTRACT_SUBREG.
> + if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
> + if (VT == MVT::i32) {
> Sink the second if into the first one. That will reduce the indentation as a bonus :).
>
> #8
> + if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
> + if (VT == MVT::i32) {
> I am not a big fan of the way you detect the fact that the type has been promoted.
> I think it would be cleaner to add (yet) another reference argument to know whether or not we have to do that.
> Ultimately, it looks like we could benefit from a helper class here, but the refactoring is out of the scope of this patch.
>
> * Tests *
>
> #9
> I’d like you add an additional RUN line with opt and check the actual transformation done on the IR.
>
> #10
> +; CHECK-LABEL: LBB22_2:
> Please do not rely on basic blocks labels. That looks fragile.
> +; CHECK-LABEL: Lloh4:
> This is even more true for these that are added by the ARM64CollectLOH pass.
>
>
> * Comments *
>
> #11
> +/// OptimizeExtractBits - sink the given shift instruction into user blocks if
> +/// the uses could potentially be combined with the shift instruction and
> +/// generate BitExtract instruction. It will only be applied if the architecture
> +/// supports BitExtract instruction.
> +///
> +/// Return true if any changes are made.
> You should be more specific here. Like specifying the pattern you are recognizing.
> Indeed the shifts that are supported in this patch are logical and arithmetic right shift.
> In other words, this optimization should help for pattern like this:
> Input:
> XXXXXXPPPXXX
> Output:
> 000000000PPP
> (shift right + and of low bits)
>
> But not for pattern like this:
> Input:
> XXXXXXPPPXXX
> Output:
> 0000PPP00000
> (shift left + and of contiguous bits)
>
> #12
> + if (UserBB == DefBB) {
> + // The type of the use is not legal in this architecture so there is a
> + // implicit truncation.
> + // for example:
> + // BB1:
> + // i64 shift.result = shlr i64 opnd, imm
> + // trunc.result = trunc shift.result to i16
> + //
> + // BB2:
> + // ----> We will have a implicit truncate here if the architecture does
> + // not have i16 register.
> + // use i16 trunc.result
>
> This comment is confusing because:
> A - You mention two basic blocks after a check that both the defBB and the useBB are equal.
> B - The first sentence is missing a ‘if’ I guess.
>
> What about something like (do not copy/paste, this is not a wonderful wording either :)):
> If UseBB and DefBB are equal and use is a truncate, we may still want to sink both the shift and the truncate within the block of the use of the truncate.
> Indeed, if the use of the truncate is not legal it may produce a truncate at legalization.
>
>
> ** Nice to Have **
>
> * lib/CodeGen/CodeGenPrepare.cpp *
>
> #13
> The body of OptimizeExtractBits gets a bit complex, I think some refactoring (essentially helper functions) would help.
>
> + // The candidates that could be combined with shift instruction.
> + // 1. Truncate instruction
> + // 2. And instruction and the imm is a mask of the low bits:
> + // imm & (imm+1) == 0
> + if (!isa<TruncInst>(User)) {
> + if (User->getOpcode() != Instruction::And ||
> + !isa<ConstantInt>(User->getOperand(1)))
> + continue;
> +
> + unsigned Cimm =
> + dyn_cast<ConstantInt>(User->getOperand(1))->getZExtValue();
> +
> + if (Cimm & (Cimm + 1))
> + continue;
> + }
> =>
> move that into a helper function like 'bool isExtractBitsCandidateUse’.
> Then, adapt the loop accordingly.
>
> #14
> Same for this:
> + // The type of the use is not legal in this architecture so there is a
> + // implicit truncation.
> + // for example:
> + // BB1:
> + // i64 shift.result = shlr i64 opnd, imm
> + // trunc.result = trunc shift.result to i16
> + //
> + // BB2:
> + // ----> We will have a implicit truncate here if the architecture does
> + // not have i16 register.
> + // use i16 trunc.result
> + //
> + if (isa<TruncInst>(User) &&
> + (TLI.isTypeLegal(TLI.getValueType(ShiftI->getType()))) &&
> + (!TLI.isTypeLegal(TLI.getValueType(User->getType())))) {
> + TruncInst *TruncI = dyn_cast<TruncInst>(User);
> +
> + for (Value::user_iterator TruncUI = TruncI->user_begin(),
> + TruncE = TruncI->user_end();
> + TruncUI != TruncE;) {
> +
> + Use &TruncTheUse = TruncUI.getUse();
> + Instruction *TruncUser = cast<Instruction>(*TruncUI);
> + // Preincrement use iterator so we don't invalidate it.
> +
> + ++TruncUI;
> +
> + // Don't bother for PHI nodes.
> + if (isa<PHINode>(TruncUser))
> + continue;
> +
> + BasicBlock *TruncUserBB = TruncUser->getParent();
> +
> + if (UserBB == TruncUserBB)
> + continue;
> +
> + BinaryOperator *&InsertedShift = InsertedShifts[TruncUserBB];
> + CastInst *&InsertedTrunc = InsertedTruncs[TruncUserBB];
> +
> + if (!InsertedShift && !InsertedTrunc) {
> + BasicBlock::iterator InsertPt = TruncUserBB->getFirstInsertionPt();
> + // Sink the shift
> + if (ShiftI->getOpcode() == Instruction::AShr)
> + InsertedShift = BinaryOperator::CreateAShr(ShiftI->getOperand(0),
> + CI, "", InsertPt);
> + else
> + InsertedShift = BinaryOperator::CreateLShr(ShiftI->getOperand(0),
> + CI, "", InsertPt);
> +
> + // Sink the trunc
> + BasicBlock::iterator TruncInsertPt =
> + TruncUserBB->getFirstInsertionPt();
> + TruncInsertPt++;
> +
> + InsertedTrunc =
> + CastInst::Create(TruncI->getOpcode(), InsertedShift,
> + TruncI->getType(), "", TruncInsertPt);
> +
> + MadeChange = true;
> +
> + TheUse = InsertedShift;
> + TruncTheUse = InsertedTrunc;
> + }
> + }
> + }
>
> You have a big chunk of code to handle a special case. Move that into a helper function.
>
> * lib/Target/ARM64/ARM64ISelDAGToDAG.cpp *
>
> #15
> + // If the shift result was truncated, we can still combine them.
> + Opd0 = Op0->getOperand(0).getNode()->getOperand(0);
> You could simply use
> Op0->getOperand(0).getOperand(0)
>
> Thanks,
> -Quentin
>
> On Apr 16, 2014, at 9:36 AM, yijiang <yjiang at apple.com> wrote:
>
>> Here is the new patch. More comments are appreciated!
>> <extractbit-v2.patch>
>> On Apr 16, 2014, at 9:29 AM, Yi Jiang <yjiang at apple.com> wrote:
>>
>>> You are right. Thank you.
>>> On Apr 14, 2014, at 6:59 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
>>>
>>>> On 04/14/2014 05:25 PM, yijiang wrote:
>>>>> + void setHasExtractBitsInsn(bool hasExtractInsn = false) {
>>>>> + HasExtractBitsInsn = hasExtractInsn;
>>>>> + }
>>>> The default argument should be true, since that's what all the others do and since the default value is already false
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140418/082ba9d4/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: extractbit-v4-2.patch
Type: application/octet-stream
Size: 18587 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140418/082ba9d4/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140418/082ba9d4/attachment-0001.html>
More information about the llvm-commits
mailing list