[PATCH] ARM64: Combine shifts and uses to bit-extract instruction from different basic block
Quentin Colombet
qcolombet at apple.com
Mon Apr 21 10:51:04 PDT 2014
Hi Yi,
LGTM.
Please commit directly after you've fixed these typos:
+ // If the type of the truncate is legal, no trucates will be
+ // introduce in other basic blocks.
=>
s/trucates/truncates/
s/introduce/introduced/
Thanks,
-Quentin
On Apr 21, 2014, at 10:45 AM, yijiang <yjiang at apple.com> wrote:
> Hi Quentin,
>
> Thank you for the comments. The new patch:
>
>
> <extractbit-v4-3.patch>
>
> On Apr 21, 2014, at 9:53 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>
>> Hi Yi,
>>
>> This LGTM with some minor nitpicks:
>>
>> #1
>> +// isExtractBitsCandidateUse - Check if the candidates could
>> +// be combined with shift instruction, which includes:
>> +// 1. Truncate instruction
>> +// 2. And instruction and the imm is a mask of the low bits:
>> +// imm & (imm+1) == 0
>> Make sure your comments use the doxygen style, i.e., replace // by ///.
>> At least isExtractBitsCandidateUse and SinkShiftAndTruncate are not using the right format.
>>
>>
>> #2
>> + // If the use is actually a legal node, there will not be a implicit
>> + // truncate.
>> s/a implicit/an implicit/
>>
>>
>> #3
>> + // The type of User is not legal does not necessary introduce
>> + // truncate. We just want to filter most of the cases here for
>> + // compile time sake.
>> =>
>> Something along these lines I guess:
>> If the type of the truncate is legal, this will not introduce truncates in other basic blocks. Thus, do not traverse its users.
>>
>>
>> #4
>> +; OPT: if.end
>> OPT-LABEL instead?
>>
>> +; OPT: if.then17
>> Ditto.
>>
>> -Quentin
>>
>> On Apr 18, 2014, at 5:42 PM, yijiang <yjiang at apple.com> wrote:
>>
>>> Hi Quentin,
>>>
>>> Thank you for your detailed comments! I attached the new patch based on our discussion. Could you help to take a look at it? Thank you!
>>>
>>> <extractbit-v4-2.patch>
>>>
>>> On Apr 16, 2014, at 5:23 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>
>>>> Hi Yi,
>>>>
>>>> I hope the format of the review will not be too difficult to digest. For the next iterations you may want to create a phabricator (http://reviews.llvm.org).
>>>>
>>>> I have a couple of concerns that I split into two categories:
>>>> - Thing to fix: Are fundamental things that would produce correctness issue or hard to maintain problems.
>>>> - Nice to have: Are nitpick on the code.
>>>>
>>>> ** Things to Fix **
>>>>
>>>> * Logic *
>>>>
>>>> #1
>>>> + (!TLI.isTypeLegal(TLI.getValueType(User->getType())
>>>> This does not seem a sufficient check to guess whether or not a truncate will be issued at legalization.
>>>> We should instead check if the operation is legal for this type.
>>>> Indeed, i16 may not be a legal type, but for instance store i16 may and no truncate will be issued.
>>>>
>>>> #2
>>>> + TheUse = InsertedShift; /// <———— This does not look right to me.
>>>> + TruncTheUse = InsertedTrunc;
>>>> + }
>>>> Unless I am mistaken, you are transferring the use, in the original truncate, of the original shift into a use of the new shift in the original truncate.
>>>> This use must be left untouched.
>>>>
>>>> You may want to check if the original truncate still have uses and nuke it if it does not, though.
>>>>
>>>> #3
>>>> + assert((Op0->getOpcode() == ISD::TRUNCATE || BiggerPattern ||
>>>> + (Srl_imm > 0 && Srl_imm < VT.getSizeInBits())) &&
>>>>
>>>> Based on the code this patch change, you shouldn’t have to modify this assert. If you do have to modify it, it seems to me that there is something wrong.
>>>> Please double check.
>>>>
>>>> #4
>>>> // we're looking for a shift of a shift
>>>> uint64_t Shl_imm = 0;
>>>> if (isOpcWithIntImmediate(N->getOperand(0).getNode(), ISD::SHL, Shl_imm)) {
>>>> Opd0 = N->getOperand(0).getOperand(0);
>>>> + } else if (VT == MVT::i32 && N->getOpcode() == ISD::SRL &&
>>>> + N->getOperand(0).getNode()->getOpcode() == ISD::TRUNCATE) {
>>>> + // In this case, 64bit UBFM and 32bit UBFM have the same semantics. Our
>>>> + // strategy here is to always generate 64bit UBFM. This consistency will
>>>> + // help the CSE pass later find more redundancy.
>>>> + Opd0 = N->getOperand(0).getOperand(0);
>>>> + DstVT = MVT::i64;
>>>>
>>>> There are several problems here.
>>>> A - This part of the code is looking for SHL, not SRL. The part looking for the SRL is later in the same function:
>>>> if (!isIntImmediate(N->getOperand(1), Srl_imm))
>>>> return false;
>>>> Your logic should be added somewhere here.
>>>> B - You did not update the Srl_imm information. Thus the resulting code may be wrong.
>>>>
>>>> #5
>>>> + DstVT = MVT::i64;
>>>> You should reuse VT here (assuming you moved the related code at the right place) like you did in the AND case.
>>>> More importantly, please use the type from the node (and return false on VT != MVT::i64 or at least assert that MVT::i64), not a hard coded one.
>>>>
>>>> Remove this change:
>>>> - if (VT == MVT::i32)
>>>> + if (DstVT == MVT::i32)
>>>>
>>>> #6
>>>> + // If the bit extract operation is 64bit but the original type is 32bit, we
>>>> + // need to add one EXTRACT_SUBREG.
>>>> + if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
>>>> + if (VT == MVT::i32) {
>>>> + SDNode *BFM = CurDAG->getMachineNode(Opc, SDLoc(N), MVT::i64, Ops);
>>>>
>>>> Technically speaking you should create different Ops when promoting to the i64 type.
>>>> Indeed, current Ops were created with:
>>>> SDValue Ops[] = { Opd0, CurDAG->getTargetConstant(LSB, VT),
>>>> CurDAG->getTargetConstant(MSB, VT) };
>>>> Thus, the VTs do not match for the constants.
>>>>
>>>> #7
>>>> + // If the bit extract operation is 64bit but the original type is 32bit, we
>>>> + // need to add one EXTRACT_SUBREG.
>>>> + if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
>>>> + if (VT == MVT::i32) {
>>>> Sink the second if into the first one. That will reduce the indentation as a bonus :).
>>>>
>>>> #8
>>>> + if (Opc == ARM64::SBFMXri || Opc == ARM64::UBFMXri) {
>>>> + if (VT == MVT::i32) {
>>>> I am not a big fan of the way you detect the fact that the type has been promoted.
>>>> I think it would be cleaner to add (yet) another reference argument to know whether or not we have to do that.
>>>> Ultimately, it looks like we could benefit from a helper class here, but the refactoring is out of the scope of this patch.
>>>>
>>>> * Tests *
>>>>
>>>> #9
>>>> I’d like you add an additional RUN line with opt and check the actual transformation done on the IR.
>>>>
>>>> #10
>>>> +; CHECK-LABEL: LBB22_2:
>>>> Please do not rely on basic blocks labels. That looks fragile.
>>>> +; CHECK-LABEL: Lloh4:
>>>> This is even more true for these that are added by the ARM64CollectLOH pass.
>>>>
>>>>
>>>> * Comments *
>>>>
>>>> #11
>>>> +/// OptimizeExtractBits - sink the given shift instruction into user blocks if
>>>> +/// the uses could potentially be combined with the shift instruction and
>>>> +/// generate BitExtract instruction. It will only be applied if the architecture
>>>> +/// supports BitExtract instruction.
>>>> +///
>>>> +/// Return true if any changes are made.
>>>> You should be more specific here. Like specifying the pattern you are recognizing.
>>>> Indeed the shifts that are supported in this patch are logical and arithmetic right shift.
>>>> In other words, this optimization should help for pattern like this:
>>>> Input:
>>>> XXXXXXPPPXXX
>>>> Output:
>>>> 000000000PPP
>>>> (shift right + and of low bits)
>>>>
>>>> But not for pattern like this:
>>>> Input:
>>>> XXXXXXPPPXXX
>>>> Output:
>>>> 0000PPP00000
>>>> (shift left + and of contiguous bits)
>>>>
>>>> #12
>>>> + if (UserBB == DefBB) {
>>>> + // The type of the use is not legal in this architecture so there is a
>>>> + // implicit truncation.
>>>> + // for example:
>>>> + // BB1:
>>>> + // i64 shift.result = shlr i64 opnd, imm
>>>> + // trunc.result = trunc shift.result to i16
>>>> + //
>>>> + // BB2:
>>>> + // ----> We will have a implicit truncate here if the architecture does
>>>> + // not have i16 register.
>>>> + // use i16 trunc.result
>>>>
>>>> This comment is confusing because:
>>>> A - You mention two basic blocks after a check that both the defBB and the useBB are equal.
>>>> B - The first sentence is missing a ‘if’ I guess.
>>>>
>>>> What about something like (do not copy/paste, this is not a wonderful wording either :)):
>>>> If UseBB and DefBB are equal and use is a truncate, we may still want to sink both the shift and the truncate within the block of the use of the truncate.
>>>> Indeed, if the use of the truncate is not legal it may produce a truncate at legalization.
>>>>
>>>>
>>>> ** Nice to Have **
>>>>
>>>> * lib/CodeGen/CodeGenPrepare.cpp *
>>>>
>>>> #13
>>>> The body of OptimizeExtractBits gets a bit complex, I think some refactoring (essentially helper functions) would help.
>>>>
>>>> + // The candidates that could be combined with shift instruction.
>>>> + // 1. Truncate instruction
>>>> + // 2. And instruction and the imm is a mask of the low bits:
>>>> + // imm & (imm+1) == 0
>>>> + if (!isa<TruncInst>(User)) {
>>>> + if (User->getOpcode() != Instruction::And ||
>>>> + !isa<ConstantInt>(User->getOperand(1)))
>>>> + continue;
>>>> +
>>>> + unsigned Cimm =
>>>> + dyn_cast<ConstantInt>(User->getOperand(1))->getZExtValue();
>>>> +
>>>> + if (Cimm & (Cimm + 1))
>>>> + continue;
>>>> + }
>>>> =>
>>>> move that into a helper function like 'bool isExtractBitsCandidateUse’.
>>>> Then, adapt the loop accordingly.
>>>>
>>>> #14
>>>> Same for this:
>>>> + // The type of the use is not legal in this architecture so there is a
>>>> + // implicit truncation.
>>>> + // for example:
>>>> + // BB1:
>>>> + // i64 shift.result = shlr i64 opnd, imm
>>>> + // trunc.result = trunc shift.result to i16
>>>> + //
>>>> + // BB2:
>>>> + // ----> We will have a implicit truncate here if the architecture does
>>>> + // not have i16 register.
>>>> + // use i16 trunc.result
>>>> + //
>>>> + if (isa<TruncInst>(User) &&
>>>> + (TLI.isTypeLegal(TLI.getValueType(ShiftI->getType()))) &&
>>>> + (!TLI.isTypeLegal(TLI.getValueType(User->getType())))) {
>>>> + TruncInst *TruncI = dyn_cast<TruncInst>(User);
>>>> +
>>>> + for (Value::user_iterator TruncUI = TruncI->user_begin(),
>>>> + TruncE = TruncI->user_end();
>>>> + TruncUI != TruncE;) {
>>>> +
>>>> + Use &TruncTheUse = TruncUI.getUse();
>>>> + Instruction *TruncUser = cast<Instruction>(*TruncUI);
>>>> + // Preincrement use iterator so we don't invalidate it.
>>>> +
>>>> + ++TruncUI;
>>>> +
>>>> + // Don't bother for PHI nodes.
>>>> + if (isa<PHINode>(TruncUser))
>>>> + continue;
>>>> +
>>>> + BasicBlock *TruncUserBB = TruncUser->getParent();
>>>> +
>>>> + if (UserBB == TruncUserBB)
>>>> + continue;
>>>> +
>>>> + BinaryOperator *&InsertedShift = InsertedShifts[TruncUserBB];
>>>> + CastInst *&InsertedTrunc = InsertedTruncs[TruncUserBB];
>>>> +
>>>> + if (!InsertedShift && !InsertedTrunc) {
>>>> + BasicBlock::iterator InsertPt = TruncUserBB->getFirstInsertionPt();
>>>> + // Sink the shift
>>>> + if (ShiftI->getOpcode() == Instruction::AShr)
>>>> + InsertedShift = BinaryOperator::CreateAShr(ShiftI->getOperand(0),
>>>> + CI, "", InsertPt);
>>>> + else
>>>> + InsertedShift = BinaryOperator::CreateLShr(ShiftI->getOperand(0),
>>>> + CI, "", InsertPt);
>>>> +
>>>> + // Sink the trunc
>>>> + BasicBlock::iterator TruncInsertPt =
>>>> + TruncUserBB->getFirstInsertionPt();
>>>> + TruncInsertPt++;
>>>> +
>>>> + InsertedTrunc =
>>>> + CastInst::Create(TruncI->getOpcode(), InsertedShift,
>>>> + TruncI->getType(), "", TruncInsertPt);
>>>> +
>>>> + MadeChange = true;
>>>> +
>>>> + TheUse = InsertedShift;
>>>> + TruncTheUse = InsertedTrunc;
>>>> + }
>>>> + }
>>>> + }
>>>>
>>>> You have a big chunk of code to handle a special case. Move that into a helper function.
>>>>
>>>> * lib/Target/ARM64/ARM64ISelDAGToDAG.cpp *
>>>>
>>>> #15
>>>> + // If the shift result was truncated, we can still combine them.
>>>> + Opd0 = Op0->getOperand(0).getNode()->getOperand(0);
>>>> You could simply use
>>>> Op0->getOperand(0).getOperand(0)
>>>>
>>>> Thanks,
>>>> -Quentin
>>>>
>>>> On Apr 16, 2014, at 9:36 AM, yijiang <yjiang at apple.com> wrote:
>>>>
>>>>> Here is the new patch. More comments are appreciated!
>>>>> <extractbit-v2.patch>
>>>>> On Apr 16, 2014, at 9:29 AM, Yi Jiang <yjiang at apple.com> wrote:
>>>>>
>>>>>> You are right. Thank you.
>>>>>> On Apr 14, 2014, at 6:59 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
>>>>>>
>>>>>>> On 04/14/2014 05:25 PM, yijiang wrote:
>>>>>>>> + void setHasExtractBitsInsn(bool hasExtractInsn = false) {
>>>>>>>> + HasExtractBitsInsn = hasExtractInsn;
>>>>>>>> + }
>>>>>>> The default argument should be true, since that's what all the others do and since the default value is already false
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140421/617e196d/attachment.html>
More information about the llvm-commits
mailing list