[PATCH] Optimize redundant insertvalue instructions

Thu May 8 09:24:35 PDT 2014

On May 8, 2014, at 9:18 AM, Juergen Ributzka <juergen at apple.com> wrote:

> Hi Michael,
> 
> thanks for fixing the tiny nitpicks :-)
> 
> Let me elaborate a little bit more on the one-use removal idea I mentioned off-list.
> %0 = insertvalue { i8, i32 } undef, i8 0, 0
> %1 = insertvalue { i8, i32 } %0, i8 1, 0
> %2 = insertvalue { i8, i32 } %0, i8 2, 0
> 
> Currently you have to check every time that there is only one use when you traverse the def-use chain and
> you need to make sure that it is the same type, because the use might not be the first operand.
> 
> The idea I had in mind was to traverse the use-def chain instead. This way you are guaranteed to have only
> one use and don’t need all that extra checking.

Oops, I obviously meant only one “def” here.

> Furthermore when you hit instruction %0 coming from
> instruction %1, you can optimize as usual by using instruction’s %0 first operand. Now when InstCombine
> does the same for instruction %2, then instruction %0 won’t have any uses anymore and is trivially dead.
> By adding %0 to the worklist again will automatically clean it up for you.
> 
> Cheers,
> Juergen
> 
> 
> On May 8, 2014, at 1:27 AM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
> 
>> Hi,
>> 
>> Yesterday Juergen gave a couple or remarks, so here is a small add-on for the patch.
>> 
>> Besides some nits, I changed a check from (U->getType() != I.getType()) to (U->getOperand(0) != V). That check should guard us from propagating results in cases like the following (this case was also added to the test):
>> %0 = insertvalue { i16, i32 } undef, i16 %x, 0
>> %1 = insertvalue { i8, {i16, i32} } undef, { i16, i32 } %0, 1
>> Here %0 has only one use and this use is also insertvalue, but that’s not the case we’re trying to catch. Though the original check also dealt with that, I think the new version is more strict and clear.
>> 
>> Is it ok for trunk?
>> 
>> <redundant-insertvalue-followup.patch>
>> 
>> Thanks,
>> Michael
>> 
>> On May 7, 2014, at 5:26 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
>> 
>>> 
>>> On May 7, 2014, at 5:03 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>> 
>>>> ----- Original Message -----
>>>>> From: "Michael Zolotukhin" <mzolotukhin at apple.com>
>>>>> To: "Hal J. Finkel" <hfinkel at anl.gov>
>>>>> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
>>>>> Sent: Wednesday, May 7, 2014 7:07:35 AM
>>>>> Subject: Re: [PATCH] Optimize redundant insertvalue instructions
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi Hal,
>>>>> 
>>>>> Thank you for the review, I updated the patch accordingly to your
>>>>> remarks.
>>>>> 
>>>>> There is a more general alternative to the cut-off: we could move
>>>>> from uses to defs, maintaining a bitvector, that indicates which
>>>>> fields were written. In this case the complexity would be O(N) and
>>>>> we will also catch partially overlapping accesses as well. I did
>>>>> something like this in my ‘sophisticated’ implementation.
>>>>> 
>>>>> However, I’d prefer the most simple implementation, so I think that
>>>>> cut-off would be the best option here. I chose limit 10 - I don’t
>>>>> expect long chains of insertvalue instructions anyway, and it is
>>>>> enough for the original test.
>>>> 
>>>> Fair enough.
>>>> 
>>>>> 
>>>>> Here is the new patch:
>>>>> 
>>>>> 
>>>> 
>>>> I don't actually see a call to visitInsertValueInst being inserted by the patch; I suppose you're putting it somewhere appropriate or else the test case would fail, right? Assuming that's the case, LGTM.
>>>> 
>>> Thanks.
>>> There is no need to explicitly call visitInsertValueInst. We use InstVisitor interface (there is a call ‘visit’ in InstCombiner::DoOneIteration). The function ‘visit' is a big switch, constructed by macroses from IR/Instruction.def, and once we override visitInsertValueInst, it starts to be called. That’s how I understand this - if I’m mistaken I’d appreciate any corrections here. And yes, the test will fail otherwise.
>>> 
>>> Michael
>>> 
>>>> -Hal
>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Michael
>>>>> 
>>>>> On May 7, 2014, at 3:01 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>> 
>>>>>> Michael,
>>>>>> 
>>>>>> +/// Here the second instruction inserts values at the same
>>>>>> indices, as the
>>>>>> 
>>>>>> +/// first one, making it redundant.
>>>>>> 
>>>>>> You mean, "making the first one redundant."
>>>>>> 
>>>>>> + Value *V = dyn_cast<Value>(&I);
>>>>>> 
>>>>>> There should be no cast necessary here (and it certainly shouldn't
>>>>>> be a dyn_cast).
>>>>>> 
>>>>>> + V = dyn_cast<Value>(UserInsInst);
>>>>>> 
>>>>>> Same here.
>>>>>> 
>>>>>> + while (V->hasOneUse()) {
>>>>>> 
>>>>>> Should we have a cut-off here? If this function is called on every
>>>>>> InsertValue instruction, then this algorithm will be O(N^2) in the
>>>>>> size of the InsertValue chain.
>>>>>> 
>>>>>> -Hal
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>>> From: "Michael Zolotukhin" <mzolotukhin at apple.com>
>>>>>>> To: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
>>>>>>> Sent: Tuesday, May 6, 2014 9:42:12 AM
>>>>>>> Subject: [PATCH] Optimize redundant insertvalue instructions
>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> 
>>>>>>> This patch implements a simple optimization of redundant
>>>>>>> insertvalue
>>>>>>> instructions.
>>>>>>> 
>>>>>>> 
>>>>>>> Here is a small example where it will help:
>>>>>>> 
>>>>>>> %0 = insertvalue { i8, i32 } undef, i8 %x, 0
>>>>>>> %1 = insertvalue { i8, i32 } %0, i8 %y, 0
>>>>>>> 
>>>>>>> 
>>>>>>> In this example, the second instruction inserts values at the same
>>>>>>> indices, as the first one, and consequently, the first instruction
>>>>>>> is redundant.
>>>>>>> This code will be transformed to:
>>>>>>> 
>>>>>>> %0 = insertvalue { i8, i32 } undef, i8 %y, 0
>>>>>>> 
>>>>>>> 
>>>>>>> This optimization catches nothing on specs and test-suite, but the
>>>>>>> example was inspired by a real world application.
>>>>>>> 
>>>>>>> 
>>>>>>> As an experiment, I tried a more sophisticated implementation,
>>>>>>> which
>>>>>>> was capable of dealing with several users and with partially
>>>>>>> overlapping indices, but that also caught nothing on specs (and
>>>>>>> for
>>>>>>> my application the simple version was enough). If there is an
>>>>>>> interest, I could share a patch with this more complicated
>>>>>>> implementation as well.
>>>>>>> 
>>>>>>> 
>>>>>>> Is it ok to commit this patch?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Michael
>>>>>> 
>>>>>> --
>>>>>> Hal Finkel
>>>>>> Assistant Computational Scientist
>>>>>> Leadership Computing Facility
>>>>>> Argonne National Laboratory
>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>> 
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits