[PATCH] Fix a performance problem in gep(gep ...) merging

Wed Apr 15 13:16:07 PDT 2015

Great, Thanks Mark! I will do the performance for it on x86.

On Wed, Apr 15, 2015 at 12:52 PM, Mark Heffernan <meheff at google.com> wrote:
> On Wed, Apr 15, 2015 at 12:16 PM, Xinliang David Li <xinliangli at gmail.com>
> wrote:
>>
>> Mark, sounds great. Can you send the patch to Wei for testing ?
>
>
> Patch is attached.  Not yet ready for review though ;-)
>
> For the attached patch I added it to the x86 target.  Currently I don't plan
> to do that when I check in the patch.  It'll just be added to those targets
> which use the existing pass which this transformation is added to.  It can
> be extended to other targets later.  The patch breaks a few x86 tests maybe
> because the codegen pipeline is now different (also has CSE and LICM) or
> maybe because there is a bug in my code.
>
> Mark
>>
>>
>> David
>>
>> On Wed, Apr 15, 2015 at 11:34 AM, Mark Heffernan <meheff at google.com>
>> wrote:
>>>
>>> There was a discussion on the mailing list about the exact issue this
>>> patch is addressing.  We (Google team working on NVPTX backend) ran into
>>> this same issue as did someone (Francois Pichet) working with the ppc
>>> backend.  Here's the link to the discussion:
>>>
>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-March/083398.html
>>>
>>> I initially proposed what your patch does.  That is, don't merge geps
>>> unless the indices are constant (in which case the add is folded so no
>>> additional instructions are potentially created in loops).  However, after
>>> some discussion we came to the conclusion that the gep merging is a good
>>> canonicalization which we want to do.  Instead a better approach might be to
>>> split GEPs (undoing the early GEP merge) very late, right before codegen, if
>>> it is determined to be profitable.  This way, all the intermediate passes
>>> get the advantage of the canonicalization, while still being able to do
>>> LICM/CSE on GEP subexpressions which the gep merge had prevented.
>>>
>>> I have a patch I am in the last stages of testing which will perform this
>>> optimization in the pass SeparateConstOffsetFromGEP (to be renamed
>>> ReassociateGEPs... or maybe SplitGEPs now that I think about it).  I ran my
>>> patch with your example from bug 23163 and it looks to perform better than
>>> the prevent gep merging approach does.  With head compiler I get 29
>>> instructions in the inner loop (this matches what you report in the bug).
>>> You mention disabling merging reduces the loop instructions down to 25
>>> instructions.  With the patch I am working on, it gets reduced down to 16
>>> instructions.
>>>
>>> The pass is currently enabled only for select backends: nvptx, arm (-O3),
>>> and ppc (-O3).   For testing with your bug I added it in x86 addIRPasses()
>>> followed by CSE and LICM (these extra passes do nothing on your example
>>> without the GEP splitting pass).
>>>
>>> I should have the patch out for review later in the week.
>>>
>>> Sorry for the duplicate effort here.  I should have filed a bug on this
>>> issue before diving in :-(
>>>
>>> Mark
>>>
>>>
>>> REPOSITORY
>>>   rL LLVM
>>>
>>> http://reviews.llvm.org/D8911
>>>
>>> EMAIL PREFERENCES
>>>   http://reviews.llvm.org/settings/panel/emailpreferences/
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>