[LLVMdev] Optimizing out redundant alloca involving byval params
Mircea Trofin
mtrofin at google.com
Wed Apr 1 17:32:58 PDT 2015
I dug a bit more. It appears the pass sequence -memcpyopt followed by
-instcombine can convert this:
%struct.Str = type { i32, i32, i32, i32, i32, i32 }
define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
entry:
%agg.tmp = alloca %struct.Str, align 8
%0 = bitcast %struct.Str* %agg.tmp to i8*
%1 = bitcast %struct.Str* %s to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false)
call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp)
ret void
}
Into this:
define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
entry:
call void @_Z6e_test3Str(%struct.Str* byval align 8 %s)
ret void
}
Which is great. This isn't happening, however, with the GEP and
load/store-based IR (i.e. a total of six sequences of GEP on %s + load,
then GEP on %agg.tmp + store, like the one discussed earlier in this thread).
I see two options:
1) convert the pass I'm working on to produce memcpy instead of load/store
successions, which would allow the resulting IR to fit in the canonical
patterns optimized today, or
2) add support (probably to memcpyopt) for converting load/store
successions into memcpy, then let the current optimizations reduce the
resulting IR.
I'm looking for feedback as to which path to take. Are there known
instances of successive load/store that would benefit from being replaced
with memcpy (option 2)?
Thank you,
Mircea.
On Sun, Mar 8, 2015 at 10:02 AM Mircea Trofin <mtrofin at google.com> wrote:
> errata: I am on 3.6 full stop. I *thought* there was a 3.7 available,
> based on the title of http://llvm.org/docs/ ("LLVM 3.7 documentation"). I
> suppose the docs are ahead of the release schedule?
>
> On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <mtrofin at google.com> wrote:
>
>> Sorry, that phase is part of the PNaCl toolchain. This would be LLVM 3.6,
>> would your comments still apply?
>>
>> I tried -O3 to no avail. I suppose I'll get llvm 3.7, see if I can
>> optimize the latest snippet there (the one avoiding load/store), and see
>> from there.
>>
>> Thanks!
>>
>> On Fri, Mar 6, 2015 at 12:01 PM Philip Reames <listmail at philipreames.com>
>> wrote:
>>
>>>
>>> On 03/05/2015 06:16 PM, Mircea Trofin wrote:
>>>
>>> Thanks!
>>>
>>> Philip, do you mean I should transform the original IR to something
>>> like this?
>>>
>>>
>>> Yes.
>>>
>>> (...which is what -expand-struct-regs can do, when applied to my
>>> original input)
>>>
>>> Sorry, what? This doesn't appear to be a pass in ToT. Are you using an
>>> older version of LLVM? If so, none of my comments will apply.
>>>
>>>
>>> define void @main(%struct* byval %ptr) {
>>> %val.index = getelementptr %struct* %ptr, i32 0, i32 0
>>> %val.field = load i32* %val.index
>>> %val.index1 = getelementptr %struct* %ptr, i32 0, i32 1
>>> %val.field2 = load i32* %val.index1
>>> %val.ptr = alloca %struct
>>> %val.ptr.index = getelementptr %struct* %val.ptr, i32 0, i32 0
>>> store i32 %val.field, i32* %val.ptr.index
>>> %val.ptr.index4 = getelementptr %struct* %val.ptr, i32 0, i32 1
>>> store i32 %val.field2, i32* %val.ptr.index4
>>> call void @extern_func(%struct* byval %val.ptr)
>>> ret void
>>> }
>>>
>>> If so, would you mind pointing me to the phase that would reduce this?
>>> (I'm assuming that's what you meant by "for free" - there's an existing
>>> phase I could use)
>>>
>>> I would expect GVN to get this. If you can run this through a fully -O3
>>> pass order and get the right result, isolating the pass in question should
>>> be easy.
>>>
>>>
>>> Thank you.
>>> Mircea.
>>>
>>>
>>> On Thu, Mar 5, 2015 at 4:39 PM Philip Reames <listmail at philipreames.com>
>>> wrote:
>>>
>>>> Reid is right that this would go in memcpyopt, but there's an
>>>> active discussion on the commits list which will solve this through a
>>>> different mechanism. There's an active desire to avoid teaching GVN and
>>>> related pieces (of which memcpyopt is one) about first-class aggregates.
>>>> We don't have enough active users of the feature to justify maintaining
>>>> the complexity.
>>>>
>>>> If you haven't already seen it, this background may help:
>>>> http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type
>>>>
>>>> The current proposal is to convert such aggregate loads and stores into
>>>> their component pieces. If that happens, your example should come "for
>>>> free", provided the same example works when you break the FCA down into
>>>> its component pieces. If it doesn't, please say so.
>>>>
>>>> Philip
>>>>
>>>>
>>>> On 03/05/2015 04:21 PM, Reid Kleckner wrote:
>>>>
>>>> I think lib/Transforms/Scalar/MemCpyOptimizer.cpp might be the right
>>>> place for this, considering that most frontends will use memcpy for that
>>>> copy anyway. It already has some logic for byval args.
>>>>
>>>> On Thu, Mar 5, 2015 at 3:51 PM, Mircea Trofin <mtrofin at google.com>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I'm trying to find the pass that would convert from:
>>>>>
>>>>> define void @main(%struct* byval %ptr) {
>>>>> %val = load %struct* %ptr
>>>>> %val.ptr = alloca %struct
>>>>> store %struct %val, %struct* %val.ptr
>>>>> call void @extern_func(%struct* byval %val.ptr)
>>>>> ret void
>>>>> }
>>>>>
>>>>> to this:
>>>>> define void @main(%struct* byval %ptr) {
>>>>> call void @extern_func(%struct* byval %ptr)
>>>>> ret void
>>>>> }
>>>>>
>>>>> First, am I missing something - would this be a correct optimization?
>>>>>
>>>>> Thank you,
>>>>> Mircea.
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>
>>>>
>>>