[LLVMdev] Optimizing out redundant alloca involving byval params
Mircea Trofin
mtrofin at google.com
Wed Apr 1 17:32:58 PDT 2015
I dug a bit more. It appears the pass sequence -memcpyopt followed by
-instcombine can convert this:
%struct.Str = type { i32, i32, i32, i32, i32, i32 }
define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
entry:
%agg.tmp = alloca %struct.Str, align 8
%0 = bitcast %struct.Str* %agg.tmp to i8*
%1 = bitcast %struct.Str* %s to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false)
call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp)
ret void
}
Into this:
define void @_Z4test3Str(%struct.Str* byval align 8 %s) {
entry:
call void @_Z6e_test3Str(%struct.Str* byval align 8 %s)
ret void
}
Which is great. This isn't happening, however, with the GEP and
load/store-based IR (i.e. a total of six sequences of GEP on %s + load,
then GEP on %agg.tmp + store, like the one discussed earlier in this thread).
I see two options:
1) convert the pass I'm working on to produce memcpy instead of load/store
successions, which would allow the resulting IR to fit in the canonical
patterns optimized today, or
2) add support (probably to memcpyopt) for converting load/store
successions into memcpy, then let the current optimizations reduce the
resulting IR.
I'm looking for feedback as to which path to take. Are there known
instances of successive load/store that would benefit from being replaced
with memcpy (option 2)?
Thank you,
Mircea.
On Sun, Mar 8, 2015 at 10:02 AM Mircea Trofin <mtrofin at google.com> wrote:
> errata: I am on 3.6 full stop. I *thought* there was a 3.7 available,
> based on the title of http://llvm.org/docs/ ("LLVM 3.7 documentation"). I
> suppose the docs are ahead of the release schedule?
>
> On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <mtrofin at google.com> wrote:
>
>> Sorry, that phase is part of the PNaCl toolchain. This would be LLVM 3.6,
>> would your comments still apply?
>>
>> I tried -O3 to no avail. I suppose I'll get llvm 3.7, see if I can
>> optimize the latest snippet there (the one avoiding load/store), and see
>> from there.
>>
>> Thanks!
>>
>> On Fri, Mar 6, 2015 at 12:01 PM Philip Reames <listmail at philipreames.com>
>> wrote:
>>
>>>
>>> On 03/05/2015 06:16 PM, Mircea Trofin wrote:
>>>
>>> Thanks!
>>>
>>> Philip, do you mean I should transform the original IR to something
>>> like this?
>>>
>>>
>>> Yes.
>>>
>>> (...which is what -expand-struct-regs can do, when applied to my
>>> original input)
>>>
>>> Sorry, what? This doesn't appear to be a pass in ToT. Are you using an
>>> older version of LLVM? If so, none of my comments will apply.
>>>
>>>
>>> define void @main(%struct* byval %ptr) {
>>> %val.index = getelementptr %struct* %ptr, i32 0, i32 0
>>> %val.field = load i32* %val.index
>>> %val.index1 = getelementptr %struct* %ptr, i32 0, i32 1
>>> %val.field2 = load i32* %val.index1
>>> %val.ptr = alloca %struct
>>> %val.ptr.index = getelementptr %struct* %val.ptr, i32 0, i32 0
>>> store i32 %val.field, i32* %val.ptr.index
>>> %val.ptr.index4 = getelementptr %struct* %val.ptr, i32 0, i32 1
>>> store i32 %val.field2, i32* %val.ptr.index4
>>> call void @extern_func(%struct* byval %val.ptr)
>>> ret void
>>> }
>>>
>>> If so, would you mind pointing me to the phase that would reduce this?
>>> (I'm assuming that's what you meant by "for free" - there's an existing
>>> phase I could use)
>>>
>>> I would expect GVN to get this. If you can run this through a fully -O3
>>> pass order and get the right result, isolating the pass in question should
>>> be easy.
>>>
>>>
>>> Thank you.
>>> Mircea.
>>>
>>>
>>> On Thu, Mar 5, 2015 at 4:39 PM Philip Reames <listmail at philipreames.com>
>>> wrote:
>>>
>>>> Reid is right that this would go in memcpyopt, but there's an
>>>> active discussion on the commits list which will solve this through a
>>>> different mechanism. There's an active desire to avoid teaching GVN and
>>>> related pieces (of which memcpyopt is one) about first-class aggregates.
>>>> We don't have enough active users of the feature to justify maintaining
>>>> the complexity.
>>>>
>>>> If you haven't already seen it, this background may help:
>>>> http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type
>>>>
>>>> The current proposal is to convert such aggregate loads and stores into
>>>> their component pieces. If that happens, your example should come "for
>>>> free", provided the same example works when you break the FCA down into
>>>> its component pieces. If it doesn't, please say so.
>>>>
>>>> Philip
>>>>
>>>>
>>>> On 03/05/2015 04:21 PM, Reid Kleckner wrote:
>>>>
>>>> I think lib/Transforms/Scalar/MemCpyOptimizer.cpp might be the right
>>>> place for this, considering that most frontends will use memcpy for that
>>>> copy anyway. It already has some logic for byval args.
>>>>
>>>> On Thu, Mar 5, 2015 at 3:51 PM, Mircea Trofin <mtrofin at google.com>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I'm trying to find the pass that would convert from:
>>>>>
>>>>> define void @main(%struct* byval %ptr) {
>>>>> %val = load %struct* %ptr
>>>>> %val.ptr = alloca %struct
>>>>> store %struct %val, %struct* %val.ptr
>>>>> call void @extern_func(%struct* byval %val.ptr)
>>>>> ret void
>>>>> }
>>>>>
>>>>> to this:
>>>>> define void @main(%struct* byval %ptr) {
>>>>> call void @extern_func(%struct* byval %ptr)
>>>>> ret void
>>>>> }
>>>>>
>>>>> First, am I missing something - would this be a correct optimization?
>>>>>
>>>>> Thank you,
>>>>> Mircea.
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>
>>>>
>>>