[PATCH] Add basic support for removal of load that are fed by a store of an aggregate

Mehdi Amini mehdi.amini at apple.com
Fri Jan 2 16:11:34 PST 2015


Hi,

So, basically you operate on an aggregate that contains two i8. In one case it is copied using memcpy and the other it is copied using load/store. 
What triggers your “bug” is that SROA does not handle these two copies the same way. 
The memcpy version is turned into two independent i8 stores: 

  %v.sroa.0.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0, i32 0
  store i8 1, i8* %v.sroa.0.0.dst.sroa_idx, align 1
  %v.sroa.2.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0, i32 1
  store i8 2, i8* %v.sroa.2.0.dst.sroa_idx, align 1

while the load/store version does not change the store:

  %tmp.fca.0.insert = insertvalue { i8, i8 } undef, i8 1, 0
  %tmp.fca.1.insert = insertvalue { i8, i8 } %tmp.fca.0.insert, i8 2, 1
  store { i8, i8 } %tmp.fca.1.insert, { i8, i8 }* %0


Now I am not sure if SROA shouldn’t produce the same result for the two inputs in this case? (i.e. splitting the store to the aggregate in store to the individual element)
If not then teaching GVN about this case is probably correct.

Best,

Mehdi



> On Jan 2, 2015, at 3:49 PM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
> 
> On 2015.01.02 15:43:11 -0800, Chandler Carruth wrote:
>> Ok, how do i reproduce the problem you're seeing? For me:
> 
> Sorry again, after I had sent out the original email that was missing
> the attachment, I had changed the store to a memcpy to see if that
> triggers the correct optimization, and indeed it does, as you can see
> below because I sent the version with the memcpy.
> 
> Third time's a charm, right?
> 
> Björn
>> 
>> % ./bin/opt -S -O2 < test-fca.ll
>> ; ModuleID = '<stdin>'
>> target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
>> target triple = "x86_64-unknown-linux-gnu"
>> 
>> declare void @foo()
>> 
>> define void @test({ i8, i8 }* nocapture) {
>> brancher.exit:
>>  %v.sroa.0.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0,
>> i32 0
>>  store i8 1, i8* %v.sroa.0.0.dst.sroa_idx, align 1
>>  %v.sroa.2.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0,
>> i32 1
>>  store i8 2, i8* %v.sroa.2.0.dst.sroa_idx, align 1
>>  tail call void @foo()
>>  ret void
>> }
>> 
>> On Fri, Jan 2, 2015 at 3:40 PM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
>> 
>>> On 2015.01.02 23:29:00 +0100, Björn Steinbrink wrote:
>>>> [Added llvm-commit back to Cc, sorry, removing it wasn't intentional]
>>>> 
>>>> On 2015.01.02 14:05:28 -0800, Chandler Carruth wrote:
>>>>> On Fri, Jan 2, 2015 at 2:02 PM, Björn Steinbrink <bsteinbr at gmail.com>
>>> wrote:
>>>>> 
>>>>>> 2015-01-02 22:36 GMT+01:00 Chandler Carruth <chandlerc at google.com>:
>>>>>>> 
>>>>>>> On Fri, Jan 2, 2015 at 1:27 PM, Björn Steinbrink <
>>> bsteinbr at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> If we have a simple load through a GEP that is fed by a store of
>>> an
>>>>>>>> aggregate, we can use the GEP indices to walk the stored
>>> aggregate and
>>>>>>>> extract the appropriate value to replace the load.
>>>>>>> 
>>>>>>> 
>>>>>>> What's the motivation for this change?
>>>>>> 
>>>>>> The rust compiler hit a case where not having this optimization
>>> caused
>>>>>> a branch not to be removed. The corresponding issue for rustc is
>>>>>> https://github.com/rust-lang/rust/issues/20149
>>>>>> 
>>>>>>> Note that SROA replaces all stores of aggregates with scalar
>>> stores of
>>>>>>> the components specifically so that neither it nor GVN needs to
>>>>>>> cope with aggregate loads or stores.
>>>>>> 
>>>>>> SROA did the opposite thing in this case, replacing individual stores
>>>>>> insertvalues and a store of an FCA. I'm attaching the full output of
>>>>>> opt -print-after-all -O2 -S for the failing test case given in the
>>>>>> rust issue. The first SROA creates the FCA store, and later when the
>>>>>> call to `unwrap` is inlined the existing optimizations can't
>>> eliminate
>>>>>> the branch that comes with it, because of the FCA store.
>>>>> 
>>>>> 
>>>>> SROA doesn't create FCA stores in any cases I'm aware of... Can you
>>> provide
>>>>> a somewhat reduced test case rather than a massive log?
>>>> 
>>>> I got myself confused because it introduced the insertvalue
>>>> instructions. The FCA store was already present and not constructed by
>>>> SROA. I've attach a small test case this time.
>>> 
>>> And... I forgot it again :-/ I'm really sorry, apparently this is not my
>>> day.
>>> 
>>> Björn
>>> 
> <test-fca.ll>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





More information about the llvm-commits mailing list