[LLVMdev] RFC: PerfGuide for frontend authors

Philip Reames listmail at philipreames.com
Sun Mar 1 09:27:02 PST 2015





> On Mar 1, 2015, at 7:53 AM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
> 
> On 2015.02.28 18:17:27 -0800, Philip Reames wrote:
>>> On Feb 28, 2015, at 3:01 PM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
>>> 2015-02-28 23:50 GMT+01:00 Philip Reames <listmail at philipreames.com>:
>>>>>> On Feb 28, 2015, at 2:30 PM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
>>>>> I should have clarified that that was a reduced, incomplete example, the
>>>>> actual code looks like this (after optimizations):
>>>>> 
>>>>> define void @_ZN9test_func20hdd8a534ccbedd903paaE(i1 zeroext) unnamed_addr #0 {
>>>>> entry-block:
>>>>>  %x = alloca [100000 x i32], align 4
>>>>>  %1 = bitcast [100000 x i32]* %x to i8*
>>>>>  %arg = alloca [100000 x i32], align 4
>>>>>  call void @llvm.lifetime.start(i64 400000, i8* %1)
>>>>>  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 400000, i32 4, i1 false)
>>>>>  %2 = bitcast [100000 x i32]* %arg to i8*
>>>>>  call void @llvm.lifetime.start(i64 400000, i8* %2) ; this happens too late
>>>>>  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %1, i64 400000, i32 4, i1 false)
>>>>>  call void asm "", "r,~{dirflag},~{fpsr},~{flags}"([100000 x i32]* %arg) #2, !noalias !0, !srcloc !3
>>>>>  call void @llvm.lifetime.end(i64 400000, i8* %2) #2, !alias.scope !4, !noalias !0
>>>>>  call void @llvm.lifetime.end(i64 400000, i8* %2)
>>>>>  call void @llvm.lifetime.end(i64 400000, i8* %1)
>>>>>  ret void
>>>>> }
>>>>> 
>>>>> If the lifetime start for %arg is moved up, before the memset, the
>>>>> callslot optimization can take place and the %c alloca is eliminated,
>>>>> but with the lifetime starting after the memset, that isn't possible.
>>>> This bit of ir actually seems pretty reasonable given the inline asm.  The only thing I really see is that the memcpy could be a memset.  Are you expecting something else?
>>> 
>>> The only thing that is to be improved is that the memset should
>>> directly write to %arg and %x should be removed because it is dead
>>> then. This happens when there are no lifetime intrinsics or when the
>>> call to lifetime.start is moved before the call to memset. The latter
>>> is what my first mail was about, that it is usually better to have
>>> overlapping lifetimes all start at the same point, instead of starting
>>> them as late as possible.
>> Honestly, this sounds like a clear optimizer bug, not something a frontend should work around.
>> 
>> Can you file a bug with the four sets of ir?  (Both schedules, no intrinsics before and after). This should hopefully be easy to fix.  
> 
> I went ahead and made a fix: http://reviews.llvm.org/D7984
Fyi, I'm pretty sure this is the wrong approach. I need to confirm by running with some test cases tomorrow, but if my initial analysis holds up, the lifetime start may be just a red herring.  The underlying issue appears to be a canonicalization issue with multiple users of an alloca with conflicting types and the placement of the bitcast used by the lifetime instruction.

> 
>> Do you know of other cases like this with the lifetime intrinsics?
> 
> Not offhand, but I didn't check the IR closely for such issues. The
> memcpy thing was found by chance, too.
> 
> Björn




More information about the llvm-dev mailing list