Another problem with "Recommit r265547, and r265610,r265639,r265657"

Mikael Holmén via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 27 00:39:29 PDT 2016


Hi Wei and Quentin,

On 04/26/2016 08:19 PM, Quentin Colombet wrote:
> Hi Wei and Mikael,
>
>> On Apr 26, 2016, at 11:16 AM, Wei Mi <wmi at google.com
>> <mailto:wmi at google.com>> wrote:
>>
>> On Tue, Apr 26, 2016 at 11:01 AM, Wei Mi <wmi at google.com
>> <mailto:wmi at google.com>> wrote:
>>> +Quentin and llvm-commits.
>>>
>>> On Tue, Apr 26, 2016 at 1:05 AM, Mikael Holmén
>>> <mikael.holmen at ericsson.com <mailto:mikael.holmen at ericsson.com>> wrote:
>>>> Hi Wei,
>>>>
>>>> On 04/26/2016 02:24 AM, Wei Mi wrote:
>>>>>
>>>>> Hi Mikael, after some discussions with Quentin
>>>>>
>>>>> (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160425/350786.html),
>>>>> I think it is necessary for me to look at your testcases more closely,
>>>>> because in theory the errors shouldn't happen. I must be missing
>>>>> something.
>>>>>
>>>>> Is it possible to send out preprocessed files and the command lines to
>>>>> reproduce the problems you saw?
>>>>
>>>>
>>>> Well, this happens only when compiling for our out-of-tree target, so
>>>> unfortunately we can't do that. We need to have our hardware's set of
>>>> registers and intructions etc for the register allocator to behave
>>>> exactly
>>>> the way it does for us. Maybe it's possible to reproduce this on
>>>> some other
>>>> target, but from experience it's quite hard to do that with register
>>>> allocator problems. :/
>>>>
>>>> So I've compiled a bugpoint-reduced version of the code again now,
>>>> with and
>>>> without your latest patch "Keep dead inst copy from being deleted
>>>> only when
>>>> the inst is rematerializable", with debug printouts on.
>>>>
>>>> I added one printout in LiveRangeEdit::checkRematerializable
>>>>
>>>> +  DEBUG(dbgs() << "LiveRangeEdit::checkRematerializable: Checking
>>>> isTriviallyReMaterializable on: " << *DefMI);
>>>>   if (!TII.isTriviallyReMaterializable(DefMI, aa)) {
>>>> +     DEBUG(dbgs() << "false\n");
>>>>     return false;
>>>>   }
>>>> +   DEBUG(dbgs() << "true\n");
>>>>
>>>> and one in LiveRangeEdit::eliminateDeadDef just before your change
>>>>
>>>> +    DEBUG(dbgs() << "LiveRangeEdit::eliminateDeadDef: Checking
>>>> isTriviallyReMaterializable on: " << *MI);
>>>> +    if (TII.isTriviallyReMaterializable(MI, AA))
>>>> +      DEBUG(dbgs() << "true\n");
>>>> +    else
>>>> +      DEBUG(dbgs() << "false\n");
>>>> +
>>>>    if (isOrigDef && DeadRemats &&
>>>> TII.isTriviallyReMaterializable(MI, AA))
>>>> {
>>>>       LiveInterval &NewLI = createEmptyIntervalFrom(Dest);
>>>>       VNInfo *VNI = NewLI.getNextValue(Idx, LIS.getVNInfoAllocator());
>>>>
>>>> I attached the logs from the two runs.
>>>>
>>>> Note that with your patch the code compiles succesfully now, so it
>>>> doesn't
>>>> seem to be totally rubbish :)
>>>>
>>>> Of course the logs now have some stuff from our backend, some
>>>> instructions
>>>> and registers you don't know, but maybe you can figure out what's
>>>> going on
>>>> anyway. If I can add some more printouts that you want to be able to
>>>> understand better what's happening, please don't hesitate to tell me and
>>>> I'll add and rerun.
>>>>
>>>> Thanks for looking into this,
>>>> Mikael
>>>>
>>>
>>> Hi Mikael,
>>>
>>> Thank you very much for providing a crash log. It is very helpful. I
>>> look at foo.crash.log and find a speciality in foo.red.ll:
>>> The def to vreg29 at 528r is in an infinite loop. Although the def
>>> actually has no use, it is not marked as dead and these segments:
>>> [336B,448B:0)[528r,608B:1)[608B,720B:2) exists only because the dead
>>> def at 528r.
>>>
>>> Before greedy regalloc starts:
>>> %vreg29 [48r,336B:3)[336B,448B:0)[528r,608B:1)[608B,720B:2)[720B,976B:3)
>>> 0 at 336B-phi 1 at 528r 2 at 608B-phi 3 at 48r
>>>
>>> next step:
>>> %vreg29 is splitted into %vreg31 and %vreg32.
>>>
>>> next step:
>>> 752B:2 libcall_CRT_ll_div_r %vreg19, %vreg19, %vreg32, %vreg1,
>>> %a0_32<imp-def>, %a1_32<imp-def>, %CCReg<imp-def,dead>, ...;
>>> aN32_0_7:%vreg19,%vreg32,%vreg1
>>> is rematerialized, and %vreg32 at 312r becomes dead.
>>>
>>> next step:
>>> When it deletes 312r %vreg32<def,dead> = COPY %vreg31;
>>> aN32_0_7:%vreg32,%vreg31, it shrink the live range of vreg31.
>>> ---------- trace extracted from foo.crash.log -----------
>>> Shrink: %vreg31 [48r,312r:0)[336B,448B:1)[528r,608B:2)[608B,720B:3)
>>> 0 at 48r 1 at 336B-phi 2 at 528r 3 at 608B-phi
>>> live-in at 176B
>>> Dead PHI at 336B may separate interval
>>> All defs dead: 528r %vreg31<def,dead> = mv_ar16_ar16_lo16In32 %vreg13,
>>> pred:0, pred:%noreg, pred:0, %ac0<imp-use>, %ac1<imp-use>;
>>> aN32_0_7:%vreg31 aNh_0_7:%vreg13
>>> Dead PHI at 608B may separate interval
>>> Shrunk: %vreg31 [48r,128B:0)[176B,192r:0)[528r,528d:2)  0 at 48r 1 at x
>>> 2 at 528r 3 at x
>>>  Split 2 components: %vreg31 [48r,128B:0)[176B,192r:0)[528r,528d:1)
>>> 0 at 48r 1 at 528r
>>> Deleting dead def 528r %vreg34<def,dead> = mv_ar16_ar16_lo16In32
>>> %vreg13, pred:0, pred:%noreg, pred:0, %ac0<imp-use>, %ac1<imp-use>;
>>> aN32_0_7:%vreg34
>>> ---------------------------------------------------------------------
>>> Note that %vreg34 is generated from %vreg31. When 312r is deleted, it
>>> shrinks the live range and finds 528r is dead at the same time. !!!
>>> But the def of 528r has no relationship with the use of %vreg31 at
>>> 312r !!!
>>>
>>> 528r %vreg34<def,dead> = mv_ar16_ar16_lo16In32 %vreg13  is not
>>> triviallyRematerializable, but because it is an OriginalDef, it can
>>> satisfy the condition to go into DeadRemat. This is how we get a case
>>> that an instruction in DeadRemat is not triviallyRematerializable.
>>>
>>> Thanks,
>>> Wei.
>>
>> Hi Quentin,
>>
>> From the analysis of crash report given by Mikael, the error happens
>> in a very rare case -- dead live interval and dead VNI are created
>> before regalloc, and then stay there until live range shrinking is
>> called because of remat for other unrelated uses. In such case,
>> non-triviallyRematerializable instructions can slip into DeadRemat
>> set.
>>
>> I think adding triviallyRematerializable check can guard the safety
>> for this rare case for sure, but I don't know if we have a better way
>> to remove the weird case earlier -- dead live intervals and dead VNI
>> should be removed before regalloc.
>
> I agree, those must be removed before regalloc.
> Mikaeal, when are they introduced and why do they stick until regalloc?

My input ll-file looks like

@g = external global i32

declare i16 @f1()

declare i16 @f2(i64, i32)

define void @f3(i16 %p1.4.par) {
   %_tmp3 = xor i32 0, 32768
   br i1 undef, label %bb24.preheader, label %bb2

bb24.preheader:                                   ; preds = %0
   %_tmp53 = sext i32 %_tmp3 to i64
   %_tmp56 = sext i16 %p1.4.par to i64
   br label %bb24

bb2:                                              ; preds = %bb13, %0
   %h.5.0 = phi i32 [ %h.5.1, %bb13 ], [ %_tmp3, %0 ]
   br i1 undef, label %bb13, label %bb3

bb3:                                              ; preds = %bb2
   %_tmp12 = tail call i16 @f1()
   %_tmp14 = sext i16 %_tmp12 to i32
   br label %bb13

bb13:                                             ; preds = %bb3, %bb2
   %h.5.1 = phi i32 [ %_tmp14, %bb3 ], [ %h.5.0, %bb2 ]
   %_tmp20.2 = tail call i16 @f1()
   br label %bb2

bb24:                                             ; preds = %bb24, 
%bb24.preheader
   %_tmp54 = sdiv i64 0, %_tmp53
   %_tmp58 = xor i64 %_tmp54, %_tmp56
   %_tmp59 = tail call i16 @f2(i64 %_tmp58, i32 0)
   store i32 0, i32* @g, align 1
   br label %bb24
}

Already here there are infinite loops, and e.g. the settings of %h.5.0, 
%h.5.1 and %_tmp14 are actually rubbish, just used to update eachother, 
but there aren't any "real" uses of them. This is the result of running 
bugpoint on the original ll-file to get something managable to look at.

Then to see the problem I run llc with -O0. Note that we are using the 
Greedy regalloc also on -O0 because the Fast regalloc gives too much 
spill for us so the generated code won't fit in our limited program memory:

llc -verify-machineinstrs -O0 -regallocfoo.red.ll

If I run -O1 then "Loop Strength Reduction" cleans up the loops and then 
there is no problem. But on -O0 they stay.

I saw that before "Live Interval Analysis" we have
624B		%vreg7<def> = COPY %vreg29<kill>; aN32_rN_Pairs:%vreg7,%vreg29

but after "Live Interval Analysis" kill flags seems to be removed for 
some reason

624B		%vreg7<def> = COPY %vreg29; aN32_rN_Pairs:%vreg7,%vreg29

Then during "Simple Register Coalescing" this use of %reg29 is removed 
and this def is now actually dead, but not marked as such:

528B		%vreg29<def> = mv_ar16_ar16_lo16In32 %vreg13, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg29 aNh_0_7:%vreg13

And then it looks like this all the way to the register allocator.

So if running on -O0, who should remove these instructions if the 
regalloc shouldn't be able to handle this kind of input?

And does it matter that %vreg29 isn't marked as dead? If so, who should 
do that marking?

I attached the full "-print-before-all -print-after-all -debug" printout 
in case there is anything more interesting in the printouts that I'm not 
aware of.

Thanks,
Mikael

>
> Thanks,
> -Quentin
>
>>
>> Thanks,
>> Wei.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.crash.full.log.bz2
Type: application/x-bzip
Size: 17643 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160427/d2b8589b/attachment.bin>


More information about the llvm-commits mailing list