Another problem with "Recommit r265547, and r265610, r265639, r265657"
Quentin Colombet via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 27 09:57:15 PDT 2016
Hi Mikael,
Thanks for the follow-up.
> On Apr 27, 2016, at 12:39 AM, Mikael Holmén <mikael.holmen at ericsson.com> wrote:
>
> Hi Wei and Quentin,
>
> On 04/26/2016 08:19 PM, Quentin Colombet wrote:
>> Hi Wei and Mikael,
>>
>>> On Apr 26, 2016, at 11:16 AM, Wei Mi <wmi at google.com
>>> <mailto:wmi at google.com>> wrote:
>>>
>>> On Tue, Apr 26, 2016 at 11:01 AM, Wei Mi <wmi at google.com
>>> <mailto:wmi at google.com>> wrote:
>>>> +Quentin and llvm-commits.
>>>>
>>>> On Tue, Apr 26, 2016 at 1:05 AM, Mikael Holmén
>>>> <mikael.holmen at ericsson.com <mailto:mikael.holmen at ericsson.com>> wrote:
>>>>> Hi Wei,
>>>>>
>>>>> On 04/26/2016 02:24 AM, Wei Mi wrote:
>>>>>>
>>>>>> Hi Mikael, after some discussions with Quentin
>>>>>>
>>>>>> (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160425/350786.html),
>>>>>> I think it is necessary for me to look at your testcases more closely,
>>>>>> because in theory the errors shouldn't happen. I must be missing
>>>>>> something.
>>>>>>
>>>>>> Is it possible to send out preprocessed files and the command lines to
>>>>>> reproduce the problems you saw?
>>>>>
>>>>>
>>>>> Well, this happens only when compiling for our out-of-tree target, so
>>>>> unfortunately we can't do that. We need to have our hardware's set of
>>>>> registers and intructions etc for the register allocator to behave
>>>>> exactly
>>>>> the way it does for us. Maybe it's possible to reproduce this on
>>>>> some other
>>>>> target, but from experience it's quite hard to do that with register
>>>>> allocator problems. :/
>>>>>
>>>>> So I've compiled a bugpoint-reduced version of the code again now,
>>>>> with and
>>>>> without your latest patch "Keep dead inst copy from being deleted
>>>>> only when
>>>>> the inst is rematerializable", with debug printouts on.
>>>>>
>>>>> I added one printout in LiveRangeEdit::checkRematerializable
>>>>>
>>>>> + DEBUG(dbgs() << "LiveRangeEdit::checkRematerializable: Checking
>>>>> isTriviallyReMaterializable on: " << *DefMI);
>>>>> if (!TII.isTriviallyReMaterializable(DefMI, aa)) {
>>>>> + DEBUG(dbgs() << "false\n");
>>>>> return false;
>>>>> }
>>>>> + DEBUG(dbgs() << "true\n");
>>>>>
>>>>> and one in LiveRangeEdit::eliminateDeadDef just before your change
>>>>>
>>>>> + DEBUG(dbgs() << "LiveRangeEdit::eliminateDeadDef: Checking
>>>>> isTriviallyReMaterializable on: " << *MI);
>>>>> + if (TII.isTriviallyReMaterializable(MI, AA))
>>>>> + DEBUG(dbgs() << "true\n");
>>>>> + else
>>>>> + DEBUG(dbgs() << "false\n");
>>>>> +
>>>>> if (isOrigDef && DeadRemats &&
>>>>> TII.isTriviallyReMaterializable(MI, AA))
>>>>> {
>>>>> LiveInterval &NewLI = createEmptyIntervalFrom(Dest);
>>>>> VNInfo *VNI = NewLI.getNextValue(Idx, LIS.getVNInfoAllocator());
>>>>>
>>>>> I attached the logs from the two runs.
>>>>>
>>>>> Note that with your patch the code compiles succesfully now, so it
>>>>> doesn't
>>>>> seem to be totally rubbish :)
>>>>>
>>>>> Of course the logs now have some stuff from our backend, some
>>>>> instructions
>>>>> and registers you don't know, but maybe you can figure out what's
>>>>> going on
>>>>> anyway. If I can add some more printouts that you want to be able to
>>>>> understand better what's happening, please don't hesitate to tell me and
>>>>> I'll add and rerun.
>>>>>
>>>>> Thanks for looking into this,
>>>>> Mikael
>>>>>
>>>>
>>>> Hi Mikael,
>>>>
>>>> Thank you very much for providing a crash log. It is very helpful. I
>>>> look at foo.crash.log and find a speciality in foo.red.ll:
>>>> The def to vreg29 at 528r is in an infinite loop. Although the def
>>>> actually has no use, it is not marked as dead and these segments:
>>>> [336B,448B:0)[528r,608B:1)[608B,720B:2) exists only because the dead
>>>> def at 528r.
>>>>
>>>> Before greedy regalloc starts:
>>>> %vreg29 [48r,336B:3)[336B,448B:0)[528r,608B:1)[608B,720B:2)[720B,976B:3)
>>>> 0 at 336B-phi 1 at 528r 2 at 608B-phi 3 at 48r
>>>>
>>>> next step:
>>>> %vreg29 is splitted into %vreg31 and %vreg32.
>>>>
>>>> next step:
>>>> 752B:2 libcall_CRT_ll_div_r %vreg19, %vreg19, %vreg32, %vreg1,
>>>> %a0_32<imp-def>, %a1_32<imp-def>, %CCReg<imp-def,dead>, ...;
>>>> aN32_0_7:%vreg19,%vreg32,%vreg1
>>>> is rematerialized, and %vreg32 at 312r becomes dead.
>>>>
>>>> next step:
>>>> When it deletes 312r %vreg32<def,dead> = COPY %vreg31;
>>>> aN32_0_7:%vreg32,%vreg31, it shrink the live range of vreg31.
>>>> ---------- trace extracted from foo.crash.log -----------
>>>> Shrink: %vreg31 [48r,312r:0)[336B,448B:1)[528r,608B:2)[608B,720B:3)
>>>> 0 at 48r 1 at 336B-phi 2 at 528r 3 at 608B-phi
>>>> live-in at 176B
>>>> Dead PHI at 336B may separate interval
>>>> All defs dead: 528r %vreg31<def,dead> = mv_ar16_ar16_lo16In32 %vreg13,
>>>> pred:0, pred:%noreg, pred:0, %ac0<imp-use>, %ac1<imp-use>;
>>>> aN32_0_7:%vreg31 aNh_0_7:%vreg13
>>>> Dead PHI at 608B may separate interval
>>>> Shrunk: %vreg31 [48r,128B:0)[176B,192r:0)[528r,528d:2) 0 at 48r 1 at x
>>>> 2 at 528r 3 at x
>>>> Split 2 components: %vreg31 [48r,128B:0)[176B,192r:0)[528r,528d:1)
>>>> 0 at 48r 1 at 528r
>>>> Deleting dead def 528r %vreg34<def,dead> = mv_ar16_ar16_lo16In32
>>>> %vreg13, pred:0, pred:%noreg, pred:0, %ac0<imp-use>, %ac1<imp-use>;
>>>> aN32_0_7:%vreg34
>>>> ---------------------------------------------------------------------
>>>> Note that %vreg34 is generated from %vreg31. When 312r is deleted, it
>>>> shrinks the live range and finds 528r is dead at the same time. !!!
>>>> But the def of 528r has no relationship with the use of %vreg31 at
>>>> 312r !!!
>>>>
>>>> 528r %vreg34<def,dead> = mv_ar16_ar16_lo16In32 %vreg13 is not
>>>> triviallyRematerializable, but because it is an OriginalDef, it can
>>>> satisfy the condition to go into DeadRemat. This is how we get a case
>>>> that an instruction in DeadRemat is not triviallyRematerializable.
>>>>
>>>> Thanks,
>>>> Wei.
>>>
>>> Hi Quentin,
>>>
>>> From the analysis of crash report given by Mikael, the error happens
>>> in a very rare case -- dead live interval and dead VNI are created
>>> before regalloc, and then stay there until live range shrinking is
>>> called because of remat for other unrelated uses. In such case,
>>> non-triviallyRematerializable instructions can slip into DeadRemat
>>> set.
>>>
>>> I think adding triviallyRematerializable check can guard the safety
>>> for this rare case for sure, but I don't know if we have a better way
>>> to remove the weird case earlier -- dead live intervals and dead VNI
>>> should be removed before regalloc.
>>
>> I agree, those must be removed before regalloc.
>> Mikaeal, when are they introduced and why do they stick until regalloc?
>
> My input ll-file looks like
>
> @g = external global i32
>
> declare i16 @f1()
>
> declare i16 @f2(i64, i32)
>
> define void @f3(i16 %p1.4.par) {
> %_tmp3 = xor i32 0, 32768
> br i1 undef, label %bb24.preheader, label %bb2
>
> bb24.preheader: ; preds = %0
> %_tmp53 = sext i32 %_tmp3 to i64
> %_tmp56 = sext i16 %p1.4.par to i64
> br label %bb24
>
> bb2: ; preds = %bb13, %0
> %h.5.0 = phi i32 [ %h.5.1, %bb13 ], [ %_tmp3, %0 ]
> br i1 undef, label %bb13, label %bb3
>
> bb3: ; preds = %bb2
> %_tmp12 = tail call i16 @f1()
> %_tmp14 = sext i16 %_tmp12 to i32
> br label %bb13
>
> bb13: ; preds = %bb3, %bb2
> %h.5.1 = phi i32 [ %_tmp14, %bb3 ], [ %h.5.0, %bb2 ]
> %_tmp20.2 = tail call i16 @f1()
> br label %bb2
>
> bb24: ; preds = %bb24, %bb24.preheader
> %_tmp54 = sdiv i64 0, %_tmp53
> %_tmp58 = xor i64 %_tmp54, %_tmp56
> %_tmp59 = tail call i16 @f2(i64 %_tmp58, i32 0)
> store i32 0, i32* @g, align 1
> br label %bb24
> }
>
> Already here there are infinite loops, and e.g. the settings of %h.5.0, %h.5.1 and %_tmp14 are actually rubbish, just used to update eachother, but there aren't any "real" uses of them. This is the result of running bugpoint on the original ll-file to get something managable to look at.
>
> Then to see the problem I run llc with -O0. Note that we are using the Greedy regalloc also on -O0 because the Fast regalloc gives too much spill for us so the generated code won't fit in our limited program memory:
>
> llc -verify-machineinstrs -O0 -regallocfoo.red.ll
>
> If I run -O1 then "Loop Strength Reduction" cleans up the loops and then there is no problem. But on -O0 they stay.
>
> I saw that before "Live Interval Analysis" we have
> 624B %vreg7<def> = COPY %vreg29<kill>; aN32_rN_Pairs:%vreg7,%vreg29
>
> but after "Live Interval Analysis" kill flags seems to be removed for some reason
>
> 624B %vreg7<def> = COPY %vreg29; aN32_rN_Pairs:%vreg7,%vreg29
Interesting, could you investigate further.
>
> Then during "Simple Register Coalescing" this use of %reg29 is removed and this def is now actually dead, but not marked as such:
>
> 528B %vreg29<def> = mv_ar16_ar16_lo16In32 %vreg13, pred:0, pred:%noreg, pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg29 aNh_0_7:%vreg13
>
> And then it looks like this all the way to the register allocator.
>
> So if running on -O0, who should remove these instructions if the regalloc shouldn't be able to handle this kind of input?
Hmm, I guess we would need to teach the regalloc/liveness/coalescer how to deal with that. The configuration you used is uncommon and as such the coverage is poor so I am not surprised we find such bugs.
>
> And does it matter that %vreg29 isn't marked as dead? If so, who should do that marking?
The liveness analysis is supposed to do that IIRC. The dead flags are supposed to be properly set otherwise it is considered a bug.
Cheers,
-Quentin
>
> I attached the full "-print-before-all -print-after-all -debug" printout in case there is anything more interesting in the printouts that I'm not aware of.
>
> Thanks,
> Mikael
>
>>
>> Thanks,
>> -Quentin
>>
>>>
>>> Thanks,
>>> Wei.
>>
> <foo.crash.full.log.bz2>
More information about the llvm-commits
mailing list