[PATCH] D59626: [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs

Mon Apr 8 11:19:08 PDT 2019

bjope added a comment.

In D59626#1458594 <https://reviews.llvm.org/D59626#1458594>, @rampitec wrote:

> Bjorn,
>
> did you enable MachineDCE in RA in some other target? So far it is only enabled in the AMDGPU, so I am curious where do you see these problems?

Well, our OOT target is using the DeadMachineInstructionElim pass in (maybe non-standard according to in-tree target) positions in the backend pipeline.

> 
> 
> In D59626#1458186 <https://reviews.llvm.org/D59626#1458186>, @bjope wrote:
> 
>> I see a couple of problems with the changes to DeadMachineInstructionElim.cpp in this patch:
>>
>> - I get MachineVerifier problems because the LIS update does not take into consideration that the removal of a dead use can introduce disjoint live ranges (resulting in verifier complaining about "Multiple connected components in live interval".
> 
> 
> I am recomputing liveness for all affected uses, how could that be?

Ok, so the test case is from the OOT target. Here is an excerpt of the MIR:

  0B	bb.0:

  16B	  %0:rn = ...
  32B	  %10:an32pairs = ...
  80B	  %4:an32_0_7 = ... %10.hiAcc:an32pairs, ...
  96B	  brr_cond %bb.3, ...
  112B	  brr_uncond %bb.1

  128B	bb.1:

  144B	  brr_cond %bb.2, ...
  160B	  brr_uncond %bb.3

  176B	bb.2:

  192B	  %8:anh_rn = COPY %4.lo16:an32_0_7
  224B	  %7:anh_rn = COPY %10.hiAcc_then_lo16:an32pairs
  256B	  brr_uncond %bb.4

  272B	bb.3:

  288B	  %9:rn = ...
  304B	  %10:an32pairs = ... %9:rn, ...
  352B	  %7:anh_rn = COPY %10.hiAcc_then_lo16:an32pairs
  368B	  %8:anh_rn = COPY %10.hiAcc_then_hi16:an32pairs

  384B	bb.4::

  400B	  nop 0, $noreg, 0, implicit %10.loAcc:an32pairs
  416B	  $a0h = COPY %8:anh_rn
  432B	  $a1h = COPY %7:anh_rn
  448B	  rets implicit $a0h, implicit $a1h

The nop is removed by by DMIE, resulting in

  *** Bad machine code: Multiple connected components in live interval ***
  - function:    f3
  - interval:    %10 [32r,224r:0)[304r,368r:1)  0 at 32r 1 at 304r weight:0.000000e+00

> 
> 
>> - I get MachineVerifier problems because the LIS update does not update live ranges for physical registers. Consider an instruction like this `dead %42:regs = COPY $a1`, if that is removed and it is the last use of the physical register `$a1` then we need to shrink the live range for `$a1` , otherwise the MachineVerifier will complain about "Live segment doesn't end at a valid instruction".
> 
> That is probably right. I avoid deleting dead phys, but not their uses. I think shrinkToUses() will be appropriate here.
> 
>> - I do not think the `!MRI->reg_empty(Reg)` check takes debug info into consideration. Possibly introducing a debug invariance problem.
> 
> I thought about this, so I did not use reg_nodbg_empty() in this instance. I.e. this LI gets recreated.

So it might be that that the call to createAndComputeVirtRegInterval only happens when having debug info. And afaict that might give a different codegen compared to the situation when not having any debug info. I think we usually try to avoid that. Codegen should not be impacted by existence of debug info.

> 
> 
>> - It does not seem like this pass is using the LIS itself. So why do we even bother keeping it update? It is destroyed, and then it is up to the pass manager to detect if it needs to be recalculated by a later pass, right? Or do we know that it will be used by a later pass, and thereby we want to bother about preserving it (even if it isn't for the greater good of the pass itself). If it is like that, then we probably need to add some code comments to highlight that we only do it because some other pass needs it (for example in case that pass is removed in the future).
> 
> The pass does not use it, but should update if it exists. That is true, if I do not claim it is preserved then it will be recalculated, but that is very slow.
>  So far only AMDGPU should be using it with liveness available. Am I missing some other invocation?

Well, in my case it is an OOT target. So I'm can't blame you for missing that.

> 
> 
>> - Is it really true that we now preserve all possible analyses (both existing and future, both upstream and downstream), motivating the change from setPreservesCFG() to setPreservesAll(). I mean, the pass is pretty "destructive" as it removes instructions. You have only added (afaict incomplete) support to preserve the LIS. Notice that I'm not sure exactly what the difference between "PreservesAll" and "PreservesCFG" is, but I suppose it includes more analysis results besides LIS (and SlotIndices).
> 
> It does preserve LIS and SlotIndexes. The rest of the analysis passes (like PDT) are just unaffected by the pass itself. I believe that is more or less the same assumption for all passes which use PreservesAll() that we are considering exiting analysis passes. Otherwise this method is not future proof in general.

Ok.  A little bit scary. But if that is how PreservesAll() works in general I guess it is ok.

> 
> 
>> Maybe we can back out the change in DeadMachineInstructionElim.cpp until the above problems have been sorted out?
> 
> If there is no better way. If you see the problems do you mind to share an IR?

A little bit hard due to that I've only triggered it for the OOT target so far. For now I hacked around it in our downstream clone (although using setPreservesCFG instead gives some churn in some AMDGPU test cases, so it seems like we get different results if LIS is recalculated instead of just being preserved by DeadMachineInstructionElim, I would have expected to get the same result, at least if the preserve worked correctly).

Anyway. I can try to make in-tree reproducers tomorrow.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59626/new/

https://reviews.llvm.org/D59626