[PATCH] D107677: Prevent machine licm if remattable with a vreg use

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 10 11:20:47 PDT 2021


rampitec added a comment.

In D107677#2936366 <https://reviews.llvm.org/D107677#2936366>, @dmgreen wrote:

> This does affect the ARM backend, apparently at some point it obtained the ability to hoist VCTP instructions which take a register use. I'm not sure if that really fits the definition of trivially rematerializable though, from the code comment on isTriviallyReMaterializable. (But I'm not sure that comment is up to date.)

I do not see a failing test, but yes, given this code it should currently hoist VCTP, because the ARM hook does not check uses either:

  bool ARMBaseInstrInfo::isReallyTriviallyReMaterializable(const MachineInstr &MI,
                                                           AAResults *AA) const {
    // Try hard to rematerialize any VCTPs because if we spill P0, it will block
    // the tail predication conversion. This means that the element count
    // register has to be live for longer, but that has to be better than
    // spill/restore and VPT predication.
    return isVCTP(&MI) && !isPredicated(MI);
  }



> Can you explain more why we don't want to hoist them out of loops?

This code in `MachineLICMBase::IsProfitableToHoist()` assumes that RA can always rematerialize a trivially rematerializable instruction when needed:

  // Rematerializable instructions should always be hoisted since the register
  // allocator can just pull them down again when needed.
  if (TII->isTriviallyReMaterializable(MI, AA))
    return true;

So MachineLICM will always hoist such instructions, even if that pushes register pressure too high. However, its assumption that a rematerializable instruction will be rematerialized when RA runs out of registers does not hold. The check in `LiveRangeEdit::allUsesAvailableAt()` verifies that every register the instruction uses is available at the point of rematerialization. If one is not, RA will not try to extend its live range to that point, if only because that would increase register pressure. So rematerialization will not in fact happen the way LICM expects. For example, LICM would transform:

  LOOP:
    %0 = DEF killed %1
    USE killed %0
    GOTO LOOP

into

  %0 = DEF killed %1
  LOOP:
    USE %0
    GOTO LOOP

Here DEF could be rematerialized before USE, but RA will not do it because %1 is killed at DEF and is not available at USE. If DEF itself increases register pressure, that is a problem.
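To make the condition concrete, here is a minimal, self-contained sketch of the kind of check I mean. `ToyOperand`, `ToyInstr`, `hasVirtualRegUse` and `alwaysProfitableToHoist` are hypothetical stand-ins invented for illustration; they are not LLVM types or functions, and this is not the code in the patch. It only models the idea that an unconditional hoist is safe when the instruction is trivially rematerializable and reads no virtual register.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not LLVM types.
struct ToyOperand {
  bool IsUse;      // true for a use, false for a def
  bool IsVirtual;  // true if the operand is a virtual register
};

struct ToyInstr {
  bool TriviallyRemat = false;       // what the target hook would report
  std::vector<ToyOperand> Operands;  // register operands only
};

// Returns true if the instruction reads at least one virtual register.
bool hasVirtualRegUse(const ToyInstr &MI) {
  for (const ToyOperand &Op : MI.Operands)
    if (Op.IsUse && Op.IsVirtual)
      return true;
  return false;
}

// Hoisting is treated as always profitable only when RA could really pull
// the instruction back into the loop, i.e. it depends on no virtual register
// whose live range might have ended by then. Anything else has to go through
// the usual register pressure heuristics.
bool alwaysProfitableToHoist(const ToyInstr &MI) {
  return MI.TriviallyRemat && !hasVirtualRegUse(MI);
}
```

With this model, DEF from the example above (one def, one virtual register use) does not qualify for the unconditional hoist, while an instruction that only defines a register from an immediate does.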

In the test AMDGPU/licm-regpressure.mir, updated in this patch, MachineLICM currently hoists all V_CVT_F64_I32_e32 instructions, which makes the virtual registers %18 - %35 defined by these instructions live across the whole loop. With a check for such uses added, the logic of `MachineLICMBase::IsProfitableToHoist` proceeds to the `CanCauseHighRegPressure()` check and hoists only the first 5 instructions (%18 - %22). The rest are kept in the loop because the register pressure limit has been reached. The instructions remaining in the loop consume only 1 register for all their defs because each def is immediately killed.
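The pressure effect can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyResult`, `hoistAll` and `hoistWithLimit` are hypothetical names, pressure is counted in whole registers of a single class, and the limit of 6 used in the usage note is an invented number chosen so that the toy stops after 5 hoists. It does not reproduce the real `CanCauseHighRegPressure()` logic or the actual AMDGPU limits.

```cpp
// Toy model, not LLVM code. Every hoisted def stays live across the whole
// loop and costs one register. Defs left in the loop are killed right away,
// so together they need only one shared register.
struct ToyResult {
  int Hoisted;       // instructions moved out of the loop
  int PeakPressure;  // registers needed at the busiest point in the loop
};

// Old behaviour in this model: hoist every candidate unconditionally.
ToyResult hoistAll(int Candidates, int BasePressure) {
  return {Candidates, BasePressure + Candidates};
}

// New behaviour in this model: hoist one candidate at a time and stop as
// soon as the next hoist would push the loop above Limit.
ToyResult hoistWithLimit(int Candidates, int BasePressure, int Limit) {
  int Hoisted = 0;
  while (Hoisted < Candidates) {
    int Remaining = Candidates - (Hoisted + 1);
    int InLoopTemp = Remaining > 0 ? 1 : 0;  // shared register for in-loop defs
    if (BasePressure + (Hoisted + 1) + InLoopTemp > Limit)
      break;
    ++Hoisted;
  }
  int InLoopTemp = Candidates > Hoisted ? 1 : 0;
  return {Hoisted, BasePressure + Hoisted + InLoopTemp};
}
```

With 18 candidates, a base pressure of 0 and the invented limit of 6, `hoistAll` needs 18 registers inside the loop, while `hoistWithLimit` hoists 5 instructions and needs 6 registers: 5 for the hoisted defs plus 1 shared by the defs that stay in the loop.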

I assume the same problem can occur with any instruction that defines a register, including VCTP.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107677/new/

https://reviews.llvm.org/D107677


