[llvm-dev] Extending Register Rematerialization

Fri Dec 2 12:36:26 PST 2016

----- Original Message -----

> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
> To: "Nirav Rana" <nirav076 at gmail.com>
> Cc: hfinkel at anl.gov, llvm-dev at lists.llvm.org, "Pandya Vivek"
> <h2015078 at pilani.bits-pilani.ac.in>,
> h2015089 at pilani.bits-pilani.ac.in, h2015172 at pilani.bits-pilani.ac.in
> Sent: Thursday, December 1, 2016 6:14:06 PM
> Subject: Re: [llvm-dev] Extending Register Rematerialization

> On which targets & apps/benchmarks do you expect a speed-up? In
> practice I expect spills/fills to be hard to beat by longer remat
> sequences.
Why? 

Perhaps it depends on how you define "longer." A larger out-of-order (OOO) core with multiple pipelines can often execute a materialization sequence of several instructions faster than it can get the data back from the L1 cache. If the code is already putting pressure on the load/store units, the extra spill/restore code can be noticeably worse. 
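To make the latency argument concrete, here is a toy comparison in C++, the language of the LLVM sources discussed in this thread. It is a hypothetical sketch, not LLVM's cost model. The function names are made up and every cycle count is a caller-supplied assumption. It only compares the dependent-chain latency of a remat sequence with the load-to-use latency of a reload, plus an assumed penalty for a busy load/store unit. It ignores issue-slot pressure and the cost of the spill store itself.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical toy cost comparison; not LLVM's actual heuristic.
// All latencies are caller-supplied cycle counts (e.g. taken from a
// target's scheduling model).

// The instructions of a remat sequence usually form a dependent chain
// (each one feeds the next), so their latencies add up.
unsigned rematChainCycles(const std::vector<unsigned> &SeqLatencies) {
  return std::accumulate(SeqLatencies.begin(), SeqLatencies.end(), 0u);
}

// A reload costs the L1 load-to-use latency, plus an assumed extra
// penalty when the load/store units are already saturated.
unsigned reloadCycles(unsigned L1LoadLatency, unsigned LSUPressurePenalty) {
  return L1LoadLatency + LSUPressurePenalty;
}

// Prefer remat when its chain finishes no later than the reload would.
bool preferRemat(const std::vector<unsigned> &SeqLatencies,
                 unsigned L1LoadLatency, unsigned LSUPressurePenalty) {
  assert(!SeqLatencies.empty() && "a remat sequence has at least one instr");
  return rematChainCycles(SeqLatencies) <=
         reloadCycles(L1LoadLatency, LSUPressurePenalty);
}
```

For example, with hypothetical 1-cycle ALU operations and a 4-cycle L1 load, a four-instruction chain ties with a reload. A 2-cycle load/store penalty tips even a five-instruction chain toward remat.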

Note the following from AArch64InstrInfo.td: 

let isReMaterializable = 1, isCodeGenOnly = 1, isMoveImm = 1,
    isAsCheapAsAMove = 1 in {
// FIXME: The following pseudo instructions are only needed because remat
// cannot handle multiple instructions. When that changes, we can select
// directly to the real instructions and get rid of these pseudos.

def MOVi32imm
    : Pseudo<(outs GPR32:$dst), (ins i32imm:$src),
             [(set GPR32:$dst, imm:$src)]>,
      Sched<[WriteImm]>;
def MOVi64imm
    : Pseudo<(outs GPR64:$dst), (ins i64imm:$src),
             [(set GPR64:$dst, imm:$src)]>,
      Sched<[WriteImm]>;
} // isReMaterializable, isCodeGenOnly
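These pseudos exist because a wide constant can need several real instructions. Below is a simplified sketch with hypothetical names. It considers only MOVZ and MOVK. LLVM's actual AArch64 expansion can also use MOVN and ORR with logical immediates, so it may emit fewer instructions. In this scheme a 64-bit immediate takes one MOVZ for the first non-zero 16-bit chunk plus one MOVK per further non-zero chunk, so up to four instructions in total:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical representation of one MOVZ/MOVK instruction.
struct MovInstr {
  bool IsMovk;     // false: MOVZ (zeroes the rest), true: MOVK (keeps it)
  uint16_t Imm16;  // 16-bit payload
  unsigned Shift;  // LSL amount: 0, 16, 32 or 48
};

// Simplified MOVZ/MOVK-only expansion of a 64-bit immediate.
std::vector<MovInstr> expandMovImm64(uint64_t Imm) {
  std::vector<MovInstr> Seq;
  for (unsigned Shift = 0; Shift < 64; Shift += 16) {
    uint16_t Chunk = static_cast<uint16_t>(Imm >> Shift);
    if (Chunk == 0)
      continue; // the initial MOVZ already zeroed this chunk
    // The first emitted instruction is a MOVZ, every later one a MOVK.
    Seq.push_back({/*IsMovk=*/!Seq.empty(), Chunk, Shift});
  }
  if (Seq.empty())
    Seq.push_back({false, 0, 0}); // Imm == 0: a single MOVZ #0
  assert(Seq.size() <= 4 && "at most one instruction per 16-bit chunk");
  return Seq;
}
```

For instance, 0x1234567890ABCDEF needs a MOVZ and three MOVKs, whereas 0x0000FFFF00000000 needs a single MOVZ with LSL #32. Rematerializing the former means recomputing a four-instruction dependent sequence, which current remat cannot express without the pseudo.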

Also, I think that Ivan Baev's talk at the developers' meeting a couple of years ago provides some good hints about where this might matter: http://llvm.org/devmtg/2014-10/#talk20 

Thanks again, 
Hal 

> Thanks
> Gerolf

> > On Nov 27, 2016, at 12:37 PM, Nirav Rana via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> 

> > Hello LLVM Developers,
> 

> > We are working on extending the currently available register
> > rematerialization to include cases where a sequence of multiple
> > instructions is required to rematerialize a value.
> 

> > We had a discussion on this on the community mailing list; the link
> > is here:
> 
> > http://lists.llvm.org/pipermail/llvm-dev/2016-September/subject.html#104777
> 

> > From the above discussion and from studying the code, we believe the
> > extension can be implemented in the same flow as the current remat
> > implementation. What we understood is that RegAlloc<>.cpp tries to
> > allocate a register to a live range and, if that is not possible,
> > calls InlineSpiller.cpp to spill the live range. InlineSpiller.cpp
> > first tries to rematerialize the register value, if possible, with
> > the help of LiveRangeEdit.cpp, which provides various methods for
> > checking whether a value is rematerializable.
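The flow described above can be modeled, very roughly, as follows. This is a hypothetical C++ sketch with made-up types, not LLVM's API. The real greedy allocator also tries eviction and live-range splitting before it spills, and whether a use can be rematerialized depends on liveness queries in LiveRangeEdit. The sketch does capture one detail of InlineSpiller: the decision is made per use, so some uses of a spilled range may be rematerialized while others are reloaded.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model of the allocate/spill/remat flow.
enum class UseFix { Remat, Reload };

struct ToyUse {
  bool OperandsAvailable; // are the def's inputs still live at this use?
};

struct ToyLiveRange {
  bool GotPhysReg;            // did register assignment succeed?
  bool DefIsRematerializable; // answered by a LiveRangeEdit-style query
  std::vector<ToyUse> Uses;
};

// Returns one decision per use; empty if the range got a register.
std::vector<UseFix> spillOrRemat(const ToyLiveRange &LR) {
  std::vector<UseFix> Fixes;
  if (LR.GotPhysReg)
    return Fixes; // allocation succeeded, nothing to spill
  for (const ToyUse &U : LR.Uses) {
    // Spiller-style choice: recompute the value right before the use
    // when possible, otherwise reload it from the stack slot.
    bool CanRemat = LR.DefIsRematerializable && U.OperandsAvailable;
    Fixes.push_back(CanRemat ? UseFix::Remat : UseFix::Reload);
  }
  return Fixes;
}
```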
> 

> > So we have added a new function in LiveRangeEdit that recursively
> > traverses the sequence of instructions in the use-def chain (instead
> > of only the current instruction) up to depth 6 (chosen arbitrarily
> > for testing) to check whether the value can be rematerialized with
> > that sequence of instructions.
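The depth-limited walk described above can be illustrated on a toy SSA-like IR. This is a hypothetical sketch: ToyInstr, DefMap and collectRematSeq are made-up names, not LLVM classes. It visits the defining instruction of each operand recursively. It gives up past a depth limit or at a missing or unsafe definition. It records registers in post-order, so each instruction appears after the values it reads.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy instruction: the virtual registers it reads, plus a
// stand-in for an isReMaterializablePossible-style safety filter.
struct ToyInstr {
  std::vector<unsigned> Uses;
  bool SafeToRemat;
};

// Maps each virtual register to its single (SSA-like) defining instr.
using DefMap = std::map<unsigned, ToyInstr>;

// Depth-limited walk up the use-def chain of Reg. On success, Seq holds
// the registers to recompute in post-order (operands before users).
// On failure, Seq may be partially filled and should be discarded.
bool collectRematSeq(const DefMap &Defs, unsigned Reg, unsigned Depth,
                     unsigned MaxDepth, std::vector<unsigned> &Seq) {
  if (Depth > MaxDepth)
    return false; // chain too long: fall back to spilling
  if (std::find(Seq.begin(), Seq.end(), Reg) != Seq.end())
    return true; // shared operand already recomputed via another path
  auto It = Defs.find(Reg);
  if (It == Defs.end() || !It->second.SafeToRemat)
    return false; // no def (e.g. a live-in) or an unsafe def
  for (unsigned Op : It->second.Uses)
    if (!collectRematSeq(Defs, Op, Depth + 1, MaxDepth, Seq))
      return false;
  Seq.push_back(Reg); // emitted after everything it reads
  return true;
}
```

Keeping the sequence in post-order means it can be re-emitted front to back at the rematerialization point; a shared operand is visited only once.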
> 

> > Here is the code:
> 
> > // New function added for checking whether a value can be
> > // rematerialized with a complex multi-instruction sequence.
> > bool LiveRangeEdit::checkComplexRematerializable(VNInfo *VNI,
> >                                                  const MachineInstr *DefMI,
> >                                                  unsigned int depth,
> >                                                  AliasAnalysis *aa) {
> >   // Bail out early if DefMI is definitely not rematerializable.
> >   if (!TII.isReMaterializablePossible(*DefMI, aa))
> >     return false;
> >   DEBUG(dbgs() << " ComplexRemat MI: " << *DefMI);
> >   for (unsigned i = 0, e = DefMI->getNumOperands(); i != e; ++i) {
> >     const MachineOperand &MO = DefMI->getOperand(i);
> >     if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
> >       continue;
> >     if (TargetRegisterInfo::isPhysicalRegister(MO.getReg())) {
> >       // A constant physical register is always available; any other
> >       // physical register may be clobbered before the remat point.
> >       if (MRI.isConstantPhysReg(MO.getReg(),
> >                                 *DefMI->getParent()->getParent()))
> >         continue;
> >       return false;
> >     }
> >     // Virtual register operand: check its def recursively, up to a
> >     // depth limit (6, chosen arbitrarily for testing).
> >     if (depth > 6)
> >       return false;
> >     LiveInterval &li = LIS.getInterval(MO.getReg());
> >     SlotIndex UseIdx = LIS.getInstructionIndex(*DefMI);
> >     VNInfo *UseVNInfo = li.getVNInfoAt(UseIdx);
> >     // No reaching value, or a PHI-def without a single defining instr.
> >     if (!UseVNInfo || UseVNInfo->isPHIDef())
> >       return false;
> >     MachineInstr *NewDefMI = LIS.getInstructionFromIndex(UseVNInfo->def);
> >     if (!NewDefMI ||
> >         !checkComplexRematerializable(UseVNInfo, NewDefMI, depth + 1, aa))
> >       return false;
> >   }
> >   Remattable.insert(VNI); // May have to add new data structure
> >   return true;
> > }
> 

> > In the above function we call a new function,
> > TII.isReMaterializablePossible(*DefMI, aa), which acts as an early
> > heuristic: it returns false if the instruction is definitely not
> > rematerializable. We have taken some cases from
> > TargetInstrInfo::isReallyTriviallyReMaterializableGeneric, and the
> > code is here:
> 

> > bool TargetInstrInfo::isReMaterializablePossible(
> >     const MachineInstr &MI, AliasAnalysis *AA) const {
> >   // Remat clients assume operand 0 is the defined register.
> >   if (!MI.getNumOperands() || !MI.getOperand(0).isReg())
> >     return false;
> >   unsigned DefReg = MI.getOperand(0).getReg();
> >
> >   // A sub-register definition can only be rematerialized if the instruction
> >   // doesn't read the other parts of the register. Otherwise it is really a
> >   // read-modify-write operation on the full virtual register which cannot be
> >   // moved safely.
> >   if (TargetRegisterInfo::isVirtualRegister(DefReg) &&
> >       MI.getOperand(0).getSubReg() && MI.readsVirtualRegister(DefReg))
> >     return false;
> >
> >   // Avoid instructions obviously unsafe for remat.
> >   if (MI.isNotDuplicable() || MI.mayStore() || MI.hasUnmodeledSideEffects())
> >     return false;
> >
> >   // Don't remat inline asm. We have no idea how expensive it is
> >   // even if it's side effect free.
> >   if (MI.isInlineAsm())
> >     return false;
> >
> >   // Nothing rules remat out; later checks make the final decision.
> >   return true;
> > }
> 

> > We have the following doubts and would appreciate guidance and
> > suggestions to move ahead:
> 
> > 1. Is the approach we are following feasible?
> 
> > 2. What would be a suitable way to store the sequence of instructions
> > for recomputing the value, to be used during the transformation?
> 
> > 3. Suggestions for deciding the termination condition for walking the
> > use-def chain; it should terminate when remat would be costlier than
> > a spill.
> 
> > 4. What other cases or instructions could be included in the
> > isReMaterializablePossible() function? Some suggestions for
> > directions to look in would help.
> 

> > Any other suggestions will also help us move in the right
> > direction.
> 

> > - Nirav
> 

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 