[PATCH] [Peephole] Advanced rewriting of copies to avoid cross register banks copies.

Quentin Colombet qcolombet at apple.com
Tue Jun 17 10:20:00 PDT 2014


Ping?

Thanks,
-Quentin

On Jun 10, 2014, at 11:57 AM, Quentin Colombet <qcolombet at apple.com> wrote:

> On Jun 10, 2014, at 10:53 AM, Tom Stellard <tom at stellard.net> wrote:
> 
>> On Tue, Jun 10, 2014 at 04:56:25PM +0000, Quentin Colombet wrote:
>>> Hi,
>>> 
>>> The proposed patch extends the peephole optimization introduced in r190713 to rewrite even more cross-register-bank copies.
>>> As it is, the extension may not be that useful, but I thought it may be easier to review than the complete solution (see Motivating Examples and What Is Next?).
>>> 
>>> Thanks for your feedback.
>>> 
>>> ** Context  **
>>> 
>>> In r190713 we introduced a peephole optimization that produces register-coalescer friendly copies when possible.
>>> This optimization looks through a chain of copies to find a more suitable source for a cross-register-bank copy.
>>> E.g.,
>>> b = copy A <-- cross-bank copy
>>> C = copy b <-- cross-bank copy
>>> 
>>> Is rewritten into:
>>> b = copy A  <-- cross-bank copy
>>> C = copy A <-- same-bank copy
>>> 
>>> However, there are several instructions that are lowered via cross-bank copies that this optimization fails to optimize.
>>> E.g.
>>> b = insert_subreg e, A, sub0 <-- cross-bank copy
>>> C = copy b.sub0 <-- cross-bank copy
>>> 
>>> Ideally, we would like to produce the following code:
>>> b = insert_subreg e, A, sub0 <-- cross-bank copy
>>> C = copy A <-- same-bank copy
>>> 
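The copy-chain rewrite described in the Context section can be sketched with a toy model. This is only an illustration under stated assumptions: `Bank`, `VReg`, and `findSameBankSource` are hypothetical names, not LLVM types, and every register is assumed to have at most one defining copy (SSA form).

```cpp
#include <cassert>
#include <vector>

// Toy model, not LLVM's data structures: each virtual register lives in a
// register bank and, if it is defined by a plain copy, records its source.
enum class Bank { GPR, FPR };

struct VReg {
  Bank bank;
  int copyOf; // index of the copied register, or -1 if not defined by a copy
};

// Walk up the chain of copies feeding `def` and return the first register
// that lives in the same bank as `def`. Returns -1 when the chain ends
// without reaching such a register.
int findSameBankSource(const std::vector<VReg> &regs, int def) {
  assert(def >= 0 && def < static_cast<int>(regs.size()));
  for (int src = regs[def].copyOf; src != -1; src = regs[src].copyOf)
    if (regs[src].bank == regs[def].bank)
      return src;
  return -1;
}
```

With A in one bank, b = copy A in the other bank, and C = copy b back in A's bank, calling findSameBankSource on C yields A, which corresponds to the rewrite of C = copy b into C = copy A.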
>>> 
>>> ** Proposed Patch **
>>> 
>>> The proposed patch teaches the existing cross-bank copy optimization how to deal with the instructions that generate cross-bank copies, i.e., insert_subreg, extract_subreg, reg_sequence, and subreg_to_reg.
>>> We introduce a new helper class for that: ValueTracker.
>>> This class implements the logic to look through the copy-related instructions and get the related source.
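To make the look-through idea concrete, here is a minimal sketch on a toy IR. It is not the ValueTracker from the patch: `Op`, `Operand`, `Inst`, `step`, and `trackSources` are hypothetical names, the definitions are assumed to be indexed by virtual register number, and sub-register index 0 is assumed to mean "the whole register". Like the patch, it gives up whenever sub-register indices would have to be composed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR; none of these names come from LLVM.
enum class Op { Other, Copy, InsertSubreg, RegSequence };

struct Operand {
  int reg;         // virtual register number
  unsigned subIdx; // sub-register index, 0 means the whole register
};

struct Inst {
  Op op;
  // Copy:         srcs = {{src, sub-register read from src}}
  // InsertSubreg: srcs = {{base, 0}, {inserted, index it is inserted at}}
  // RegSequence:  srcs = {{v0, index0}, {v1, index1}, ...}
  std::vector<Operand> srcs;
};

// One step up the use-def chain. The tracked value is `reg.sub`; on success
// (reg, sub) is rewritten to an equivalent source and true is returned.
bool step(const std::vector<Inst> &defs, int &reg, unsigned &sub) {
  assert(reg >= 0 && reg < static_cast<int>(defs.size()));
  const Inst &def = defs[reg];
  switch (def.op) {
  case Op::Copy:
    if (sub != 0) // would require composing sub-register indices
      return false;
    reg = def.srcs[0].reg;
    sub = def.srcs[0].subIdx;
    return true;
  case Op::InsertSubreg:
    if (sub == 0 || def.srcs[1].subIdx != sub)
      return false; // the base-register case needs an overlap check
    reg = def.srcs[1].reg;
    sub = 0;
    return true;
  case Op::RegSequence:
    for (const Operand &o : def.srcs)
      if (sub != 0 && o.subIdx == sub) {
        reg = o.reg;
        sub = 0;
        return true;
      }
    return false;
  case Op::Other:
    return false;
  }
  return false;
}

// Collect every alternative source found while walking up from `reg.sub`.
std::vector<Operand> trackSources(const std::vector<Inst> &defs, int reg,
                                  unsigned sub) {
  std::vector<Operand> out;
  while (step(defs, reg, sub))
    out.push_back({reg, sub});
  return out;
}
```

For the snippet v2 = INSERT_SUBREG v1, v0, sub0 followed by def = COPY v2.sub0, tracking def yields v2.sub0 and then v0, so def could be rewritten into def = COPY v0.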
>>> 
>>> For now, the advanced copy rewriting is disabled by default: it is not yet sufficient to solve the motivating examples, and because of that I had a hard time coming up with a test case (see the Motivating Examples section). However, you can give it a try on your favorite platform with -disable-adv-copy-opt=false, and if it helps, I would be happy to add a test case!
>>> 
>>> I have also checked that the refactoring does not change the current code generation across the entire llvm-testsuite + SPECs when the extension is disabled, for both x86_64 and arm64, at both O3 and Os.
>>> 
>>> 
>>> ** Motivating Examples **
>>> 
>>> Let us consider a couple of examples.
>>> 
>>> * armv7s *
>>> 
>>> define <2 x i32> @testuvec(<2 x i32> %A, <2 x i32> %B) nounwind {
>>> entry:
>>>  %div = udiv <2 x i32> %A, %B
>>>  ret <2 x i32> %div
>>> }
>>> 
>>> We would like the following code to be generated on swift (which has a udiv instruction):
>>> // %A is in r0, r1
>>> // %B is in r2, r3
>>> 	udiv	r0, r2, r0
>>> 	udiv	r1, r3, r1
>>> 	bx lr
>>> 
>>> However, we generate a far more complicated sequence of instructions because we do not recognize that we are moving r0, r1, etc, through d registers:
>>> 	vmov	d1, r0, r1
>>> 	vmov	d0, r2, r3
>>> 	vmov	r1, s2
>>> 	vmov	r0, s0
>>> 	vmov	r2, s3
>>> 	udiv	r0, r1, r0
>>> 	vmov	r1, s1
>>> 	udiv	r1, r2, r1
>>> 	vmov.32	d16[0], r0
>>> 	vmov.32	d16[1], r1
>>> 	vmov	r0, r1, d16
>>> 	bx	lr
>>> 
>>> * AArch64 *
>>> 
>>> define i64 @test2(i128 %arg) {
>>>  %vec = bitcast i128 %arg to <2 x i64>
>>>  %scalar = extractelement <2 x i64> %vec, i32 0
>>>  ret i64 %scalar
>>> }
>>> 
>>> One would expect this code:
>>> // %arg is in x0, x1
>>> // we simply return x0
>>> 	ret
>>> 
>>> However, we generate a less straightforward sequence:
>>> 	fmov	d0, x0
>>> 	ins.d	v0[1], x1
>>> 	fmov	x0, d0
>>> 	ret
>>> 
>>> The proposed patch is not sufficient to catch those cases yet, as they use target-specific instructions to implement the insert_subreg and extract_subreg logic. However, if the lowering used the generic instructions, this optimization would have helped. See "What Is Next?" for how I plan to tackle that.
>>> 
>>> 
>>> ** Testcase ?! **
>>> 
>>> Since the current patch does not yet support the motivating examples, I do not have something reasonably small that exercises the new path. Thus, I have disabled it by default until we have the full support.
>>> Again, if you think that this optimization can help some of the cases you are seeing, give it a try, and propose your test case!
>>> 
>> 
>> The SI and newer subtargets in the R600 backend will probably benefit a
>> lot from this optimization.  You might try force enabling it and running
>> all the test cases in test/CodeGen/R600 to see if any tests fail.
> 
> Thanks for the suggestion.
> I have given it a try, but none of the tests fail. I haven’t checked whether the test checks are simply not sensitive enough.
> 
> That said, I guess that to express the full potential of value tracking, in particular for R600, we should use this technique to rewrite the arguments of REG_SEQUENCE, INSERT_SUBREG, and so on, not just copies.
> 
> By the way, thanks to this test, I realized I uploaded the wrong patch (it was missing one typo fix) :).
> 
> Thanks,
> -Quentin
> 
>> 
>> If I get some time, I will try to take a look at how it impacts R600.
>> 
>> -Tom
>> 
>>> 
>>> ** What Is Next? **
>>> 
>>> * Teach the optimization about target specific nodes, so that we can handle the motivating examples.
>>> The idea would be to add new tablegen properties so that we can specify that an instruction is similar to an insert_subreg instruction, etc., the same way we did with bitcast (though a little more complicated).
>>> * Enable the optimization by default or provide a target hook to control it.
>>> 
>>> Thanks,
>>> -Quentin
>>> 
>>> http://reviews.llvm.org/D4086
>>> 
>>> Files:
>>>  lib/CodeGen/PeepholeOptimizer.cpp
>> 
>>> Index: lib/CodeGen/PeepholeOptimizer.cpp
>>> ===================================================================
>>> --- lib/CodeGen/PeepholeOptimizer.cpp
>>> +++ lib/CodeGen/PeepholeOptimizer.cpp
>>> @@ -91,6 +91,10 @@
>>> DisablePeephole("disable-peephole", cl::Hidden, cl::init(false),
>>>                 cl::desc("Disable the peephole optimizer"));
>>> 
>>> +static cl::opt<bool>
>>> +DisableAdvCopyOpt("disable-adv-copy-opt", cl::Hidden, cl::init(true),
>>> +                  cl::desc("Disable advanced copy optimization"));
>>> +
>>> STATISTIC(NumReuse,      "Number of extension results reused");
>>> STATISTIC(NumCmps,       "Number of compares eliminated");
>>> STATISTIC(NumImmFold,    "Number of move immediate folded");
>>> @@ -137,6 +141,105 @@
>>>     bool isLoadFoldable(MachineInstr *MI,
>>>                         SmallSet<unsigned, 16> &FoldAsLoadDefCandidates);
>>>   };
>>> +
>>> +  /// \brief Helper class to track the possible sources of a value defined by
>>> +  /// a (chain of) copy related instructions.
>>> +  /// Given a definition (instruction and definition index), this class
>>> +  /// follows the use-def chain to find successive suitable sources.
>>> +  /// The given source can be used to rewrite the definition into
>>> +  /// def = COPY src.
>>> +  ///
>>> +  /// For instance, let us consider the following snippet:
>>> +  /// v0 =
>>> +  /// v2 = INSERT_SUBREG v1, v0, sub0
>>> +  /// def = COPY v2.sub0
>>> +  ///
>>> +  /// Using a ValueTracker for def = COPY v2.sub0 will give the following
>>> +  /// suitable sources:
>>> +  /// v2.sub0 and v0.
>>> +  /// Then, def can be rewritten into def = COPY v0.
>>> +  class ValueTracker {
>>> +  private:
>>> +    /// The current point into the use-def chain.
>>> +    const MachineInstr *Def;
>>> +    /// The index of the definition in Def.
>>> +    unsigned DefIdx;
>>> +    /// The sub register index of the definition.
>>> +    unsigned DefSubReg;
>>> +    /// The register where the value can be found.
>>> +    unsigned Reg;
>>> +    /// Specify whether or not the value tracking looks through
>>> +    /// complex instructions. When this is false, the value tracker
>>> +    /// bails on everything that is not a copy or a bitcast.
>>> +    ///
>>> +    /// Note: This could have been implemented as a specialized version of
>>> +    /// the ValueTracker class but that would have complicated the code of
>>> +    /// the users of this class.
>>> +    bool UseAdvancedTracking;
>>> +    /// Optional MachineRegisterInfo used to perform some complex
>>> +    /// tracking.
>>> +    const MachineRegisterInfo *MRI;
>>> +
>>> +    /// \brief Dispatcher to the right underlying implementation of
>>> +    /// getNextSource.
>>> +    bool getNextSourceImpl(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +    /// \brief Specialized version of getNextSource for Copy instructions.
>>> +    bool getNextSourceFromCopy(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +    /// \brief Specialized version of getNextSource for Bitcast instructions.
>>> +    bool getNextSourceFromBitcast(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +    /// \brief Specialized version of getNextSource for RegSequence
>>> +    /// instructions.
>>> +    bool getNextSourceFromRegSequence(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +    /// \brief Specialized version of getNextSource for InsertSubreg
>>> +    /// instructions.
>>> +    bool getNextSourceFromInsertSubreg(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +    /// \brief Specialized version of getNextSource for ExtractSubreg
>>> +    /// instructions.
>>> +    bool getNextSourceFromExtractSubreg(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +    /// \brief Specialized version of getNextSource for SubregToReg
>>> +    /// instructions.
>>> +    bool getNextSourceFromSubregToReg(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +
>>> +  public:
>>> +    /// \brief Create a ValueTracker instance for the value defined by \p MI
>>> +    /// at the operand index \p DefIdx.
>>> +    /// \p DefSubReg represents the sub register index the value tracker will
>>> +    /// track. It does not need to match the sub register index used in \p MI.
>>> +    /// \p UseAdvancedTracking specifies whether or not the value tracker looks
>>> +    /// through complex instructions. By default (false), it handles only copy
>>> +    /// and bitcast instructions.
>>> +    /// \p MRI useful to perform some complex checks.
>>> +    ValueTracker(const MachineInstr &MI, unsigned DefIdx, unsigned DefSubReg,
>>> +                 bool UseAdvancedTracking = false,
>>> +                 const MachineRegisterInfo *MRI = nullptr)
>>> +        : Def(&MI), DefIdx(DefIdx), DefSubReg(DefSubReg),
>>> +          UseAdvancedTracking(UseAdvancedTracking), MRI(MRI) {
>>> +      assert(Def->getOperand(DefIdx).isDef() &&
>>> +             Def->getOperand(DefIdx).isReg() &&
>>> +             "Definition does not match machine instruction");
>>> +      // Initially the value is in the defined register.
>>> +      Reg = Def->getOperand(DefIdx).getReg();
>>> +    }
>>> +
>>> +    /// \brief Following the use-def chain, get the next available source
>>> +    /// for the tracked value.
>>> +    /// When the returned value is not nullptr, getReg() gives the register
>>> +    /// that contains the tracked value.
>>> +    /// \note The sub register index returned in \p SrcSubReg must be used
>>> +    /// on that getReg() to access the actual value.
>>> +    /// \return Unless the returned value is nullptr (i.e., no source found),
>>> +    /// \p SrcIdx gives the index of the next source in the returned
>>> +    /// instruction and \p SrcSubReg the index to be used on that source to
>>> +    /// get the tracked value. When nullptr is returned, no alternative source
>>> +    /// has been found.
>>> +    const MachineInstr *getNextSource(unsigned &SrcIdx, unsigned &SrcSubReg);
>>> +
>>> +    /// \brief Get the last register where the initial value can be found.
>>> +    /// Initially this is the register of the definition.
>>> +    /// Then, after each successful call to getNextSource, this is the
>>> +    /// register of the last source.
>>> +    unsigned getReg() const { return Reg; }
>>> +  };
>>> }
>>> 
>>> char PeepholeOptimizer::ID = 0;
>>> @@ -443,31 +546,27 @@
>>>   unsigned Src;
>>>   unsigned SrcSubReg;
>>>   bool ShouldRewrite = false;
>>> -  MachineInstr *Copy = MI;
>>>   const TargetRegisterInfo &TRI = *TM->getRegisterInfo();
>>> 
>>> -  // Follow the chain of copies until we reach the top or find a
>>> -  // more suitable source.
>>> +  // Follow the chain of copies until we reach the top of the use-def chain
>>> +  // or find a more suitable source.
>>> +  ValueTracker ValTracker(*MI, DefIdx, DefSubReg, !DisableAdvCopyOpt, MRI);
>>>   do {
>>> -    unsigned CopyDefIdx, CopySrcIdx;
>>> -    if (!getCopyOrBitcastDefUseIdx(*Copy, CopyDefIdx, CopySrcIdx))
>>> +    unsigned CopySrcIdx, CopySrcSubReg;
>>> +    if (!ValTracker.getNextSource(CopySrcIdx, CopySrcSubReg))
>>>       break;
>>> -    const MachineOperand &MO = Copy->getOperand(CopySrcIdx);
>>> -    assert(MO.isReg() && "Copies must be between registers.");
>>> -    Src = MO.getReg();
>>> +    Src = ValTracker.getReg();
>>> +    SrcSubReg = CopySrcSubReg;
>>> 
>>>     if (TargetRegisterInfo::isPhysicalRegister(Src))
>>>       break;
>>> 
>>>     const TargetRegisterClass *SrcRC = MRI->getRegClass(Src);
>>> -    SrcSubReg = MO.getSubReg();
>>> 
>>>     // If this source does not incur a cross register bank copy, use it.
>>>     ShouldRewrite = shareSameRegisterFile(TRI, DefRC, DefSubReg, SrcRC,
>>>                                           SrcSubReg);
>>> -    // Follow the chain of copies: get the definition of Src.
>>> -    Copy = MRI->getVRegDef(Src);
>>> -  } while (!ShouldRewrite && Copy && (Copy->isCopy() || Copy->isBitcast()));
>>> +  } while (!ShouldRewrite);
>>> 
>>>   // If we did not find a more suitable source, there is nothing to optimize.
>>>   if (!ShouldRewrite || Src == MI->getOperand(SrcIdx).getReg())
>>> @@ -483,6 +582,9 @@
>>> 
>>>   MRI->replaceRegWith(Def, NewVR);
>>>   MRI->clearKillFlags(NewVR);
>>> +  // We extended the lifetime of Src.
>>> +  // Clear the kill flags to account for that.
>>> +  MRI->clearKillFlags(Src);
>>>   MI->eraseFromParent();
>>>   ++NumCopiesBitcasts;
>>>   return true;
>>> @@ -673,3 +775,247 @@
>>> 
>>>   return Changed;
>>> }
>>> +
>>> +bool ValueTracker::getNextSourceFromCopy(unsigned &SrcIdx,
>>> +                                         unsigned &SrcSubReg) {
>>> +  assert(Def->isCopy() && "Invalid definition");
>>> +  // Copy instruction are supposed to be: Def = Src.
>>> +  if (Def->getDesc().getNumOperands() != 2)
>>> +    return false;
>>> +  if (Def->getOperand(DefIdx).getSubReg() != DefSubReg)
>>> +    // If we look for a different subreg, it means we want a subreg of src.
>>> +    // Bails as we do not support composing subreg yet.
>>> +    return false;
>>> +  // Otherwise, we want the whole source.
>>> +  SrcIdx = 1;
>>> +  SrcSubReg = Def->getOperand(SrcIdx).getSubReg();
>>> +  return true;
>>> +}
>>> +
>>> +bool ValueTracker::getNextSourceFromBitcast(unsigned &SrcIdx,
>>> +                                            unsigned &SrcSubReg) {
>>> +  assert(Def->isBitcast() && "Invalid definition");
>>> +
>>> +  // Bitcasts with more than one def are not supported.
>>> +  if (Def->getDesc().getNumDefs() != 1)
>>> +    return false;
>>> +  if (Def->getOperand(DefIdx).getSubReg() != DefSubReg)
>>> +    // If we look for a different subreg, it means we want a subreg of the src.
>>> +    // Bails as we do not support composing subreg yet.
>>> +    return false;
>>> +
>>> +  SrcIdx = Def->getDesc().getNumOperands();
>>> +  for (unsigned OpIdx = DefIdx + 1, EndOpIdx = SrcIdx; OpIdx != EndOpIdx;
>>> +       ++OpIdx) {
>>> +    const MachineOperand &MO = Def->getOperand(OpIdx);
>>> +    if (!MO.isReg() || !MO.getReg())
>>> +      continue;
>>> +    assert(!MO.isDef() && "We should have skipped all the definitions by now");
>>> +    if (SrcIdx != EndOpIdx)
>>> +      // Multiple sources?
>>> +      return false;
>>> +    SrcIdx = OpIdx;
>>> +  }
>>> +  SrcSubReg = Def->getOperand(SrcIdx).getSubReg();
>>> +  return true;
>>> +}
>>> +
>>> +bool ValueTracker::getNextSourceFromRegSequence(unsigned &SrcIdx,
>>> +                                                unsigned &SrcSubReg) {
>>> +  assert(Def->isRegSequence() && "Invalid definition");
>>> +
>>> +  if (Def->getOperand(DefIdx).getSubReg())
>>> +    // If we are composing subreg, bails out.
>>> +    // The case we are checking is Def.<subreg> = REG_SEQUENCE.
>>> +    // This should almost never happen as the SSA property is tracked at
>>> +    // the register level (as opposed to the subreg level).
>>> +    // I.e.,
>>> +    // Def.sub0 =
>>> +    // Def.sub1 =
>>> +    // is a valid SSA representation for Def.sub0 and Def.sub1, but not for
>>> +    // Def. Thus, it must not be generated.
>>> +    // However, some code could theoretically generate a single
>>> +    // Def.sub0 (i.e, not defining the other subregs) and we would
>>> +    // have this case.
>>> +    // If we can ascertain (or force) that this never happens, we could
>>> +    // turn that into an assertion.
>>> +    return false;
>>> +
>>> +  // We are looking at:
>>> +  // Def = REG_SEQUENCE v0, sub0, v1, sub1, ...
>>> +  // Check if one of the operand defines the subreg we are interested in.
>>> +  for (unsigned OpIdx = DefIdx + 1, EndOpIdx = Def->getDesc().getNumOperands();
>>> +       OpIdx != EndOpIdx; OpIdx += 2) {
>>> +    const MachineOperand &MOSubIdx = Def->getOperand(OpIdx + 1);
>>> +    assert(MOSubIdx.isImm() &&
>>> +           "One of the subindex of the reg_sequence is not an immediate");
>>> +    if (MOSubIdx.getImm() == DefSubReg) {
>>> +      assert(Def->getOperand(OpIdx).isReg() &&
>>> +             "One of the source of the reg_sequence is not a register");
>>> +      SrcIdx = OpIdx;
>>> +      SrcSubReg = Def->getOperand(SrcIdx).getSubReg();
>>> +      return true;
>>> +    }
>>> +  }
>>> +
>>> +  // If the subreg we are tracking is super-defined by another subreg,
>>> +  // we could follow this value. However, this would require to compose
>>> +  // the subreg and we do not do that for now.
>>> +  return false;
>>> +}
>>> +
>>> +bool ValueTracker::getNextSourceFromInsertSubreg(unsigned &SrcIdx,
>>> +                                                 unsigned &SrcSubReg) {
>>> +  assert(Def->isInsertSubreg() && "Invalid definition");
>>> +  if (Def->getOperand(DefIdx).getSubReg())
>>> +    // If we are composing subreg, bails out.
>>> +    // Same remark as getNextSourceFromRegSequence.
>>> +    // I.e., this may be turned into an assert.
>>> +    return false;
>>> +
>>> +  // We are looking at:
>>> +  // Def = INSERT_SUBREG v0, v1, sub1
>>> +  // There are two cases:
>>> +  // 1. DefSubReg == sub1, get v1.
>>> +  // 2. DefSubReg != sub1, the value may be available through v0.
>>> +
>>> +  // #1 Check if the inserted register matches the required sub index.
>>> +  unsigned InsertedSubReg = Def->getOperand(3).getImm();
>>> +  if (InsertedSubReg == DefSubReg) {
>>> +    SrcIdx = 2;
>>> +    SrcSubReg = Def->getOperand(SrcIdx).getSubReg();
>>> +    return true;
>>> +  }
>>> +  // #2 Otherwise, if the sub register we are looking for is not partial
>>> +  // defined by the inserted element, we can look through the main
>>> +  // register (v0).
>>> +  // To check the overlapping we need a MRI and a TRI.
>>> +  if (!MRI)
>>> +    return false;
>>> +
>>> +  const MachineOperand &MODef = Def->getOperand(DefIdx);
>>> +  const MachineOperand &MOBase = Def->getOperand(1);
>>> +  // If the result register (Def) and the base register (v0) do not
>>> +  // have the same register class or if we have to compose
>>> +  // subregisters, bails out.
>>> +  if (MRI->getRegClass(MODef.getReg()) != MRI->getRegClass(MOBase.getReg()) ||
>>> +      MOBase.getSubReg())
>>> +    return false;
>>> +
>>> +  // Get the TRI and check if inserted sub register overlaps with the
>>> +  // sub register we are tracking.
>>> +  const TargetRegisterInfo *TRI = MRI->getTargetRegisterInfo();
>>> +  if (!TRI ||
>>> +      (TRI->getSubRegIndexLaneMask(DefSubReg) &
>>> +       TRI->getSubRegIndexLaneMask(InsertedSubReg)) != 0)
>>> +    return false;
>>> +  // At this point, the value is available in v0 via the same subreg
>>> +  // we used for Def.
>>> +  SrcIdx = 1;
>>> +  SrcSubReg = DefSubReg;
>>> +  return true;
>>> +}
>>> +
>>> +bool ValueTracker::getNextSourceFromExtractSubreg(unsigned &SrcIdx,
>>> +                                                  unsigned &SrcSubReg) {
>>> +  assert(Def->getOpcode() == TargetOpcode::EXTRACT_SUBREG &&
>>> +         "Invalid definition");
>>> +  // We are looking at:
>>> +  // Def = EXTRACT_SUBREG v0, sub0
>>> +
>>> +  // Bails if we have to compose sub registers.
>>> +  // Indeed, if DefSubReg != 0, we would have to compose it with sub0.
>>> +  if (DefSubReg)
>>> +    return false;
>>> +
>>> +  // Bails if we have to compose sub registers.
>>> +  // Likewise, if v0.subreg != 0, we would have to compose v0.subreg with sub0.
>>> +  if (Def->getOperand(1).getSubReg())
>>> +    return false;
>>> +  // Otherwise, the value is available in the v0.sub0.
>>> +  SrcIdx = 1;
>>> +  SrcSubReg = Def->getOperand(2).getImm();
>>> +  return true;
>>> +}
>>> +
>>> +bool ValueTracker::getNextSourceFromSubregToReg(unsigned &SrcIdx,
>>> +                                                unsigned &SrcSubReg) {
>>> +  assert(Def->isSubregToReg() && "Invalid definition");
>>> +  // We are looking at:
>>> +  // Def = SUBREG_TO_REG Imm, v0, sub0
>>> +
>>> +  // Bails if we have to compose sub registers.
>>> +  // If DefSubReg != sub0, we would have to check that all the bits
>>> +  // we track are included in sub0 and if yes, we would have to
>>> +  // determine the right subreg in v0.
>>> +  if (DefSubReg != Def->getOperand(3).getImm())
>>> +    return false;
>>> +  // Bails if we have to compose sub registers.
>>> +  // Likewise, if v0.subreg != 0, we would have to compose it with sub0.
>>> +  if (Def->getOperand(2).getSubReg())
>>> +    return false;
>>> +
>>> +  SrcIdx = 2;
>>> +  SrcSubReg = Def->getOperand(3).getImm();
>>> +  return true;
>>> +}
>>> +
>>> +bool ValueTracker::getNextSourceImpl(unsigned &SrcIdx, unsigned &SrcSubReg) {
>>> +  assert(Def && "This method needs a valid definition");
>>> +
>>> +  assert(
>>> +      (DefIdx < Def->getDesc().getNumDefs() || Def->getDesc().isVariadic()) &&
>>> +      Def->getOperand(DefIdx).isDef() && "Invalid DefIdx");
>>> +  if (Def->isCopy())
>>> +    return getNextSourceFromCopy(SrcIdx, SrcSubReg);
>>> +  if (Def->isBitcast())
>>> +    return getNextSourceFromBitcast(SrcIdx, SrcSubReg);
>>> +  // All the remaining cases involve "complex" instructions.
>>> +  // Bails if we did not ask for the advanced tracking.
>>> +  if (!UseAdvancedTracking)
>>> +    return false;
>>> +  if (Def->isRegSequence())
>>> +    return getNextSourceFromRegSequence(SrcIdx, SrcSubReg);
>>> +  if (Def->isInsertSubreg())
>>> +    return getNextSourceFromInsertSubreg(SrcIdx, SrcSubReg);
>>> +  if (Def->getOpcode() == TargetOpcode::EXTRACT_SUBREG)
>>> +    return getNextSourceFromExtractSubreg(SrcIdx, SrcSubReg);
>>> +  if (Def->isSubregToReg())
>>> +    return getNextSourceFromSubregToReg(SrcIdx, SrcSubReg);
>>> +  return false;
>>> +}
>>> +
>>> +const MachineInstr *ValueTracker::getNextSource(unsigned &SrcIdx,
>>> +                                                unsigned &SrcSubReg) {
>>> +  // If we reach a point where we cannot move up in the use-def chain,
>>> +  // there is nothing we can get.
>>> +  if (!Def)
>>> +    return nullptr;
>>> +
>>> +  const MachineInstr *PrevDef = nullptr;
>>> +  // Try to find the next source.
>>> +  if (getNextSourceImpl(SrcIdx, SrcSubReg)) {
>>> +    // Update definition, definition index, and subregister for the
>>> +    // next call of getNextSource.
>>> +    const MachineOperand &MO = Def->getOperand(SrcIdx);
>>> +    assert(MO.isReg() && !MO.isDef() && "Source is invalid");
>>> +    // Update the current register.
>>> +    Reg = MO.getReg();
>>> +    // Update the return value before moving up in the use-def chain.
>>> +    PrevDef = Def;
>>> +    // If we can still move up in the use-def chain, move to the next
>>> +    // definition.
>>> +    if (!TargetRegisterInfo::isPhysicalRegister(Reg)) {
>>> +      Def = MRI->getVRegDef(Reg);
>>> +      DefIdx = MRI->def_begin(Reg).getOperandNo();
>>> +      DefSubReg = SrcSubReg;
>>> +      return PrevDef;
>>> +    }
>>> +  }
>>> +  // If we end up here, this means we will not be able to find another source
>>> +  // for the next iteration.
>>> +  // Make sure any new call to getNextSource bails out early by cutting the
>>> +  // use-def chain.
>>> +  Def = nullptr;
>>> +  return PrevDef;
>>> +}
>> 
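The second INSERT_SUBREG case in the patch above (looking through the base register) hinges on a lane-mask disjointness test. The sketch below shows that test in isolation. The mask values and the names `LaneS0` to `LaneD1` and `canLookThroughBase` are made up for illustration; in the patch the masks come from TargetRegisterInfo::getSubRegIndexLaneMask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lane masks for a 128-bit register split into four 32-bit
// lanes. Real masks are target-defined; these values are assumptions.
constexpr std::uint32_t LaneS0 = 0x1, LaneS1 = 0x2, LaneS2 = 0x4, LaneS3 = 0x8;
constexpr std::uint32_t LaneD0 = LaneS0 | LaneS1; // low 64 bits
constexpr std::uint32_t LaneD1 = LaneS2 | LaneS3; // high 64 bits

// For Def = INSERT_SUBREG Base, Inserted, InsertedIdx: the lanes selected by
// `tracked` still hold Base's value only if none of them is overwritten by
// the inserted lanes, i.e. the two masks are disjoint.
constexpr bool canLookThroughBase(std::uint32_t tracked,
                                  std::uint32_t inserted) {
  return (tracked & inserted) == 0;
}
```

Under these assumed masks, tracking the high half while the lowest lane is inserted can continue through the base register, whereas tracking the low half cannot, because the inserted lane partially redefines it.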
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
