[llvm] r314729 - Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"

Diana Picus via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 3 09:54:50 PDT 2017


Hi Geoff,

I think this is also breaking one of our ARM bots:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/2117

I don't see the failure anymore with this reverted. Let me know if I
can help investigate.

Cheers,
Diana

On 3 October 2017 at 18:21, Geoff Berry via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
> Hi Quentin,
>
> Sam's revert you referred to is from 4 weeks ago.  My latest re-commit
> doesn't seem to have been reverted.  I'll take care of it.
>
> On 10/3/2017 12:11 PM, Quentin Colombet wrote:
>>
>>
>>> On Oct 3, 2017, at 9:05 AM, Geoff Berry <gberry at codeaurora.org
>>> <mailto:gberry at codeaurora.org>> wrote:
>>>
>>> Hi Quentin,
>>>
>>> Yes, I'll revert shortly.  The verifier issue sounds like an issue I
>>> though I had worked around regarding forwarding to undef uses.  Let me know
>>> if I can help.
>>
>>
>> Will do.
>>
>> Looks like we are not the only one having issue with the commit.
>> Sam reverted it in r312490 already.
>>
>>>
>>> -Geoff
>>>
>>> On 10/3/2017 12:01 PM, Quentin Colombet wrote:
>>>>
>>>> Hi Geoff,
>>>> I see the MachineVerifier complaining after this commit for our
>>>> out-of-tree target. More over, address sanitizer is complaining about an
>>>> invalid access.
>>>> ==50770==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008
>>>> (pc 0x000102e7e42e bp 0x700002f37170 sp 0x700002f36f60 T9)
>>>> ==50770==The signal is caused by a READ memory access.
>>>> ==50770==Hint: address points to the zero page.
>>>>     #0 0x102e7e42d in (anonymous
>>>> namespace)::MachineCopyPropagation::forwardUses(llvm::MachineInstr&)
>>>> MachineCopyPropagation.cpp:596
>>>>     #1 0x102e79c22 in (anonymous
>>>> namespace)::MachineCopyPropagation::runOnMachineFunction(llvm::MachineFunction&)
>>>> MachineCopyPropagation.cpp:806
>>>> The MachineVerifier error is:
>>>> *** Bad machine code: Instruction ending live segment doesn't read the
>>>> register ***
>>>> Could you revert while we investigate?
>>>> Thanks,
>>>> -Quentin
>>>>>
>>>>> On Oct 2, 2017, at 3:01 PM, Geoff Berry via llvm-commits
>>>>> <llvm-commits at lists.llvm.org
>>>>> <mailto:llvm-commits at lists.llvm.org><mailto:llvm-commits at lists.llvm.org>>
>>>>> wrote:
>>>>>
>>>>> Author: gberry
>>>>> Date: Mon Oct  2 15:01:37 2017
>>>>> New Revision: 314729
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project?rev=314729&view=rev
>>>>> Log:
>>>>> Re-enable "[MachineCopyPropagation] Extend pass to do COPY source
>>>>> forwarding"
>>>>>
>>>>> Issues addressed since original review:
>>>>> - Avoid bug in regalloc greedy/machine verifier when forwarding to use
>>>>>  in an instruction that re-defines the same virtual register.
>>>>> - Fixed bug when forwarding to use in EarlyClobber instruction slot.
>>>>> - Fixed incorrect forwarding to register definitions that showed up in
>>>>>  explicit_uses() iterator (e.g. in INLINEASM).
>>>>> - Moved removal of dead instructions found by
>>>>>  LiveIntervals::shrinkToUses() outside of loop iterating over
>>>>>  instructions to avoid instructions being deleted while pointed to by
>>>>>  iterator.
>>>>> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
>>>>> - The pass no longer forwards COPYs to physical register uses, since
>>>>>  doing so can break code that implicitly relies on the physical
>>>>>  register number of the use.
>>>>> - The pass no longer forwards COPYs to undef uses, since doing so
>>>>>  can break the machine verifier by creating LiveRanges that don't
>>>>>  end on a use (since the undef operand is not considered a use).
>>>>>
>>>>>  [MachineCopyPropagation] Extend pass to do COPY source forwarding
>>>>>
>>>>>  This change extends MachineCopyPropagation to do COPY source
>>>>> forwarding.
>>>>>
>>>>>  This change also extends the MachineCopyPropagation pass to be able to
>>>>>  be run during register allocation, after physical registers have been
>>>>>  assigned, but before the virtual registers have been re-written, which
>>>>>  allows it to remove virtual register COPY LiveIntervals that become
>>>>> dead
>>>>>  through the forwarding of all of their uses.
>>>>>
>>>>> Modified:
>>>>>    llvm/trunk/include/llvm/CodeGen/Passes.h
>>>>>    llvm/trunk/include/llvm/InitializePasses.h
>>>>>    llvm/trunk/lib/CodeGen/CodeGen.cpp
>>>>>    llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>>>    llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>>>    llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>>>    llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>>>    llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>>    llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>>>    llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>>>    llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>>    llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>>>    llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>>    llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll
>>>>>    llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>>>>    llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>>>>    llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll
>>>>>    llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>>>>    llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>>>>    llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>>>>    llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll
>>>>>    llvm/trunk/test/CodeGen/PowerPC/inlineasm-i64-reg.ll
>>>>>    llvm/trunk/test/CodeGen/PowerPC/opt-li-add-to-addi.ll
>>>>>    llvm/trunk/test/CodeGen/PowerPC/tail-dup-layout.ll
>>>>>    llvm/trunk/test/CodeGen/SPARC/atomics.ll
>>>>>    llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll
>>>>>    llvm/trunk/test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
>>>>>    llvm/trunk/test/CodeGen/X86/arg-copy-elide.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avg.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avx-load-store.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avx512-bugfix-25270.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avx512-calling-conv.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avx512-schedule.ll
>>>>>    llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll
>>>>>    llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>>>>>    llvm/trunk/test/CodeGen/X86/combine-fcopysign.ll
>>>>>    llvm/trunk/test/CodeGen/X86/combine-shl.ll
>>>>>    llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>>>>>    llvm/trunk/test/CodeGen/X86/divide-by-constant.ll
>>>>>    llvm/trunk/test/CodeGen/X86/fmaxnum.ll
>>>>>    llvm/trunk/test/CodeGen/X86/fmf-flags.ll
>>>>>    llvm/trunk/test/CodeGen/X86/fminnum.ll
>>>>>    llvm/trunk/test/CodeGen/X86/fp128-i128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/haddsub-2.ll
>>>>>    llvm/trunk/test/CodeGen/X86/haddsub-undef.ll
>>>>>    llvm/trunk/test/CodeGen/X86/half.ll
>>>>>    llvm/trunk/test/CodeGen/X86/inline-asm-fpstack.ll
>>>>>    llvm/trunk/test/CodeGen/X86/ipra-local-linkage.ll
>>>>>    llvm/trunk/test/CodeGen/X86/localescape.ll
>>>>>    llvm/trunk/test/CodeGen/X86/machine-cp.ll
>>>>>    llvm/trunk/test/CodeGen/X86/mul-i1024.ll
>>>>>    llvm/trunk/test/CodeGen/X86/mul-i512.ll
>>>>>    llvm/trunk/test/CodeGen/X86/mul128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/mulvi32.ll
>>>>>    llvm/trunk/test/CodeGen/X86/pmul.ll
>>>>>    llvm/trunk/test/CodeGen/X86/powi.ll
>>>>>    llvm/trunk/test/CodeGen/X86/pr11334.ll
>>>>>    llvm/trunk/test/CodeGen/X86/pr29112.ll
>>>>>    llvm/trunk/test/CodeGen/X86/psubus.ll
>>>>>    llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll
>>>>>    llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>>>>    llvm/trunk/test/CodeGen/X86/sse1.ll
>>>>>    llvm/trunk/test/CodeGen/X86/sse3-avx-addsub-2.ll
>>>>>    llvm/trunk/test/CodeGen/X86/statepoint-live-in.ll
>>>>>    llvm/trunk/test/CodeGen/X86/statepoint-stack-usage.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vec_minmax_sint.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vec_shift4.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-idiv-sdiv-128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-idiv-udiv-128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-mul.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-rotate-128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-sext.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-shift-ashr-128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vector-zext.ll
>>>>>    llvm/trunk/test/CodeGen/X86/vselect-minmax.ll
>>>>>    llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>>>>>    llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>>>>>    llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
>>>>>    llvm/trunk/test/CodeGen/X86/x86-shrink-wrap-unwind.ll
>>>>>    llvm/trunk/test/CodeGen/X86/x86-shrink-wrapping.ll
>>>>>    llvm/trunk/test/DebugInfo/X86/live-debug-variables.ll
>>>>>    llvm/trunk/test/DebugInfo/X86/spill-nospill.ll
>>>>>
>>>>> Modified: llvm/trunk/include/llvm/CodeGen/Passes.h
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/Passes.h?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/include/llvm/CodeGen/Passes.h (original)
>>>>> +++ llvm/trunk/include/llvm/CodeGen/Passes.h Mon Oct  2 15:01:37 2017
>>>>> @@ -278,6 +278,11 @@ namespace llvm {
>>>>>   /// MachineSinking - This pass performs sinking on machine
>>>>> instructions.
>>>>>   extern char &MachineSinkingID;
>>>>>
>>>>> +  /// MachineCopyPropagationPreRegRewrite - This pass performs copy
>>>>> propagation
>>>>> +  /// on machine instructions after register allocation but before
>>>>> virtual
>>>>> +  /// register re-writing..
>>>>> +  extern char &MachineCopyPropagationPreRegRewriteID;
>>>>> +
>>>>>   /// MachineCopyPropagation - This pass performs copy propagation on
>>>>>   /// machine instructions.
>>>>>   extern char &MachineCopyPropagationID;
>>>>>
>>>>> Modified: llvm/trunk/include/llvm/InitializePasses.h
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/InitializePasses.h?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/include/llvm/InitializePasses.h (original)
>>>>> +++ llvm/trunk/include/llvm/InitializePasses.h Mon Oct  2 15:01:37 2017
>>>>> @@ -233,6 +233,7 @@ void initializeMachineBranchProbabilityI
>>>>> void initializeMachineCSEPass(PassRegistry&);
>>>>> void initializeMachineCombinerPass(PassRegistry&);
>>>>> void initializeMachineCopyPropagationPass(PassRegistry&);
>>>>> +void initializeMachineCopyPropagationPreRegRewritePass(PassRegistry&);
>>>>> void initializeMachineDominanceFrontierPass(PassRegistry&);
>>>>> void initializeMachineDominatorTreePass(PassRegistry&);
>>>>> void initializeMachineFunctionPrinterPassPass(PassRegistry&);
>>>>>
>>>>> Modified: llvm/trunk/lib/CodeGen/CodeGen.cpp
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/CodeGen.cpp?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/CodeGen/CodeGen.cpp (original)
>>>>> +++ llvm/trunk/lib/CodeGen/CodeGen.cpp Mon Oct  2 15:01:37 2017
>>>>> @@ -53,6 +53,7 @@ void llvm::initializeCodeGen(PassRegistr
>>>>>   initializeMachineCSEPass(Registry);
>>>>>   initializeMachineCombinerPass(Registry);
>>>>>   initializeMachineCopyPropagationPass(Registry);
>>>>> +  initializeMachineCopyPropagationPreRegRewritePass(Registry);
>>>>>   initializeMachineDominatorTreePass(Registry);
>>>>>   initializeMachineFunctionPrinterPassPass(Registry);
>>>>>   initializeMachineLICMPass(Registry);
>>>>>
>>>>> Modified: llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp (original)
>>>>> +++ llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp Mon Oct  2
>>>>> 15:01:37 2017
>>>>> @@ -7,25 +7,71 @@
>>>>> //
>>>>>
>>>>> //===----------------------------------------------------------------------===//
>>>>> //
>>>>> -// This is an extremely simple MachineInstr-level copy propagation
>>>>> pass.
>>>>> +// This is a simple MachineInstr-level copy forwarding pass.  It may
>>>>> be run at
>>>>> +// two places in the codegen pipeline:
>>>>> +//   - After register allocation but before virtual registers have
>>>>> been remapped
>>>>> +//     to physical registers.
>>>>> +//   - After physical register remapping.
>>>>> +//
>>>>> +// The optimizations done vary slightly based on whether virtual
>>>>> registers are
>>>>> +// still present.  In both cases, this pass forwards the source of
>>>>> COPYs to the
>>>>> +// users of their destinations when doing so is legal.  For example:
>>>>> +//
>>>>> +//   %vreg1 = COPY %vreg0
>>>>> +//   ...
>>>>> +//   ... = OP %vreg1
>>>>> +//
>>>>> +// If
>>>>> +//   - the physical register assigned to %vreg0 has not been clobbered
>>>>> by the
>>>>> +//     time of the use of %vreg1
>>>>> +//   - the register class constraints are satisfied
>>>>> +//   - the COPY def is the only value that reaches OP
>>>>> +// then this pass replaces the above with:
>>>>> +//
>>>>> +//   %vreg1 = COPY %vreg0
>>>>> +//   ...
>>>>> +//   ... = OP %vreg0
>>>>> +//
>>>>> +// and updates the relevant state required by VirtRegMap (e.g.
>>>>> LiveIntervals).
>>>>> +// COPYs whose LiveIntervals become dead as a result of this
>>>>> forwarding (i.e. if
>>>>> +// all uses of %vreg1 are changed to %vreg0) are removed.
>>>>> +//
>>>>> +// When being run with only physical registers, this pass will also
>>>>> remove some
>>>>> +// redundant COPYs.  For example:
>>>>> +//
>>>>> +//    %R1 = COPY %R0
>>>>> +//    ... // No clobber of %R1
>>>>> +//    %R0 = COPY %R1 <<< Removed
>>>>> +//
>>>>> +// or
>>>>> +//
>>>>> +//    %R1 = COPY %R0
>>>>> +//    ... // No clobber of %R0
>>>>> +//    %R1 = COPY %R0 <<< Removed
>>>>> //
>>>>>
>>>>> //===----------------------------------------------------------------------===//
>>>>>
>>>>> +#include "LiveDebugVariables.h"
>>>>> #include "llvm/ADT/DenseMap.h"
>>>>> #include "llvm/ADT/STLExtras.h"
>>>>> #include "llvm/ADT/SetVector.h"
>>>>> #include "llvm/ADT/SmallVector.h"
>>>>> #include "llvm/ADT/Statistic.h"
>>>>> #include "llvm/ADT/iterator_range.h"
>>>>> +#include "llvm/CodeGen/LiveRangeEdit.h"
>>>>> +#include "llvm/CodeGen/LiveStackAnalysis.h"
>>>>> #include "llvm/CodeGen/MachineBasicBlock.h"
>>>>> #include "llvm/CodeGen/MachineFunction.h"
>>>>> #include "llvm/CodeGen/MachineFunctionPass.h"
>>>>> #include "llvm/CodeGen/MachineInstr.h"
>>>>> #include "llvm/CodeGen/MachineOperand.h"
>>>>> #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>> +#include "llvm/CodeGen/Passes.h"
>>>>> +#include "llvm/CodeGen/VirtRegMap.h"
>>>>> #include "llvm/MC/MCRegisterInfo.h"
>>>>> #include "llvm/Pass.h"
>>>>> #include "llvm/Support/Debug.h"
>>>>> +#include "llvm/Support/DebugCounter.h"
>>>>> #include "llvm/Support/raw_ostream.h"
>>>>> #include "llvm/Target/TargetInstrInfo.h"
>>>>> #include "llvm/Target/TargetRegisterInfo.h"
>>>>> @@ -38,6 +84,9 @@ using namespace llvm;
>>>>> #define DEBUG_TYPE "machine-cp"
>>>>>
>>>>> STATISTIC(NumDeletes, "Number of dead copies deleted");
>>>>> +STATISTIC(NumCopyForwards, "Number of copy uses forwarded");
>>>>> +DEBUG_COUNTER(FwdCounter, "machine-cp-fwd",
>>>>> +              "Controls which register COPYs are forwarded");
>>>>>
>>>>> namespace {
>>>>>
>>>>> @@ -45,19 +94,42 @@ using RegList = SmallVector<unsigned, 4>
>>>>> using SourceMap = DenseMap<unsigned, RegList>;
>>>>> using Reg2MIMap = DenseMap<unsigned, MachineInstr *>;
>>>>>
>>>>> -  class MachineCopyPropagation : public MachineFunctionPass {
>>>>> +  class MachineCopyPropagation : public MachineFunctionPass,
>>>>> +                                 private LiveRangeEdit::Delegate {
>>>>>     const TargetRegisterInfo *TRI;
>>>>>     const TargetInstrInfo *TII;
>>>>> -    const MachineRegisterInfo *MRI;
>>>>> +    MachineRegisterInfo *MRI;
>>>>> +    MachineFunction *MF;
>>>>> +    SlotIndexes *Indexes;
>>>>> +    LiveIntervals *LIS;
>>>>> +    const VirtRegMap *VRM;
>>>>> +    // True if this pass being run before virtual registers are
>>>>> remapped to
>>>>> +    // physical ones.
>>>>> +    bool PreRegRewrite;
>>>>> +    bool NoSubRegLiveness;
>>>>> +
>>>>> +  protected:
>>>>> +    MachineCopyPropagation(char &ID, bool PreRegRewrite)
>>>>> +        : MachineFunctionPass(ID), PreRegRewrite(PreRegRewrite) {}
>>>>>
>>>>>   public:
>>>>>     static char ID; // Pass identification, replacement for typeid
>>>>>
>>>>> -    MachineCopyPropagation() : MachineFunctionPass(ID) {
>>>>> +    MachineCopyPropagation() : MachineCopyPropagation(ID, false) {
>>>>>
>>>>> initializeMachineCopyPropagationPass(*PassRegistry::getPassRegistry());
>>>>>     }
>>>>>
>>>>>     void getAnalysisUsage(AnalysisUsage &AU) const override {
>>>>> +      if (PreRegRewrite) {
>>>>> +        AU.addRequired<SlotIndexes>();
>>>>> +        AU.addPreserved<SlotIndexes>();
>>>>> +        AU.addRequired<LiveIntervals>();
>>>>> +        AU.addPreserved<LiveIntervals>();
>>>>> +        AU.addRequired<VirtRegMap>();
>>>>> +        AU.addPreserved<VirtRegMap>();
>>>>> +        AU.addPreserved<LiveDebugVariables>();
>>>>> +        AU.addPreserved<LiveStacks>();
>>>>> +      }
>>>>>       AU.setPreservesCFG();
>>>>>       MachineFunctionPass::getAnalysisUsage(AU);
>>>>>     }
>>>>> @@ -65,6 +137,10 @@ using Reg2MIMap = DenseMap<unsigned, Mac
>>>>>     bool runOnMachineFunction(MachineFunction &MF) override;
>>>>>
>>>>>     MachineFunctionProperties getRequiredProperties() const override {
>>>>> +      if (PreRegRewrite)
>>>>> +        return MachineFunctionProperties()
>>>>> +            .set(MachineFunctionProperties::Property::NoPHIs)
>>>>> +            .set(MachineFunctionProperties::Property::TracksLiveness);
>>>>>       return MachineFunctionProperties().set(
>>>>>           MachineFunctionProperties::Property::NoVRegs);
>>>>>     }
>>>>> @@ -74,9 +150,33 @@ using Reg2MIMap = DenseMap<unsigned, Mac
>>>>>     void ReadRegister(unsigned Reg);
>>>>>     void CopyPropagateBlock(MachineBasicBlock &MBB);
>>>>>     bool eraseIfRedundant(MachineInstr &Copy, unsigned Src, unsigned
>>>>> Def);
>>>>> +    unsigned getPhysReg(unsigned Reg, unsigned SubReg);
>>>>> +    unsigned getPhysReg(const MachineOperand &Opnd) {
>>>>> +      return getPhysReg(Opnd.getReg(), Opnd.getSubReg());
>>>>> +    }
>>>>> +    unsigned getFullPhysReg(const MachineOperand &Opnd) {
>>>>> +      return getPhysReg(Opnd.getReg(), 0);
>>>>> +    }
>>>>> +    void forwardUses(MachineInstr &MI);
>>>>> +    bool isForwardableRegClassCopy(const MachineInstr &Copy,
>>>>> +                                   const MachineInstr &UseI);
>>>>> +    std::tuple<unsigned, unsigned, bool>
>>>>> +    checkUseSubReg(const MachineOperand &CopySrc, const MachineOperand
>>>>> &MOUse);
>>>>> +    bool hasImplicitOverlap(const MachineInstr &MI, const
>>>>> MachineOperand &Use);
>>>>> +    void narrowRegClass(const MachineInstr &MI, const MachineOperand
>>>>> &MOUse,
>>>>> +                        unsigned NewUseReg, unsigned NewUseSubReg);
>>>>> +    void updateForwardedCopyLiveInterval(const MachineInstr &Copy,
>>>>> +                                         const MachineInstr &UseMI,
>>>>> +                                         bool UseIsEarlyClobber,
>>>>> +                                         unsigned OrigUseReg,
>>>>> +                                         unsigned NewUseReg,
>>>>> +                                         unsigned NewUseSubReg);
>>>>> +    /// LiveRangeEdit callback for eliminateDeadDefs().
>>>>> +    void LRE_WillEraseInstruction(MachineInstr *MI) override;
>>>>>
>>>>>     /// Candidates for deletion.
>>>>>     SmallSetVector<MachineInstr*, 8> MaybeDeadCopies;
>>>>> +    SmallVector<MachineInstr*, 8> ShrunkDeadInsts;
>>>>>
>>>>>     /// Def -> available copies map.
>>>>>     Reg2MIMap AvailCopyMap;
>>>>> @@ -90,6 +190,14 @@ using Reg2MIMap = DenseMap<unsigned, Mac
>>>>>     bool Changed;
>>>>>   };
>>>>>
>>>>> +  class MachineCopyPropagationPreRegRewrite : public
>>>>> MachineCopyPropagation {
>>>>> +  public:
>>>>> +    static char ID; // Pass identification, replacement for typeid
>>>>> +    MachineCopyPropagationPreRegRewrite()
>>>>> +        : MachineCopyPropagation(ID, true) {
>>>>> +
>>>>> initializeMachineCopyPropagationPreRegRewritePass(*PassRegistry::getPassRegistry());
>>>>> +    }
>>>>> +  };
>>>>> } // end anonymous namespace
>>>>>
>>>>> char MachineCopyPropagation::ID = 0;
>>>>> @@ -99,6 +207,29 @@ char &llvm::MachineCopyPropagationID = M
>>>>> INITIALIZE_PASS(MachineCopyPropagation, DEBUG_TYPE,
>>>>>                 "Machine Copy Propagation Pass", false, false)
>>>>>
>>>>> +/// We have two separate passes that are very similar, the only
>>>>> difference being
>>>>> +/// where they are meant to be run in the pipeline.  This is done for
>>>>> several
>>>>> +/// reasons:
>>>>> +/// - the two passes have different dependencies
>>>>> +/// - some targets want to disable the later run of this pass, but not
>>>>> the
>>>>> +///   earlier one (e.g. NVPTX and WebAssembly)
>>>>> +/// - it allows for easier debugging via llc
>>>>> +
>>>>> +char MachineCopyPropagationPreRegRewrite::ID = 0;
>>>>> +char &llvm::MachineCopyPropagationPreRegRewriteID =
>>>>> MachineCopyPropagationPreRegRewrite::ID;
>>>>> +
>>>>> +INITIALIZE_PASS_BEGIN(MachineCopyPropagationPreRegRewrite,
>>>>> +                      "machine-cp-prerewrite",
>>>>> +                      "Machine Copy Propagation Pre-Register Rewrite
>>>>> Pass",
>>>>> +                      false, false)
>>>>> +INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
>>>>> +INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
>>>>> +INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
>>>>> +INITIALIZE_PASS_END(MachineCopyPropagationPreRegRewrite,
>>>>> +                    "machine-cp-prerewrite",
>>>>> +                    "Machine Copy Propagation Pre-Register Rewrite
>>>>> Pass", false,
>>>>> +                    false)
>>>>> +
>>>>> /// Remove any entry in \p Map where the register is a subregister or
>>>>> equal to
>>>>> /// a register contained in \p Regs.
>>>>> static void removeRegsFromMap(Reg2MIMap &Map, const RegList &Regs,
>>>>> @@ -139,6 +270,10 @@ void MachineCopyPropagation::ClobberRegi
>>>>> }
>>>>>
>>>>> void MachineCopyPropagation::ReadRegister(unsigned Reg) {
>>>>> +  // We don't track MaybeDeadCopies when running pre-VirtRegRewriter.
>>>>> +  if (PreRegRewrite)
>>>>> +    return;
>>>>> +
>>>>>   // If 'Reg' is defined by a copy, the copy is no longer a candidate
>>>>>   // for elimination.
>>>>>   for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
>>>>> @@ -170,6 +305,46 @@ static bool isNopCopy(const MachineInstr
>>>>>   return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
>>>>> }
>>>>>
>>>>> +/// Return the physical register assigned to \p Reg if it is a virtual
>>>>> register,
>>>>> +/// otherwise just return the physical reg from the operand itself.
>>>>> +///
>>>>> +/// If \p SubReg is 0 then return the full physical register assigned
>>>>> to the
>>>>> +/// virtual register ignoring subregs.  If we aren't tracking sub-reg
>>>>> liveness
>>>>> +/// then we need to use this to be more conservative with clobbers by
>>>>> killing
>>>>> +/// all super reg and their sub reg COPYs as well.  This is to prevent
>>>>> COPY
>>>>> +/// forwarding in cases like the following:
>>>>> +///
>>>>> +///    %vreg2 = COPY %vreg1:sub1
>>>>> +///    %vreg3 = COPY %vreg1:sub0
>>>>> +///    ...    = OP1 %vreg2
>>>>> +///    ...    = OP2 %vreg3
>>>>> +///
>>>>> +/// After forward %vreg2 (assuming this is the last use of %vreg1) and
>>>>> +/// VirtRegRewriter adding kill markers we have:
>>>>> +///
>>>>> +///    %vreg3 = COPY %vreg1:sub0
>>>>> +///    ...    = OP1 %vreg1:sub1<kill>
>>>>> +///    ...    = OP2 %vreg3
>>>>> +///
>>>>> +/// If %vreg3 is assigned to a sub-reg of %vreg1, then after rewriting
>>>>> we have:
>>>>> +///
>>>>> +///    ...     = OP1 R0:sub1, R0<imp-use,kill>
>>>>> +///    ...     = OP2 R0:sub0
>>>>> +///
>>>>> +/// and the use of R0 by OP2 will not have a valid definition.
>>>>> +unsigned MachineCopyPropagation::getPhysReg(unsigned Reg, unsigned
>>>>> SubReg) {
>>>>> +
>>>>> +  // Physical registers cannot have subregs.
>>>>> +  if (!TargetRegisterInfo::isVirtualRegister(Reg))
>>>>> +    return Reg;
>>>>> +
>>>>> +  assert(PreRegRewrite && "Unexpected virtual register encountered");
>>>>> +  Reg = VRM->getPhys(Reg);
>>>>> +  if (SubReg && !NoSubRegLiveness)
>>>>> +    Reg = TRI->getSubReg(Reg, SubReg);
>>>>> +  return Reg;
>>>>> +}
>>>>> +
>>>>> /// Remove instruction \p Copy if there exists a previous copy that
>>>>> copies the
>>>>> /// register \p Src to the register \p Def; This may happen indirectly
>>>>> by
>>>>> /// copying the super registers.
>>>>> @@ -207,6 +382,397 @@ bool MachineCopyPropagation::eraseIfRedu
>>>>>   return true;
>>>>> }
>>>>>
>>>>> +
>>>>> +/// Decide whether we should forward the destination of \param Copy to
>>>>> its use
>>>>> +/// in \param UseI based on the register class of the Copy operands.
>>>>> Same-class
>>>>> +/// COPYs are always accepted by this function, but cross-class COPYs
>>>>> are only
>>>>> +/// accepted if they are forwarded to another COPY with the operand
>>>>> register
>>>>> +/// classes reversed.  For example:
>>>>> +///
>>>>> +///   RegClassA = COPY RegClassB  // Copy parameter
>>>>> +///   ...
>>>>> +///   RegClassB = COPY RegClassA  // UseI parameter
>>>>> +///
>>>>> +/// which after forwarding becomes
>>>>> +///
>>>>> +///   RegClassA = COPY RegClassB
>>>>> +///   ...
>>>>> +///   RegClassB = COPY RegClassB
>>>>> +///
>>>>> +/// so we have reduced the number of cross-class COPYs and potentially
>>>>> +/// introduced a no COPY that can be removed.
>>>>> +bool MachineCopyPropagation::isForwardableRegClassCopy(
>>>>> +    const MachineInstr &Copy, const MachineInstr &UseI) {
>>>>> +  auto isCross = [&](const MachineOperand &Dst, const MachineOperand
>>>>> &Src) {
>>>>> +    unsigned DstReg = Dst.getReg();
>>>>> +    unsigned SrcPhysReg = getPhysReg(Src);
>>>>> +    const TargetRegisterClass *DstRC;
>>>>> +    if (TargetRegisterInfo::isVirtualRegister(DstReg)) {
>>>>> +      DstRC = MRI->getRegClass(DstReg);
>>>>> +      unsigned DstSubReg = Dst.getSubReg();
>>>>> +      if (DstSubReg)
>>>>> +        SrcPhysReg = TRI->getMatchingSuperReg(SrcPhysReg, DstSubReg,
>>>>> DstRC);
>>>>> +    } else
>>>>> +      DstRC = TRI->getMinimalPhysRegClass(DstReg);
>>>>> +
>>>>> +    return !DstRC->contains(SrcPhysReg);
>>>>> +  };
>>>>> +
>>>>> +  const MachineOperand &CopyDst = Copy.getOperand(0);
>>>>> +  const MachineOperand &CopySrc = Copy.getOperand(1);
>>>>> +
>>>>> +  if (!isCross(CopyDst, CopySrc))
>>>>> +    return true;
>>>>> +
>>>>> +  if (!UseI.isCopy())
>>>>> +    return false;
>>>>> +
>>>>> +  assert(getFullPhysReg(UseI.getOperand(1)) ==
>>>>> getFullPhysReg(CopyDst));
>>>>> +  return !isCross(UseI.getOperand(0), CopySrc);
>>>>> +}
>>>>> +
>>>>> +/// Check that the subregs on the copy source operand (\p CopySrc) and
>>>>> the use
>>>>> +/// operand to be forwarded to (\p MOUse) are compatible with doing
>>>>> the
>>>>> +/// forwarding.  Also computes the new register and subregister to be
>>>>> used in
>>>>> +/// the forwarded-to instruction.
>>>>> +std::tuple<unsigned, unsigned, bool>
>>>>> MachineCopyPropagation::checkUseSubReg(
>>>>> +    const MachineOperand &CopySrc, const MachineOperand &MOUse) {
>>>>> +  unsigned NewUseReg = CopySrc.getReg();
>>>>> +  unsigned NewUseSubReg;
>>>>> +
>>>>> +  if (TargetRegisterInfo::isPhysicalRegister(NewUseReg)) {
>>>>> +    // If MOUse is a virtual reg, we need to apply it to the new
>>>>> physical reg
>>>>> +    // we're going to replace it with.
>>>>> +    if (MOUse.getSubReg())
>>>>> +      NewUseReg = TRI->getSubReg(NewUseReg, MOUse.getSubReg());
>>>>> +    // If the original use subreg isn't valid on the new src reg, we
>>>>> can't
>>>>> +    // forward it here.
>>>>> +    if (!NewUseReg)
>>>>> +      return std::make_tuple(0, 0, false);
>>>>> +    NewUseSubReg = 0;
>>>>> +  } else {
>>>>> +    // %v1 = COPY %v2:sub1
>>>>> +    //    USE %v1:sub2
>>>>> +    // The new use is %v2:sub1:sub2
>>>>> +    NewUseSubReg =
>>>>> +        TRI->composeSubRegIndices(CopySrc.getSubReg(),
>>>>> MOUse.getSubReg());
>>>>> +    // Check that NewUseSubReg is valid on NewUseReg
>>>>> +    if (NewUseSubReg &&
>>>>> +        !TRI->getSubClassWithSubReg(MRI->getRegClass(NewUseReg),
>>>>> NewUseSubReg))
>>>>> +      return std::make_tuple(0, 0, false);
>>>>> +  }
>>>>> +
>>>>> +  return std::make_tuple(NewUseReg, NewUseSubReg, true);
>>>>> +}
>>>>> +
>>>>> +/// Check that \p MI does not have implicit uses that overlap with
>>>>> it's \p Use
>>>>> +/// operand (the register being replaced), since these can sometimes
>>>>> be
>>>>> +/// implicitly tied to other operands.  For example, on AMDGPU:
>>>>> +///
>>>>> +/// V_MOVRELS_B32_e32 %VGPR2, %M0<imp-use>, %EXEC<imp-use>,
>>>>> %VGPR2_VGPR3_VGPR4_VGPR5<imp-use>
>>>>> +///
>>>>> +/// the %VGPR2 is implicitly tied to the larger reg operand, but we
>>>>> have no
>>>>> +/// way of knowing we need to update the latter when updating the
>>>>> former.
>>>>> +bool MachineCopyPropagation::hasImplicitOverlap(const MachineInstr
>>>>> &MI,
>>>>> +                                                const MachineOperand
>>>>> &Use) {
>>>>> +  if (!TargetRegisterInfo::isPhysicalRegister(Use.getReg()))
>>>>> +    return false;
>>>>> +
>>>>> +  for (const MachineOperand &MIUse : MI.uses())
>>>>> +    if (&MIUse != &Use && MIUse.isReg() && MIUse.isImplicit() &&
>>>>> +        TRI->regsOverlap(Use.getReg(), MIUse.getReg()))
>>>>> +      return true;
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/// Narrow the register class of the forwarded vreg so it matches any
>>>>> +/// instruction constraints.  \p MI is the instruction being forwarded
>>>>> to. \p
>>>>> +/// MOUse is the operand being replaced in \p MI (which hasn't yet
>>>>> been updated
>>>>> +/// at the time this function is called).  \p NewUseReg and \p
>>>>> NewUseSubReg are
>>>>> +/// what the \p MOUse will be changed to after forwarding.
>>>>> +///
>>>>> +/// If we are forwarding
>>>>> +///    A:RCA = COPY B:RCB
>>>>> +/// into
>>>>> +///    ... = OP A:RCA
>>>>> +///
>>>>> +/// then we need to narrow the register class of B so that it is a
>>>>> subclass
>>>>> +/// of RCA so that it meets the instruction register class
>>>>> constraints.
>>>>> +void MachineCopyPropagation::narrowRegClass(const MachineInstr &MI,
>>>>> +                                            const MachineOperand
>>>>> &MOUse,
>>>>> +                                            unsigned NewUseReg,
>>>>> +                                            unsigned NewUseSubReg) {
>>>>> +  if (!TargetRegisterInfo::isVirtualRegister(NewUseReg))
>>>>> +    return;
>>>>> +
>>>>> +  // Make sure the virtual reg class allows the subreg.
>>>>> +  if (NewUseSubReg) {
>>>>> +    const TargetRegisterClass *CurUseRC = MRI->getRegClass(NewUseReg);
>>>>> +    const TargetRegisterClass *NewUseRC =
>>>>> +        TRI->getSubClassWithSubReg(CurUseRC, NewUseSubReg);
>>>>> +    if (CurUseRC != NewUseRC) {
>>>>> +      DEBUG(dbgs() << "MCP: Setting regclass of " <<
>>>>> PrintReg(NewUseReg, TRI)
>>>>> +                   << " to " << TRI->getRegClassName(NewUseRC) <<
>>>>> "\n");
>>>>> +      MRI->setRegClass(NewUseReg, NewUseRC);
>>>>> +    }
>>>>> +  }
>>>>> +
>>>>> +  unsigned MOUseOpNo = &MOUse - &MI.getOperand(0);
>>>>> +  const TargetRegisterClass *InstRC =
>>>>> +      TII->getRegClass(MI.getDesc(), MOUseOpNo, TRI, *MF);
>>>>> +  if (InstRC) {
>>>>> +    const TargetRegisterClass *CurUseRC = MRI->getRegClass(NewUseReg);
>>>>> +    if (NewUseSubReg)
>>>>> +      InstRC = TRI->getMatchingSuperRegClass(CurUseRC, InstRC,
>>>>> NewUseSubReg);
>>>>> +    if (!InstRC->hasSubClassEq(CurUseRC)) {
>>>>> +      const TargetRegisterClass *NewUseRC =
>>>>> +          TRI->getCommonSubClass(InstRC, CurUseRC);
>>>>> +      DEBUG(dbgs() << "MCP: Setting regclass of " <<
>>>>> PrintReg(NewUseReg, TRI)
>>>>> +                   << " to " << TRI->getRegClassName(NewUseRC) <<
>>>>> "\n");
>>>>> +      MRI->setRegClass(NewUseReg, NewUseRC);
>>>>> +    }
>>>>> +  }
>>>>> +}
>>>>> +
>>>>> +/// Update the LiveInterval information to reflect the destination of
>>>>> \p Copy
>>>>> +/// being forwarded to a use in \p UseMI.  \p OrigUseReg is the
>>>>> register being
>>>>> +/// forwarded through. It should be the destination register of \p
>>>>> Copy and has
>>>>> +/// already been replaced in \p UseMI at the point this function is
>>>>> called.  \p
>>>>> +/// NewUseReg and \p NewUseSubReg are the register and subregister
>>>>> being
>>>>> +/// forwarded.  They should be the source register of the \p Copy and
>>>>> should be
>>>>> +/// the value of the \p UseMI operand being forwarded at the point
>>>>> this function
>>>>> +/// is called.  \p UseIsEarlyClobber is true if the use being
>>>>> re-written is in
>>>>> +/// the EarlyClobber slot index (as opposed to the register slot of
>>>>> most
>>>>> +/// register operands).
>>>>> +void MachineCopyPropagation::updateForwardedCopyLiveInterval(
>>>>> +    const MachineInstr &Copy, const MachineInstr &UseMI, bool
>>>>> UseIsEarlyClobber,
>>>>> +    unsigned OrigUseReg, unsigned NewUseReg, unsigned NewUseSubReg) {
>>>>> +
>>>>> +  assert(TRI->isSubRegisterEq(getPhysReg(OrigUseReg, 0),
>>>>> +                              getFullPhysReg(Copy.getOperand(0))) &&
>>>>> +         "OrigUseReg mismatch");
>>>>> +  assert(TRI->isSubRegisterEq(getFullPhysReg(Copy.getOperand(1)),
>>>>> +                              getPhysReg(NewUseReg, 0)) &&
>>>>> +         "NewUseReg mismatch");
>>>>> +
>>>>> +  // Extend live range starting from COPY early-clobber slot, since
>>>>> that
>>>>> +  // is where the original src live range ends.
>>>>> +  SlotIndex CopyUseIdx =
>>>>> +      Indexes->getInstructionIndex(Copy).getRegSlot(true
>>>>> /*=EarlyClobber*/);
>>>>> +  SlotIndex UseEndIdx =
>>>>> +
>>>>> Indexes->getInstructionIndex(UseMI).getRegSlot(UseIsEarlyClobber);
>>>>> +  if (TargetRegisterInfo::isVirtualRegister(NewUseReg)) {
>>>>> +    LiveInterval &LI = LIS->getInterval(NewUseReg);
>>>>> +    LI.extendInBlock(CopyUseIdx, UseEndIdx);
>>>>> +    LaneBitmask UseMask = TRI->getSubRegIndexLaneMask(NewUseSubReg);
>>>>> +    for (auto &S : LI.subranges())
>>>>> +      if ((S.LaneMask & UseMask).any() && S.find(CopyUseIdx))
>>>>> +        S.extendInBlock(CopyUseIdx, UseEndIdx);
>>>>> +  } else {
>>>>> +    assert(NewUseSubReg == 0 && "Unexpected subreg on physical
>>>>> register!");
>>>>> +    for (MCRegUnitIterator UI(NewUseReg, TRI); UI.isValid(); ++UI) {
>>>>> +      LiveRange &LR = LIS->getRegUnit(*UI);
>>>>> +      LR.extendInBlock(CopyUseIdx, UseEndIdx);
>>>>> +    }
>>>>> +  }
>>>>> +
>>>>> +  if (!TargetRegisterInfo::isVirtualRegister(OrigUseReg))
>>>>> +    return;
>>>>> +
>>>>> +  // Shrink the live-range for the old use reg if the forwarded use
>>>>> was it's
>>>>> +  // last use.
>>>>> +  LiveInterval &OrigUseLI = LIS->getInterval(OrigUseReg);
>>>>> +
>>>>> +  // Can happen for undef uses.
>>>>> +  if (OrigUseLI.empty())
>>>>> +    return;
>>>>> +
>>>>> +  SlotIndex CopyDefIdx =
>>>>> Indexes->getInstructionIndex(Copy).getRegSlot();
>>>>> +  const LiveRange::Segment *OrigUseSeg =
>>>>> +      OrigUseLI.getSegmentContaining(CopyDefIdx);
>>>>> +
>>>>> +  // Only shrink if forwarded use is the end of a segment.
>>>>> +  if (OrigUseSeg->end != UseEndIdx)
>>>>> +    return;
>>>>> +
>>>>> +  LIS->shrinkToUses(&OrigUseLI, &ShrunkDeadInsts);
>>>>> +}
>>>>> +
>>>>> +void MachineCopyPropagation::LRE_WillEraseInstruction(MachineInstr
>>>>> *MI) {
>>>>> +  Changed = true;
>>>>> +}
>>>>> +
>>>>> +/// Look for available copies whose destination register is used by \p
>>>>> MI and
>>>>> +/// replace the use in \p MI with the copy's source register.
>>>>> +void MachineCopyPropagation::forwardUses(MachineInstr &MI) {
>>>>> +  // We can't generally forward uses after virtual registers have been
>>>>> renamed
>>>>> +  // because some targets generate code that has implicit dependencies
>>>>> on the
>>>>> +  // physical register numbers.  For example, in PowerPC, when
>>>>> spilling
>>>>> +  // condition code registers, the following code pattern is
>>>>> generated:
>>>>> +  //
>>>>> +  //   %CR7 = COPY %CR0
>>>>> +  //   %R6 = MFOCRF %CR7
>>>>> +  //   %R6 = RLWINM %R6, 29, 31, 31
>>>>> +  //
>>>>> +  // where the shift amount in the RLWINM instruction depends on the
>>>>> source
>>>>> +  // register number of the MFOCRF instruction.  If we were to forward
>>>>> %CR0 to
>>>>> +  // the MFOCRF instruction, the shift amount would no longer be
>>>>> correct.
>>>>> +  //
>>>>> +  // FIXME: It may be possible to define a target hook that checks the
>>>>> register
>>>>> +  // class or user opcode and allows some cases, but prevents cases
>>>>> like the
>>>>> +  // above from being broken to enable later register copy forwarding.
>>>>> +  if (!PreRegRewrite)
>>>>> +    return;
>>>>> +
>>>>> +  if (AvailCopyMap.empty())
>>>>> +    return;
>>>>> +
>>>>> +  // Look for non-tied explicit vreg uses that have an active COPY
>>>>> +  // instruction that defines the physical register allocated to them.
>>>>> +  // Replace the vreg with the source of the active COPY.
>>>>> +  for (MachineOperand &MOUse : MI.explicit_uses()) {
>>>>> +    // Don't forward into undef use operands since doing so can cause
>>>>> problems
>>>>> +    // with the machine verifier, since it doesn't treat undef reads
>>>>> as reads,
>>>>> +    // so we can end up with a live range that ends on an undef read,
>>>>> leading to
>>>>> +    // an error that the live range doesn't end on a read of the live
>>>>> range
>>>>> +    // register.
>>>>> +    if (!MOUse.isReg() || MOUse.isTied() || MOUse.isUndef() ||
>>>>> MOUse.isDef())
>>>>> +      continue;
>>>>> +
>>>>> +    unsigned UseReg = MOUse.getReg();
>>>>> +    if (!UseReg)
>>>>> +      continue;
>>>>> +
>>>>> +    // See comment above check for !PreRegRewrite regarding forwarding
>>>>> changing
>>>>> +    // physical registers.
>>>>> +    if (!TargetRegisterInfo::isVirtualRegister(UseReg))
>>>>> +      continue;
>>>>> +    else {
>>>>> +      // FIXME: Don't forward COPYs to a use that is on an instruction
>>>>> that
>>>>> +      // re-defines the same virtual register.  This leads to machine
>>>>> +      // verification failures because of a bug in the greedy
>>>>> +      // allocator/verifier.  The bug is that just after greedy
>>>>> regalloc, we can
>>>>> +      // end up with code that looks like:
>>>>> +      //
>>>>> +      // %vreg1<def> = ...
>>>>> +      // ...
>>>>> +      // ... = %vreg1
>>>>> +      // ...
>>>>> +      // %vreg1<def> = %vreg1
>>>>> +      // ...
>>>>> +      //
>>>>> +      // verifyLiveInterval() accepts this code as valid since it sees
>>>>> the
>>>>> +      // second def as part of the same live interval component
>>>>> because
>>>>> +      // ConnectedVNInfoEqClasses::Classify() sees this second def as
>>>>> a
>>>>> +      // "two-addr" redefinition, even though the def and source
>>>>> operands are
>>>>> +      // not tied.
>>>>> +      //
>>>>> +      // If we replace just the second use of %vreg1 in the above
>>>>> code, then we
>>>>> +      // end up with:
>>>>> +      //
>>>>> +      // ...
>>>>> +      // %vreg1<def> = ...
>>>>> +      // ...
>>>>> +      // ... = %vreg1
>>>>> +      // ...
>>>>> +      // %vreg1<def> = *%vreg2*
>>>>> +      // ...
>>>>> +      //
>>>>> +      // verifyLiveInterval() now rejects this code since it sees
>>>>> these two def
>>>>> +      // live ranges as being separate components.  To get rid of this
>>>>> +      // restriction on forwarding, regalloc greedy would need to be
>>>>> fixed to
>>>>> +      // avoid generating code like the first snippet above, as well
>>>>> as the
>>>>> +      // verifier being fixed to reject such code.
>>>>> +      if (llvm::any_of(MI.defs(), [UseReg](MachineOperand &Def) {
>>>>> +            if (!Def.isReg() || !Def.isDef() || Def.getReg() !=
>>>>> UseReg)
>>>>> +              return false;
>>>>> +            // Only tied use/def operands or a subregister def without
>>>>> the undef
>>>>> +            // flag should result in connected liveranges.
>>>>> +            if (Def.isTied())
>>>>> +              return false;
>>>>> +            if (Def.getSubReg() != 0 && Def.isUndef())
>>>>> +              return false;
>>>>> +            return true;
>>>>> +          }))
>>>>> +        continue;
>>>>> +    }
>>>>> +
>>>>> +    UseReg = VRM->getPhys(UseReg);
>>>>> +
>>>>> +    // Don't forward COPYs via non-allocatable regs since they can
>>>>> have
>>>>> +    // non-standard semantics.
>>>>> +    if (!MRI->isAllocatable(UseReg))
>>>>> +      continue;
>>>>> +
>>>>> +    auto CI = AvailCopyMap.find(UseReg);
>>>>> +    if (CI == AvailCopyMap.end())
>>>>> +      continue;
>>>>> +
>>>>> +    MachineInstr &Copy = *CI->second;
>>>>> +    MachineOperand &CopyDst = Copy.getOperand(0);
>>>>> +    MachineOperand &CopySrc = Copy.getOperand(1);
>>>>> +
>>>>> +    // Don't forward COPYs that are already NOPs due to register
>>>>> assignment.
>>>>> +    if (getPhysReg(CopyDst) == getPhysReg(CopySrc))
>>>>> +      continue;
>>>>> +
>>>>> +    // FIXME: Don't handle partial uses of wider COPYs yet.
>>>>> +    if (CopyDst.getSubReg() != 0 || UseReg != getPhysReg(CopyDst))
>>>>> +      continue;
>>>>> +
>>>>> +    // Don't forward COPYs of non-allocatable regs unless they are
>>>>> constant.
>>>>> +    unsigned CopySrcReg = CopySrc.getReg();
>>>>> +    if (TargetRegisterInfo::isPhysicalRegister(CopySrcReg) &&
>>>>> +        !MRI->isAllocatable(CopySrcReg) &&
>>>>> !MRI->isConstantPhysReg(CopySrcReg))
>>>>> +      continue;
>>>>> +
>>>>> +    if (!isForwardableRegClassCopy(Copy, MI))
>>>>> +      continue;
>>>>> +
>>>>> +    unsigned NewUseReg, NewUseSubReg;
>>>>> +    bool SubRegOK;
>>>>> +    std::tie(NewUseReg, NewUseSubReg, SubRegOK) =
>>>>> +        checkUseSubReg(CopySrc, MOUse);
>>>>> +    if (!SubRegOK)
>>>>> +      continue;
>>>>> +
>>>>> +    if (hasImplicitOverlap(MI, MOUse))
>>>>> +      continue;
>>>>> +
>>>>> +    if (!DebugCounter::shouldExecute(FwdCounter))
>>>>> +      continue;
>>>>> +
>>>>> +    DEBUG(dbgs() << "MCP: Replacing "
>>>>> +          << PrintReg(MOUse.getReg(), TRI, MOUse.getSubReg())
>>>>> +          << "\n     with "
>>>>> +          << PrintReg(NewUseReg, TRI, CopySrc.getSubReg())
>>>>> +          << "\n     in "
>>>>> +          << MI
>>>>> +          << "     from "
>>>>> +          << Copy);
>>>>> +
>>>>> +    narrowRegClass(MI, MOUse, NewUseReg, NewUseSubReg);
>>>>> +
>>>>> +    unsigned OrigUseReg = MOUse.getReg();
>>>>> +    MOUse.setReg(NewUseReg);
>>>>> +    MOUse.setSubReg(NewUseSubReg);
>>>>> +
>>>>> +    DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");
>>>>> +
>>>>> +    if (PreRegRewrite)
>>>>> +      updateForwardedCopyLiveInterval(Copy, MI,
>>>>> MOUse.isEarlyClobber(),
>>>>> +                                      OrigUseReg, NewUseReg,
>>>>> NewUseSubReg);
>>>>> +    else
>>>>> +      for (MachineInstr &KMI :
>>>>> +             make_range(Copy.getIterator(),
>>>>> std::next(MI.getIterator())))
>>>>> +        KMI.clearRegisterKills(NewUseReg, TRI);
>>>>> +
>>>>> +    ++NumCopyForwards;
>>>>> +    Changed = true;
>>>>> +  }
>>>>> +}
>>>>> +
>>>>> void MachineCopyPropagation::CopyPropagateBlock(MachineBasicBlock &MBB)
>>>>> {
>>>>>   DEBUG(dbgs() << "MCP: CopyPropagateBlock " << MBB.getName() << "\n");
>>>>>
>>>>> @@ -215,12 +781,8 @@ void MachineCopyPropagation::CopyPropaga
>>>>>     ++I;
>>>>>
>>>>>     if (MI->isCopy()) {
>>>>> -      unsigned Def = MI->getOperand(0).getReg();
>>>>> -      unsigned Src = MI->getOperand(1).getReg();
>>>>> -
>>>>> -      assert(!TargetRegisterInfo::isVirtualRegister(Def) &&
>>>>> -             !TargetRegisterInfo::isVirtualRegister(Src) &&
>>>>> -             "MachineCopyPropagation should be run after register
>>>>> allocation!");
>>>>> +      unsigned Def = getPhysReg(MI->getOperand(0));
>>>>> +      unsigned Src = getPhysReg(MI->getOperand(1));
>>>>>
>>>>>       // The two copies cancel out and the source of the first copy
>>>>>       // hasn't been overridden, eliminate the second one. e.g.
>>>>> @@ -237,8 +799,16 @@ void MachineCopyPropagation::CopyPropaga
>>>>>       //  %ECX<def> = COPY %EAX
>>>>>       // =>
>>>>>       //  %ECX<def> = COPY %EAX
>>>>> -      if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI,
>>>>> Src, Def))
>>>>> -        continue;
>>>>> +      if (!PreRegRewrite)
>>>>> +        if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI,
>>>>> Src, Def))
>>>>> +          continue;
>>>>> +
>>>>> +      forwardUses(*MI);
>>>>> +
>>>>> +      // Src may have been changed by forwardUses()
>>>>> +      Src = getPhysReg(MI->getOperand(1));
>>>>> +      unsigned DefClobber = getFullPhysReg(MI->getOperand(0));
>>>>> +      unsigned SrcClobber = getFullPhysReg(MI->getOperand(1));
>>>>>
>>>>>       // If Src is defined by a previous copy, the previous copy cannot
>>>>> be
>>>>>       // eliminated.
>>>>> @@ -255,7 +825,10 @@ void MachineCopyPropagation::CopyPropaga
>>>>>       DEBUG(dbgs() << "MCP: Copy is a deletion candidate: ";
>>>>> MI->dump());
>>>>>
>>>>>       // Copy is now a candidate for deletion.
>>>>> -      if (!MRI->isReserved(Def))
>>>>> +      // Only look for dead COPYs if we're not running just before
>>>>> +      // VirtRegRewriter, since presumably these COPYs will have
>>>>> already been
>>>>> +      // removed.
>>>>> +      if (!PreRegRewrite && !MRI->isReserved(Def))
>>>>>         MaybeDeadCopies.insert(MI);
>>>>>
>>>>>       // If 'Def' is previously source of another copy, then this
>>>>> earlier copy's
>>>>> @@ -265,11 +838,11 @@ void MachineCopyPropagation::CopyPropaga
>>>>>       // %xmm2<def> = copy %xmm0
>>>>>       // ...
>>>>>       // %xmm2<def> = copy %xmm9
>>>>> -      ClobberRegister(Def);
>>>>> +      ClobberRegister(DefClobber);
>>>>>       for (const MachineOperand &MO : MI->implicit_operands()) {
>>>>>         if (!MO.isReg() || !MO.isDef())
>>>>>           continue;
>>>>> -        unsigned Reg = MO.getReg();
>>>>> +        unsigned Reg = getFullPhysReg(MO);
>>>>>         if (!Reg)
>>>>>           continue;
>>>>>         ClobberRegister(Reg);
>>>>> @@ -284,13 +857,27 @@ void MachineCopyPropagation::CopyPropaga
>>>>>
>>>>>       // Remember source that's copied to Def. Once it's clobbered,
>>>>> then
>>>>>       // it's no longer available for copy propagation.
>>>>> -      RegList &DestList = SrcMap[Src];
>>>>> -      if (!is_contained(DestList, Def))
>>>>> -        DestList.push_back(Def);
>>>>> +      RegList &DestList = SrcMap[SrcClobber];
>>>>> +      if (!is_contained(DestList, DefClobber))
>>>>> +        DestList.push_back(DefClobber);
>>>>>
>>>>>       continue;
>>>>>     }
>>>>>
>>>>> +    // Clobber any earlyclobber regs first.
>>>>> +    for (const MachineOperand &MO : MI->operands())
>>>>> +      if (MO.isReg() && MO.isEarlyClobber()) {
>>>>> +        unsigned Reg = getFullPhysReg(MO);
>>>>> +        // If we have a tied earlyclobber, that means it is also read
>>>>> by this
>>>>> +        // instruction, so we need to make sure we don't remove it as
>>>>> dead
>>>>> +        // later.
>>>>> +        if (MO.isTied())
>>>>> +          ReadRegister(Reg);
>>>>> +        ClobberRegister(Reg);
>>>>> +      }
>>>>> +
>>>>> +    forwardUses(*MI);
>>>>> +
>>>>>     // Not a copy.
>>>>>     SmallVector<unsigned, 2> Defs;
>>>>>     const MachineOperand *RegMask = nullptr;
>>>>> @@ -299,14 +886,11 @@ void MachineCopyPropagation::CopyPropaga
>>>>>         RegMask = &MO;
>>>>>       if (!MO.isReg())
>>>>>         continue;
>>>>> -      unsigned Reg = MO.getReg();
>>>>> +      unsigned Reg = getFullPhysReg(MO);
>>>>>       if (!Reg)
>>>>>         continue;
>>>>>
>>>>> -      assert(!TargetRegisterInfo::isVirtualRegister(Reg) &&
>>>>> -             "MachineCopyPropagation should be run after register
>>>>> allocation!");
>>>>> -
>>>>> -      if (MO.isDef()) {
>>>>> +      if (MO.isDef() && !MO.isEarlyClobber()) {
>>>>>         Defs.push_back(Reg);
>>>>>         continue;
>>>>>       } else if (MO.readsReg())
>>>>> @@ -358,11 +942,22 @@ void MachineCopyPropagation::CopyPropaga
>>>>>       ClobberRegister(Reg);
>>>>>   }
>>>>>
>>>>> +  // Remove instructions that were made dead by shrinking live ranges.
>>>>> Do this
>>>>> +  // after iterating over instructions to avoid instructions changing
>>>>> while
>>>>> +  // iterating.
>>>>> +  if (!ShrunkDeadInsts.empty()) {
>>>>> +    SmallVector<unsigned, 8> NewRegs;
>>>>> +    LiveRangeEdit(nullptr, NewRegs, *MF, *LIS, nullptr, this)
>>>>> +        .eliminateDeadDefs(ShrunkDeadInsts);
>>>>> +  }
>>>>> +
>>>>>   // If MBB doesn't have successors, delete the copies whose defs are
>>>>> not used.
>>>>>   // If MBB does have successors, then conservative assume the defs are
>>>>> live-out
>>>>>   // since we don't want to trust live-in lists.
>>>>>   if (MBB.succ_empty()) {
>>>>>     for (MachineInstr *MaybeDead : MaybeDeadCopies) {
>>>>> +      DEBUG(dbgs() << "MCP: Removing copy due to no live-out succ: ";
>>>>> +            MaybeDead->dump());
>>>>>       assert(!MRI->isReserved(MaybeDead->getOperand(0).getReg()));
>>>>>       MaybeDead->eraseFromParent();
>>>>>       Changed = true;
>>>>> @@ -374,6 +969,7 @@ void MachineCopyPropagation::CopyPropaga
>>>>>   AvailCopyMap.clear();
>>>>>   CopyMap.clear();
>>>>>   SrcMap.clear();
>>>>> +  ShrunkDeadInsts.clear();
>>>>> }
>>>>>
>>>>> bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF)
>>>>> {
>>>>> @@ -385,6 +981,13 @@ bool MachineCopyPropagation::runOnMachin
>>>>>   TRI = MF.getSubtarget().getRegisterInfo();
>>>>>   TII = MF.getSubtarget().getInstrInfo();
>>>>>   MRI = &MF.getRegInfo();
>>>>> +  this->MF = &MF;
>>>>> +  if (PreRegRewrite) {
>>>>> +    Indexes = &getAnalysis<SlotIndexes>();
>>>>> +    LIS = &getAnalysis<LiveIntervals>();
>>>>> +    VRM = &getAnalysis<VirtRegMap>();
>>>>> +  }
>>>>> +  NoSubRegLiveness = !MRI->subRegLivenessEnabled();
>>>>>
>>>>>   for (MachineBasicBlock &MBB : MF)
>>>>>     CopyPropagateBlock(MBB);
>>>>>
>>>>> Modified: llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetPassConfig.cpp?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/CodeGen/TargetPassConfig.cpp (original)
>>>>> +++ llvm/trunk/lib/CodeGen/TargetPassConfig.cpp Mon Oct  2 15:01:37
>>>>> 2017
>>>>> @@ -88,6 +88,8 @@ static cl::opt<bool> DisableCGP("disable
>>>>>     cl::desc("Disable Codegen Prepare"));
>>>>> static cl::opt<bool> DisableCopyProp("disable-copyprop", cl::Hidden,
>>>>>     cl::desc("Disable Copy Propagation pass"));
>>>>> +static cl::opt<bool>
>>>>> DisableCopyPropPreRegRewrite("disable-copyprop-prerewrite", cl::Hidden,
>>>>> +    cl::desc("Disable Copy Propagation Pre-Register Re-write pass"));
>>>>> static cl::opt<bool>
>>>>> DisablePartialLibcallInlining("disable-partial-libcall-inlining",
>>>>>     cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));
>>>>> static cl::opt<bool> EnableImplicitNullChecks(
>>>>> @@ -252,6 +254,9 @@ static IdentifyingPassPtr overridePass(A
>>>>>   if (StandardID == &MachineCopyPropagationID)
>>>>>     return applyDisable(TargetID, DisableCopyProp);
>>>>>
>>>>> +  if (StandardID == &MachineCopyPropagationPreRegRewriteID)
>>>>> +    return applyDisable(TargetID, DisableCopyPropPreRegRewrite);
>>>>> +
>>>>>   return TargetID;
>>>>> }
>>>>>
>>>>> @@ -1064,6 +1069,10 @@ void TargetPassConfig::addOptimizedRegAl
>>>>>     // Allow targets to change the register assignments before
>>>>> rewriting.
>>>>>     addPreRewrite();
>>>>>
>>>>> +    // Copy propagate to forward register uses and try to eliminate
>>>>> COPYs that
>>>>> +    // were not coalesced.
>>>>> +    addPass(&MachineCopyPropagationPreRegRewriteID);
>>>>> +
>>>>>     // Finally rewrite virtual registers.
>>>>>     addPass(&VirtRegRewriterID);
>>>>>
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll Mon Oct  2
>>>>> 15:01:37 2017
>>>>> @@ -9,7 +9,8 @@ define i16 @halfword(%struct.a* %ctx, i3
>>>>> ; CHECK-LABEL: halfword:
>>>>> ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>>> ; CHECK: ldrh [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl
>>>>> #1]
>>>>> -; CHECK: strh [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #1]
>>>>> +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>>> +; CHECK: strh [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #1]
>>>>>   %shr81 = lshr i32 %xor72, 9
>>>>>   %conv82 = zext i32 %shr81 to i64
>>>>>   %idxprom83 = and i64 %conv82, 255
>>>>> @@ -24,7 +25,8 @@ define i32 @word(%struct.b* %ctx, i32 %x
>>>>> ; CHECK-LABEL: word:
>>>>> ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>>> ; CHECK: ldr [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl
>>>>> #2]
>>>>> -; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #2]
>>>>> +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>>> +; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #2]
>>>>>   %shr81 = lshr i32 %xor72, 9
>>>>>   %conv82 = zext i32 %shr81 to i64
>>>>>   %idxprom83 = and i64 %conv82, 255
>>>>> @@ -39,7 +41,8 @@ define i64 @doubleword(%struct.c* %ctx,
>>>>> ; CHECK-LABEL: doubleword:
>>>>> ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>>> ; CHECK: ldr [[REG1:x[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl
>>>>> #3]
>>>>> -; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #3]
>>>>> +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>>> +; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #3]
>>>>>   %shr81 = lshr i32 %xor72, 9
>>>>>   %conv82 = zext i32 %shr81 to i64
>>>>>   %idxprom83 = and i64 %conv82, 255
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll Mon Oct  2
>>>>> 15:01:37 2017
>>>>> @@ -8,15 +8,9 @@ define <2 x i64> @bar(<2 x i64> %a, <2 x
>>>>> ; CHECK: add.2dv[[REG:[0-9]+]], v0, v1
>>>>> ; CHECK: addd[[REG3:[0-9]+]], d[[REG]], d1
>>>>> ; CHECK: subd[[REG2:[0-9]+]], d[[REG]], d1
>>>>> -; Without advanced copy optimization, we end up with cross register
>>>>> -; banks copies that cannot be coalesced.
>>>>> -; CHECK-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
>>>>> -; With advanced copy optimization, we end up with just one copy
>>>>> -; to insert the computed high part into the V register.
>>>>> -; CHECK-OPT-NOT: fmov
>>>>> +; CHECK-NOT: fmov
>>>>> ; CHECK: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
>>>>> -; CHECK-NOOPT: fmov d0, [[COPY_REG3]]
>>>>> -; CHECK-OPT-NOT: fmov
>>>>> +; CHECK-NOT: fmov
>>>>> ; CHECK: ins.d v0[1], [[COPY_REG2]]
>>>>> ; CHECK-NEXT: ret
>>>>> ;
>>>>> @@ -24,11 +18,9 @@ define <2 x i64> @bar(<2 x i64> %a, <2 x
>>>>> ; GENERIC: addv[[REG:[0-9]+]].2d, v0.2d, v1.2d
>>>>> ; GENERIC: addd[[REG3:[0-9]+]], d[[REG]], d1
>>>>> ; GENERIC: subd[[REG2:[0-9]+]], d[[REG]], d1
>>>>> -; GENERIC-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
>>>>> -; GENERIC-OPT-NOT: fmov
>>>>> +; GENERIC-NOT: fmov
>>>>> ; GENERIC: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
>>>>> -; GENERIC-NOOPT: fmov d0, [[COPY_REG3]]
>>>>> -; GENERIC-OPT-NOT: fmov
>>>>> +; GENERIC-NOT: fmov
>>>>> ; GENERIC: ins v0.d[1], [[COPY_REG2]]
>>>>> ; GENERIC-NEXT: ret
>>>>>   %add = add <2 x i64> %a, %b
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>> (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll Mon Oct
>>>>> 2 15:01:37 2017
>>>>> @@ -4,8 +4,10 @@
>>>>> define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
>>>>> entry:
>>>>> ; CHECK-LABEL: t:
>>>>> -; CHECK: mov x0, [[REG1:x[0-9]+]]
>>>>> -; CHECK: mov x1, [[REG2:x[0-9]+]]
>>>>> +; CHECK: mov [[REG2:x[0-9]+]], x3
>>>>> +; CHECK: mov [[REG1:x[0-9]+]], x2
>>>>> +; CHECK: mov x0, x2
>>>>> +; CHECK: mov x1, x3
>>>>> ; CHECK: bl _foo
>>>>> ; CHECK: mov x0, [[REG1]]
>>>>> ; CHECK: mov x1, [[REG2]]
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll Mon Oct  2
>>>>> 15:01:37 2017
>>>>> @@ -489,7 +489,7 @@ else:
>>>>>
>>>>> ; CHECK-COMMON-LABEL: test_phi:
>>>>> ; CHECK-COMMON: mov  x[[PTR:[0-9]+]], x0
>>>>> -; CHECK-COMMON: ldr  h[[AB:[0-9]+]], [x[[PTR]]]
>>>>> +; CHECK-COMMON: ldr  h[[AB:[0-9]+]], [x0]
>>>>> ; CHECK-COMMON: [[LOOP:LBB[0-9_]+]]:
>>>>> ; CHECK-COMMON: mov.16b  v[[R:[0-9]+]], v[[AB]]
>>>>> ; CHECK-COMMON: ldr  h[[AB]], [x[[PTR]]]
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll Mon Oct  2
>>>>> 15:01:37 2017
>>>>> @@ -17,6 +17,9 @@ define i32 @test_multiflag(i32 %n, i32 %
>>>>>   %val = zext i1 %test to i32
>>>>> ; CHECK: cset {{[xw][0-9]+}}, ne
>>>>>
>>>>> +; CHECK: mov [[RHSCOPY:w[0-9]+]], [[RHS]]
>>>>> +; CHECK: mov [[LHSCOPY:w[0-9]+]], [[LHS]]
>>>>> +
>>>>>   store i32 %val, i32* @var
>>>>>
>>>>>   call void @bar()
>>>>> @@ -25,7 +28,7 @@ define i32 @test_multiflag(i32 %n, i32 %
>>>>>   ; Currently, the comparison is emitted again. An MSR/MRS pair would
>>>>> also be
>>>>>   ; acceptable, but assuming the call preserves NZCV is not.
>>>>>   br i1 %test, label %iftrue, label %iffalse
>>>>> -; CHECK: cmp [[LHS]], [[RHS]]
>>>>> +; CHECK: cmp [[LHSCOPY]], [[RHSCOPY]]
>>>>> ; CHECK: b.eq
>>>>>
>>>>> iftrue:
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>> (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll Mon Oct
>>>>> 2 15:01:37 2017
>>>>> @@ -8,10 +8,9 @@
>>>>> define void @test(%struct1* %fde, i32 %fd, void (i32, i32, i8*)* %func,
>>>>> i8* %arg)  {
>>>>> ;CHECK-LABEL: test
>>>>> entry:
>>>>> -; A53: mov [[DATA:w[0-9]+]], w1
>>>>> ; A53: str q{{[0-9]+}}, {{.*}}
>>>>> ; A53: str q{{[0-9]+}}, {{.*}}
>>>>> -; A53: str [[DATA]], {{.*}}
>>>>> +; A53: str w1, {{.*}}
>>>>>
>>>>>   %0 = bitcast %struct1* %fde to i8*
>>>>>   tail call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 40, i32 8, i1
>>>>> false)
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/neg-imm.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AArch64/neg-imm.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AArch64/neg-imm.ll Mon Oct  2 15:01:37 2017
>>>>> @@ -7,8 +7,8 @@ declare void @foo(i32)
>>>>> define void @test(i32 %px) {
>>>>> ; CHECK_LABEL: test:
>>>>> ; CHECK_LABEL: %entry
>>>>> -; CHECK: subs
>>>>> -; CHECK-NEXT: csel
>>>>> +; CHECK: subs [[REG0:w[0-9]+]],
>>>>> +; CHECK: csel {{w[0-9]+}}, wzr, [[REG0]]
>>>>> entry:
>>>>>   %sub = add nsw i32 %px, -1
>>>>>   %cmp = icmp slt i32 %px, 1
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>> (original)
>>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll Mon
>>>>> Oct  2 15:01:37 2017
>>>>> @@ -547,16 +547,16 @@ define void @func_use_every_sgpr_input_c
>>>>> ; GCN: s_mov_b32 s5, s32
>>>>> ; GCN: s_add_u32 s32, s32, 0x300
>>>>>
>>>>> -; GCN-DAG: s_mov_b32 [[SAVE_X:s[0-9]+]], s14
>>>>> -; GCN-DAG: s_mov_b32 [[SAVE_Y:s[0-9]+]], s15
>>>>> -; GCN-DAG: s_mov_b32 [[SAVE_Z:s[0-9]+]], s16
>>>>> +; GCN-DAG: s_mov_b32 [[SAVE_X:s[0-57-9][0-9]*]], s14
>>>>> +; GCN-DAG: s_mov_b32 [[SAVE_Y:s[0-68-9][0-9]*]], s15
>>>>> +; GCN-DAG: s_mov_b32 [[SAVE_Z:s[0-79][0-9]*]], s16
>>>>> ; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[6:7]
>>>>> ; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[8:9]
>>>>> ; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[10:11]
>>>>>
>>>>> -; GCN-DAG: s_mov_b32 s6, [[SAVE_X]]
>>>>> -; GCN-DAG: s_mov_b32 s7, [[SAVE_Y]]
>>>>> -; GCN-DAG: s_mov_b32 s8, [[SAVE_Z]]
>>>>> +; GCN-DAG: s_mov_b32 s6, s14
>>>>> +; GCN-DAG: s_mov_b32 s7, s15
>>>>> +; GCN-DAG: s_mov_b32 s8, s16
>>>>> ; GCN: s_swappc_b64
>>>>>
>>>>> ; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll Mon Oct  2 15:01:37 2017
>>>>> @@ -51,7 +51,7 @@ define float @v_mad_mix_f32_f16hi_f16hi_
>>>>>
>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32:
>>>>> ; GFX9: v_mov_b32_e32 v3, v1
>>>>> -; GFX9-NEXT: v_mad_mix_f32 v1, v0, v3, v2 op_sel:[1,1,1]
>>>>> +; GFX9-NEXT: v_mad_mix_f32 v1, v0, v1, v2 op_sel:[1,1,1]
>>>>> ; GFX9-NEXT: v_mad_mix_f32 v0, v0, v3, v2
>>>>>
>>>>> ; CIVI: v_mac_f32
>>>>> @@ -66,7 +66,7 @@ define <2 x float> @v_mad_mix_v2f32(<2 x
>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_shuffle:
>>>>> ; GCN: s_waitcnt
>>>>> ; GFX9-NEXT: v_mov_b32_e32 v3, v1
>>>>> -; GFX9-NEXT: v_mad_mix_f32 v1, v0, v3, v2 op_sel:[0,1,1]
>>>>> +; GFX9-NEXT: v_mad_mix_f32 v1, v0, v1, v2 op_sel:[0,1,1]
>>>>> ; GFX9-NEXT: v_mad_mix_f32 v0, v0, v3, v2 op_sel:[1,0,1]
>>>>> ; GFX9-NEXT: s_setpc_b64
>>>>>
>>>>> @@ -246,7 +246,7 @@ define float @v_mad_mix_f32_f16lo_f16lo_
>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_f32imm1:
>>>>> ; GFX9: v_mov_b32_e32 v2, v1
>>>>> ; GFX9: v_mov_b32_e32 v3, 1.0
>>>>> -; GFX9: v_mad_mix_f32 v1, v0, v2, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>> ; encoding
>>>>> +; GFX9: v_mad_mix_f32 v1, v0, v1, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>> ; encoding
>>>>> ; GFX9: v_mad_mix_f32 v0, v0, v2, v3 op_sel_hi:[1,1,0] ; encoding
>>>>> define <2 x float> @v_mad_mix_v2f32_f32imm1(<2 x half> %src0, <2 x
>>>>> half> %src1) #0 {
>>>>>   %src0.ext = fpext <2 x half> %src0 to <2 x float>
>>>>> @@ -258,7 +258,7 @@ define <2 x float> @v_mad_mix_v2f32_f32i
>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_cvtf16imminv2pi:
>>>>> ; GFX9: v_mov_b32_e32 v2, v1
>>>>> ; GFX9: v_mov_b32_e32 v3, 0x3e230000
>>>>> -; GFX9: v_mad_mix_f32 v1, v0, v2, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>> ; encoding
>>>>> +; GFX9: v_mad_mix_f32 v1, v0, v1, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>> ; encoding
>>>>> ; GFX9: v_mad_mix_f32 v0, v0, v2, v3 op_sel_hi:[1,1,0] ; encoding
>>>>> define <2 x float> @v_mad_mix_v2f32_cvtf16imminv2pi(<2 x half> %src0,
>>>>> <2 x half> %src1) #0 {
>>>>>   %src0.ext = fpext <2 x half> %src0 to <2 x float>
>>>>> @@ -271,7 +271,7 @@ define <2 x float> @v_mad_mix_v2f32_cvtf
>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_f32imminv2pi:
>>>>> ; GFX9: v_mov_b32_e32 v2, v1
>>>>> ; GFX9: v_mov_b32_e32 v3, 0.15915494
>>>>> -; GFX9: v_mad_mix_f32 v1, v0, v2, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>> ; encoding
>>>>> +; GFX9: v_mad_mix_f32 v1, v0, v1, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>> ; encoding
>>>>> ; GFX9: v_mad_mix_f32 v0, v0, v2, v3 op_sel_hi:[1,1,0] ; encoding
>>>>> define <2 x float> @v_mad_mix_v2f32_f32imminv2pi(<2 x half> %src0, <2 x
>>>>> half> %src1) #0 {
>>>>>   %src0.ext = fpext <2 x half> %src0 to <2 x float>
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/ret.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/AMDGPU/ret.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/ret.ll Mon Oct  2 15:01:37 2017
>>>>> @@ -2,10 +2,10 @@
>>>>> ; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s |
>>>>> FileCheck -check-prefix=GCN %s
>>>>>
>>>>> ; GCN-LABEL: {{^}}vgpr:
>>>>> -; GCN: v_mov_b32_e32 v1, v0
>>>>> -; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
>>>>> -; GCN-DAG: exp mrt0 v1, v1, v1, v1 done vm
>>>>> +; GCN-DAG: v_mov_b32_e32 v1, v0
>>>>> +; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
>>>>> ; GCN: s_waitcnt expcnt(0)
>>>>> +; GCN: v_add_f32_e32 v0, 1.0, v0
>>>>> ; GCN-NOT: s_endpgm
>>>>> define amdgpu_vs { float, float } @vgpr([9 x <16 x i8>] addrspace(2)*
>>>>> byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
>>>>> bb:
>>>>> @@ -204,13 +204,13 @@ bb:
>>>>> }
>>>>>
>>>>> ; GCN-LABEL: {{^}}both:
>>>>> -; GCN: v_mov_b32_e32 v1, v0
>>>>> -; GCN-DAG: exp mrt0 v1, v1, v1, v1 done vm
>>>>> -; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
>>>>> -; GCN-DAG: s_add_i32 s0, s3, 2
>>>>> +; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
>>>>> +; GCN-DAG: v_mov_b32_e32 v1, v0
>>>>> ; GCN-DAG: s_mov_b32 s1, s2
>>>>> -; GCN: s_mov_b32 s2, s3
>>>>> ; GCN: s_waitcnt expcnt(0)
>>>>> +; GCN: v_add_f32_e32 v0, 1.0, v0
>>>>> +; GCN-DAG: s_add_i32 s0, s3, 2
>>>>> +; GCN-DAG: s_mov_b32 s2, s3
>>>>> ; GCN-NOT: s_endpgm
>>>>> define amdgpu_vs { float, i32, float, i32, i32 } @both([9 x <16 x i8>]
>>>>> addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0
>>>>> {
>>>>> bb:
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/atomic-op.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/ARM/atomic-op.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/ARM/atomic-op.ll Mon Oct  2 15:01:37 2017
>>>>> @@ -287,7 +287,8 @@ define i32 @test_cmpxchg_fail_order(i32
>>>>>
>>>>>   %pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst monotonic
>>>>>   %oldval = extractvalue { i32, i1 } %pair, 0
>>>>> -; CHECK-ARMV7:     ldrex   [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
>>>>> +; CHECK-ARMV7:     mov     r[[ADDR:[0-9]+]], r0
>>>>> +; CHECK-ARMV7:     ldrex   [[OLDVAL:r[0-9]+]], [r0]
>>>>> ; CHECK-ARMV7:     cmp     [[OLDVAL]], r1
>>>>> ; CHECK-ARMV7:     bne     [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]
>>>>> ; CHECK-ARMV7:     dmb ish
>>>>> @@ -305,7 +306,8 @@ define i32 @test_cmpxchg_fail_order(i32
>>>>> ; CHECK-ARMV7:     dmb     ish
>>>>> ; CHECK-ARMV7:     bx      lr
>>>>>
>>>>> -; CHECK-T2:     ldrex   [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
>>>>> +; CHECK-T2:     mov     r[[ADDR:[0-9]+]], r0
>>>>> +; CHECK-T2:     ldrex   [[OLDVAL:r[0-9]+]], [r0]
>>>>> ; CHECK-T2:     cmp     [[OLDVAL]], r1
>>>>> ; CHECK-T2:     bne     [[FAIL_BB:\.?LBB.*]]
>>>>> ; CHECK-T2:     dmb ish
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll Mon Oct  2
>>>>> 15:01:37 2017
>>>>> @@ -39,7 +39,7 @@ define i32 @sadd_overflow(i32 %a, i32 %b
>>>>>   ; ARM: movvc r[[R1]], #0
>>>>>
>>>>>   ; THUMBV6: mov  r[[R2:[0-9]+]], r[[R0:[0-9]+]]
>>>>> -  ; THUMBV6: adds r[[R3:[0-9]+]], r[[R2]], r[[R1:[0-9]+]]
>>>>> +  ; THUMBV6: adds r[[R3:[0-9]+]], r[[R0]], r[[R1:[0-9]+]]
>>>>>   ; THUMBV6: movs r[[R0]], #0
>>>>>   ; THUMBV6: movs r[[R1]], #1
>>>>>   ; THUMBV6: cmp  r[[R3]], r[[R2]]
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/swifterror.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/ARM/swifterror.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/ARM/swifterror.ll Mon Oct  2 15:01:37 2017
>>>>> @@ -182,7 +182,7 @@ define float @foo_loop(%swift_error** sw
>>>>> ; CHECK-APPLE: beq
>>>>> ; CHECK-APPLE: mov r0, #16
>>>>> ; CHECK-APPLE: malloc
>>>>> -; CHECK-APPLE: strb r{{.*}}, [{{.*}}[[ID]], #8]
>>>>> +; CHECK-APPLE: strb r{{.*}}, [r0, #8]
>>>>> ; CHECK-APPLE: ble
>>>>> ; CHECK-APPLE: mov r8, [[ID]]
>>>>>
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll Mon Oct  2 15:01:37
>>>>> 2017
>>>>> @@ -165,7 +165,7 @@ entry:
>>>>> ; MMR3: subu16   $5, $[[T19]], $[[T20]]
>>>>>
>>>>> ; MMR6: move     $[[T0:[0-9]+]], $7
>>>>> -; MMR6: sw       $[[T0]], 8($sp)
>>>>> +; MMR6: sw       $7, 8($sp)
>>>>> ; MMR6: move     $[[T1:[0-9]+]], $5
>>>>> ; MMR6: sw       $4, 12($sp)
>>>>> ; MMR6: lw       $[[T2:[0-9]+]], 48($sp)
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll Mon Oct  2 15:01:37
>>>>> 2017
>>>>> @@ -14,7 +14,8 @@ define double @foo3(double %a) nounwind
>>>>>   ret double %r
>>>>>
>>>>> ; CHECK: @foo3
>>>>> -; CHECK: xsnmsubadp [[REG:[0-9]+]], {{[0-9]+}}, [[REG]]
>>>>> +; CHECK: fmr [[REG:[0-9]+]], [[REG2:[0-9]+]]
>>>>> +; CHECK: xsnmsubadp [[REG]], {{[0-9]+}}, [[REG2]]
>>>>> ; CHECK: xsmaddmdp
>>>>> ; CHECK: xsmaddadp
>>>>> }
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll
>>>>>
>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll?rev=314729&r1=314728&r2=314729&view=diff


More information about the llvm-commits mailing list