[llvm] r311038 - [MachineCopyPropagation] Extend pass to do COPY source forwarding

Vitaly Buka via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 17 18:24:46 PDT 2017


If you have an Android device with 32-bit ARM, you can probably reproduce
this by running
https://llvm.org/svn/llvm-project/zorg/trunk/zorg/buildbot/builders/sanitizers/buildbot_android.sh
in an empty directory.
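
A rough sketch of that reproduction, under some assumptions: it checks out
the whole sanitizers directory in case the script relies on sibling files,
and the temporary directories and the DRY_RUN switch are only illustrative
names, not part of the buildbot setup. A device reachable over adb and any
environment the script expects are not covered here.

```shell
#!/bin/sh
# Sketch only: directory layout and DRY_RUN are illustrative assumptions.
set -eu

SCRIPTS_URL="https://llvm.org/svn/llvm-project/zorg/trunk/zorg/buildbot/builders/sanitizers"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the commands

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

SCRIPTS_DIR="$(mktemp -d)"   # where the buildbot scripts are checked out
WORK_DIR="$(mktemp -d)"      # the empty directory the build runs in

run svn checkout "$SCRIPTS_URL" "$SCRIPTS_DIR"
cd "$WORK_DIR"
run bash "$SCRIPTS_DIR/buildbot_android.sh"
```

By default this only prints the commands; run it with DRY_RUN=0 to do the
checkout and start the build.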

On Thu, Aug 17, 2017 at 6:23 PM, Vitaly Buka <vitalybuka at google.com> wrote:

> It just hangs on "Running the AddressSanitizer tests" for arm
>
> /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/compiler_rt_build_android_arm/lib/asan/tests/AsanTest: 1 file pushed. 3.7 MB/s (10602792 bytes in 2.742s)
> + echo @@@BUILD_STEP run asan lit tests '[arm/sailfish-userdebug/OPR1.170621.001]@@@'
> @@@BUILD_STEP run asan lit tests [arm/sailfish-userdebug/OPR1.170621.001]@@@
> + cd /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/compiler_rt_build_android_arm
> + ninja check-asan
> [1/1] Running the AddressSanitizer tests
>
>
> But the same device passes aarch64 tests:
> [1/1] Running the AddressSanitizer tests
> -- Testing: 407 tests, 12 threads --
> Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
> Testing Time: 87.09s
>   Expected Passes    : 202
>   Expected Failures  : 27
>   Unsupported Tests  : 178
> + echo @@@BUILD_STEP run sanitizer_common tests '[aarch64/sailfish-userdebug/OPR1.170621.001]@@@'
> @@@BUILD_STEP run sanitizer_common tests [aarch64/sailfish-userdebug/OPR1.170621.001]@@@
>
>
>
> On Thu, Aug 17, 2017 at 5:42 PM, Geoff Berry <gberry at codeaurora.org>
> wrote:
>
>> It seems like this is still happening after fixing a couple of issues
>> with this patch.  I'll revert again shortly.  I'm having a hard time
>> understanding what is failing from looking at the buildbot logs though. Do
>> you have any way of determining what exactly is timing out?
>>
>> On 8/16/2017 11:13 PM, Vitaly Buka wrote:
>>
>>> Looks like after this patch Android tests consistently hang
>>> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/1825
>>>
>>> On Wed, Aug 16, 2017 at 1:50 PM, Geoff Berry via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>
>>>     Author: gberry
>>>     Date: Wed Aug 16 13:50:01 2017
>>>     New Revision: 311038
>>>
>>>     URL: http://llvm.org/viewvc/llvm-project?rev=311038&view=rev
>>>     Log:
>>>     [MachineCopyPropagation] Extend pass to do COPY source forwarding
>>>
>>>     This change extends MachineCopyPropagation to do COPY source
>>> forwarding.
>>>
>>>     This change also extends the MachineCopyPropagation pass to be able
>>> to
>>>     be run during register allocation, after physical registers have been
>>>     assigned, but before the virtual registers have been re-written,
>>> which
>>>     allows it to remove virtual register COPY LiveIntervals that become
>>> dead
>>>     through the forwarding of all of their uses.
>>>
>>>     Reviewers: qcolombet, javed.absar, MatzeB, jonpa
>>>
>>>     Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier,
>>>     mgorny
>>>
>>>     Differential Revision: https://reviews.llvm.org/D30751
>>>
>>>     Modified:
>>>          llvm/trunk/include/llvm/CodeGen/Passes.h
>>>          llvm/trunk/include/llvm/InitializePasses.h
>>>          llvm/trunk/lib/CodeGen/CodeGen.cpp
>>>          llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>          llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>          llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>          llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>          llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>          llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>          llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>          llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>          llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/byval-frame-setup.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/call-argument-types.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/call-preserved-registers.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/mubuf-offset-private.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/private-access-no-objects.ll
>>>          llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>>          llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>>          llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>>          llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>>          llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>>          llvm/trunk/test/CodeGen/PowerPC/inlineasm-i64-reg.ll
>>>          llvm/trunk/test/CodeGen/PowerPC/tail-dup-layout.ll
>>>          llvm/trunk/test/CodeGen/SPARC/32abi.ll
>>>          llvm/trunk/test/CodeGen/SPARC/atomics.ll
>>>          llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll
>>>          llvm/trunk/test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
>>>          llvm/trunk/test/CodeGen/X86/arg-copy-elide.ll
>>>          llvm/trunk/test/CodeGen/X86/avg.ll
>>>          llvm/trunk/test/CodeGen/X86/avx-load-store.ll
>>>          llvm/trunk/test/CodeGen/X86/avx512-bugfix-25270.ll
>>>          llvm/trunk/test/CodeGen/X86/avx512-calling-conv.ll
>>>          llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>>>          llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll
>>>          llvm/trunk/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
>>>          llvm/trunk/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
>>>          llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>>>          llvm/trunk/test/CodeGen/X86/combine-fcopysign.ll
>>>          llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>>>          llvm/trunk/test/CodeGen/X86/divide-by-constant.ll
>>>          llvm/trunk/test/CodeGen/X86/fmaxnum.ll
>>>          llvm/trunk/test/CodeGen/X86/fminnum.ll
>>>          llvm/trunk/test/CodeGen/X86/fp128-i128.ll
>>>          llvm/trunk/test/CodeGen/X86/haddsub-2.ll
>>>          llvm/trunk/test/CodeGen/X86/haddsub-undef.ll
>>>          llvm/trunk/test/CodeGen/X86/half.ll
>>>          llvm/trunk/test/CodeGen/X86/inline-asm-fpstack.ll
>>>          llvm/trunk/test/CodeGen/X86/ipra-local-linkage.ll
>>>          llvm/trunk/test/CodeGen/X86/localescape.ll
>>>          llvm/trunk/test/CodeGen/X86/mul-i1024.ll
>>>          llvm/trunk/test/CodeGen/X86/mul-i512.ll
>>>          llvm/trunk/test/CodeGen/X86/mul128.ll
>>>          llvm/trunk/test/CodeGen/X86/pmul.ll
>>>          llvm/trunk/test/CodeGen/X86/powi.ll
>>>          llvm/trunk/test/CodeGen/X86/pr11334.ll
>>>          llvm/trunk/test/CodeGen/X86/pr29112.ll
>>>          llvm/trunk/test/CodeGen/X86/psubus.ll
>>>          llvm/trunk/test/CodeGen/X86/select.ll
>>>          llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll
>>>          llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>>          llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll
>>>          llvm/trunk/test/CodeGen/X86/sse1.ll
>>>          llvm/trunk/test/CodeGen/X86/sse3-avx-addsub-2.ll
>>>          llvm/trunk/test/CodeGen/X86/statepoint-live-in.ll
>>>          llvm/trunk/test/CodeGen/X86/statepoint-stack-usage.ll
>>>          llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>>>          llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>>>          llvm/trunk/test/CodeGen/X86/vec_minmax_sint.ll
>>>          llvm/trunk/test/CodeGen/X86/vec_shift4.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-idiv-sdiv-128.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-idiv-udiv-128.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-rotate-128.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-sext.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-shift-ashr-128.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
>>>          llvm/trunk/test/CodeGen/X86/vector-zext.ll
>>>          llvm/trunk/test/CodeGen/X86/vselect-minmax.ll
>>>          llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>>>          llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>>>          llvm/trunk/test/CodeGen/X86/x86-shrink-wrap-unwind.ll
>>>          llvm/trunk/test/CodeGen/X86/x86-shrink-wrapping.ll
>>>
>>>     Modified: llvm/trunk/include/llvm/CodeGen/Passes.h
>>>     URL:
>>>     http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/Passes.h?rev=311038&r1=311037&r2=311038&view=diff
>>>     ============================================================
>>> ==================
>>>     --- llvm/trunk/include/llvm/CodeGen/Passes.h (original)
>>>     +++ llvm/trunk/include/llvm/CodeGen/Passes.h Wed Aug 16 13:50:01
>>> 2017
>>>     @@ -278,6 +278,11 @@ namespace llvm {
>>>         /// MachineSinking - This pass performs sinking on machine
>>>     instructions.
>>>         extern char &MachineSinkingID;
>>>
>>>     +  /// MachineCopyPropagationPreRegRewrite - This pass performs copy
>>>     propagation
>>>     +  /// on machine instructions after register allocation but before
>>>     virtual
>>>     +  /// register re-writing..
>>>     +  extern char &MachineCopyPropagationPreRegRewriteID;
>>>     +
>>>         /// MachineCopyPropagation - This pass performs copy propagation
>>> on
>>>         /// machine instructions.
>>>         extern char &MachineCopyPropagationID;
>>>
>>>     Modified: llvm/trunk/include/llvm/InitializePasses.h
>>>     URL:
>>>     http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/InitializePasses.h?rev=311038&r1=311037&r2=311038&view=diff
>>>     ============================================================
>>> ==================
>>>     --- llvm/trunk/include/llvm/InitializePasses.h (original)
>>>     +++ llvm/trunk/include/llvm/InitializePasses.h Wed Aug 16 13:50:01
>>> 2017
>>>     @@ -233,6 +233,7 @@ void initializeMachineBranchProbabilityI
>>>       void initializeMachineCSEPass(PassRegistry&);
>>>       void initializeMachineCombinerPass(PassRegistry&);
>>>       void initializeMachineCopyPropagationPass(PassRegistry&);
>>>     +void initializeMachineCopyPropagationPreRegRewritePass(PassRegistry&);
>>>       void initializeMachineDominanceFrontierPass(PassRegistry&);
>>>       void initializeMachineDominatorTreePass(PassRegistry&);
>>>       void initializeMachineFunctionPrinterPassPass(PassRegistry&);
>>>
>>>     Modified: llvm/trunk/lib/CodeGen/CodeGen.cpp
>>>     URL:
>>>     http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/CodeGen.cpp?rev=311038&r1=311037&r2=311038&view=diff
>>>     ============================================================
>>> ==================
>>>     --- llvm/trunk/lib/CodeGen/CodeGen.cpp (original)
>>>     +++ llvm/trunk/lib/CodeGen/CodeGen.cpp Wed Aug 16 13:50:01 2017
>>>     @@ -54,6 +54,7 @@ void llvm::initializeCodeGen(PassRegistr
>>>         initializeMachineCSEPass(Registry);
>>>         initializeMachineCombinerPass(Registry);
>>>         initializeMachineCopyPropagationPass(Registry);
>>>     +  initializeMachineCopyPropagationPreRegRewritePass(Registry);
>>>         initializeMachineDominatorTreePass(Registry);
>>>         initializeMachineFunctionPrinterPassPass(Registry);
>>>         initializeMachineLICMPass(Registry);
>>>
>>>     Modified: llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>     URL:
>>>     http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp?rev=311038&r1=311037&r2=311038&view=diff
>>>     ============================================================
>>> ==================
>>>     --- llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp (original)
>>>     +++ llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp Wed Aug 16
>>>     13:50:01 2017
>>>     @@ -7,18 +7,62 @@
>>>       //
>>>       //===-------------------------------------------------------
>>> ---------------===//
>>>       //
>>>     -// This is an extremely simple MachineInstr-level copy propagation
>>>     pass.
>>>     +// This is a simple MachineInstr-level copy forwarding pass.  It
>>>     may be run at
>>>     +// two places in the codegen pipeline:
>>>     +//   - After register allocation but before virtual registers have
>>>     been remapped
>>>     +//     to physical registers.
>>>     +//   - After physical register remapping.
>>>     +//
>>>     +// The optimizations done vary slightly based on whether virtual
>>>     registers are
>>>     +// still present.  In both cases, this pass forwards the source of
>>>     COPYs to the
>>>     +// users of their destinations when doing so is legal.  For example:
>>>     +//
>>>     +//   %vreg1 = COPY %vreg0
>>>     +//   ...
>>>     +//   ... = OP %vreg1
>>>     +//
>>>     +// If
>>>     +//   - the physical register assigned to %vreg0 has not been
>>>     clobbered by the
>>>     +//     time of the use of %vreg1
>>>     +//   - the register class constraints are satisfied
>>>     +//   - the COPY def is the only value that reaches OP
>>>     +// then this pass replaces the above with:
>>>     +//
>>>     +//   %vreg1 = COPY %vreg0
>>>     +//   ...
>>>     +//   ... = OP %vreg0
>>>     +//
>>>     +// and updates the relevant state required by VirtRegMap (e.g.
>>>     LiveIntervals).
>>>     +// COPYs whose LiveIntervals become dead as a result of this
>>>     forwarding (i.e. if
>>>     +// all uses of %vreg1 are changed to %vreg0) are removed.
>>>     +//
>>>     +// When being run with only physical registers, this pass will also
>>>     remove some
>>>     +// redundant COPYs.  For example:
>>>     +//
>>>     +//    %R1 = COPY %R0
>>>     +//    ... // No clobber of %R1
>>>     +//    %R0 = COPY %R1 <<< Removed
>>>     +//
>>>     +// or
>>>     +//
>>>     +//    %R1 = COPY %R0
>>>     +//    ... // No clobber of %R0
>>>     +//    %R1 = COPY %R0 <<< Removed
>>>       //
>>>       //===-------------------------------------------------------
>>> ---------------===//
>>>
>>>     +#include "LiveDebugVariables.h"
>>>       #include "llvm/ADT/DenseMap.h"
>>>       #include "llvm/ADT/SetVector.h"
>>>       #include "llvm/ADT/SmallVector.h"
>>>       #include "llvm/ADT/Statistic.h"
>>>     +#include "llvm/CodeGen/LiveRangeEdit.h"
>>>     +#include "llvm/CodeGen/LiveStackAnalysis.h"
>>>       #include "llvm/CodeGen/MachineFunction.h"
>>>       #include "llvm/CodeGen/MachineFunctionPass.h"
>>>       #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>       #include "llvm/CodeGen/Passes.h"
>>>     +#include "llvm/CodeGen/VirtRegMap.h"
>>>       #include "llvm/Pass.h"
>>>       #include "llvm/Support/Debug.h"
>>>       #include "llvm/Support/raw_ostream.h"
>>>     @@ -30,24 +74,48 @@ using namespace llvm;
>>>       #define DEBUG_TYPE "machine-cp"
>>>
>>>       STATISTIC(NumDeletes, "Number of dead copies deleted");
>>>     +STATISTIC(NumCopyForwards, "Number of copy uses forwarded");
>>>
>>>       namespace {
>>>         typedef SmallVector<unsigned, 4> RegList;
>>>         typedef DenseMap<unsigned, RegList> SourceMap;
>>>         typedef DenseMap<unsigned, MachineInstr*> Reg2MIMap;
>>>
>>>     -  class MachineCopyPropagation : public MachineFunctionPass {
>>>     +  class MachineCopyPropagation : public MachineFunctionPass,
>>>     +                                 private LiveRangeEdit::Delegate {
>>>           const TargetRegisterInfo *TRI;
>>>           const TargetInstrInfo *TII;
>>>     -    const MachineRegisterInfo *MRI;
>>>     +    MachineRegisterInfo *MRI;
>>>     +    MachineFunction *MF;
>>>     +    SlotIndexes *Indexes;
>>>     +    LiveIntervals *LIS;
>>>     +    const VirtRegMap *VRM;
>>>     +    // True if this pass being run before virtual registers are
>>>     remapped to
>>>     +    // physical ones.
>>>     +    bool PreRegRewrite;
>>>     +    bool NoSubRegLiveness;
>>>     +
>>>     +  protected:
>>>     +    MachineCopyPropagation(char &ID, bool PreRegRewrite)
>>>     +        : MachineFunctionPass(ID), PreRegRewrite(PreRegRewrite) {}
>>>
>>>         public:
>>>           static char ID; // Pass identification, replacement for typeid
>>>     -    MachineCopyPropagation() : MachineFunctionPass(ID) {
>>>     +    MachineCopyPropagation() : MachineCopyPropagation(ID, false) {
>>>                 initializeMachineCopyPropagationPass(*PassRegistry::getPassRegistry());
>>>           }
>>>
>>>           void getAnalysisUsage(AnalysisUsage &AU) const override {
>>>     +      if (PreRegRewrite) {
>>>     +        AU.addRequired<SlotIndexes>();
>>>     +        AU.addPreserved<SlotIndexes>();
>>>     +        AU.addRequired<LiveIntervals>();
>>>     +        AU.addPreserved<LiveIntervals>();
>>>     +        AU.addRequired<VirtRegMap>();
>>>     +        AU.addPreserved<VirtRegMap>();
>>>     +        AU.addPreserved<LiveDebugVariables>();
>>>     +        AU.addPreserved<LiveStacks>();
>>>     +      }
>>>             AU.setPreservesCFG();
>>>             MachineFunctionPass::getAnalysisUsage(AU);
>>>           }
>>>     @@ -55,6 +123,10 @@ namespace {
>>>           bool runOnMachineFunction(MachineFunction &MF) override;
>>>
>>>           MachineFunctionProperties getRequiredProperties() const
>>> override {
>>>     +      if (PreRegRewrite)
>>>     +        return MachineFunctionProperties()
>>>     +            .set(MachineFunctionProperties::Property::NoPHIs)
>>>     +            .set(MachineFunctionProperties::Property::TracksLiveness);
>>>             return MachineFunctionProperties().set(
>>>                 MachineFunctionProperties::Property::NoVRegs);
>>>           }
>>>     @@ -64,6 +136,28 @@ namespace {
>>>           void ReadRegister(unsigned Reg);
>>>           void CopyPropagateBlock(MachineBasicBlock &MBB);
>>>           bool eraseIfRedundant(MachineInstr &Copy, unsigned Src,
>>>     unsigned Def);
>>>     +    unsigned getPhysReg(unsigned Reg, unsigned SubReg);
>>>     +    unsigned getPhysReg(const MachineOperand &Opnd) {
>>>     +      return getPhysReg(Opnd.getReg(), Opnd.getSubReg());
>>>     +    }
>>>     +    unsigned getFullPhysReg(const MachineOperand &Opnd) {
>>>     +      return getPhysReg(Opnd.getReg(), 0);
>>>     +    }
>>>     +    void forwardUses(MachineInstr &MI);
>>>     +    bool isForwardableRegClassCopy(const MachineInstr &Copy,
>>>     +                                   const MachineInstr &UseI);
>>>     +    std::tuple<unsigned, unsigned, bool>
>>>     +    checkUseSubReg(const MachineOperand &CopySrc, const
>>>     MachineOperand &MOUse);
>>>     +    bool hasImplicitOverlap(const MachineInstr &MI, const
>>>     MachineOperand &Use);
>>>     +    void narrowRegClass(const MachineInstr &MI, const
>>>     MachineOperand &MOUse,
>>>     +                        unsigned NewUseReg, unsigned NewUseSubReg);
>>>     +    void updateForwardedCopyLiveInterval(const MachineInstr &Copy,
>>>     +                                         const MachineInstr &UseMI,
>>>     +                                         unsigned OrigUseReg,
>>>     +                                         unsigned NewUseReg,
>>>     +                                         unsigned NewUseSubReg);
>>>     +    /// LiveRangeEdit callback for eliminateDeadDefs().
>>>     +    void LRE_WillEraseInstruction(MachineInstr *MI) override;
>>>
>>>           /// Candidates for deletion.
>>>           SmallSetVector<MachineInstr*, 8> MaybeDeadCopies;
>>>     @@ -75,6 +169,15 @@ namespace {
>>>           SourceMap SrcMap;
>>>           bool Changed;
>>>         };
>>>     +
>>>     +  class MachineCopyPropagationPreRegRewrite : public
>>>     MachineCopyPropagation {
>>>     +  public:
>>>     +    static char ID; // Pass identification, replacement for typeid
>>>     +    MachineCopyPropagationPreRegRewrite()
>>>     +        : MachineCopyPropagation(ID, true) {
>>>     +         initializeMachineCopyPropagationPreRegRewritePass(*PassRegistry::getPassRegistry());
>>>     +    }
>>>     +  };
>>>       }
>>>       char MachineCopyPropagation::ID = 0;
>>>       char &llvm::MachineCopyPropagationID = MachineCopyPropagation::ID;
>>>     @@ -82,6 +185,29 @@ char &llvm::MachineCopyPropagationID = M
>>>       INITIALIZE_PASS(MachineCopyPropagation, DEBUG_TYPE,
>>>                       "Machine Copy Propagation Pass", false, false)
>>>
>>>     +/// We have two separate passes that are very similar, the only
>>>     difference being
>>>     +/// where they are meant to be run in the pipeline.  This is done
>>>     for several
>>>     +/// reasons:
>>>     +/// - the two passes have different dependencies
>>>     +/// - some targets want to disable the later run of this pass, but
>>>     not the
>>>     +///   earlier one (e.g. NVPTX and WebAssembly)
>>>     +/// - it allows for easier debugging via llc
>>>     +
>>>     +char MachineCopyPropagationPreRegRewrite::ID = 0;
>>>     +char &llvm::MachineCopyPropagationPreRegRewriteID =
>>>     MachineCopyPropagationPreRegRewrite::ID;
>>>     +
>>>     +INITIALIZE_PASS_BEGIN(MachineCopyPropagationPreRegRewrite,
>>>     +                      "machine-cp-prerewrite",
>>>     +                      "Machine Copy Propagation Pre-Register
>>>     Rewrite Pass",
>>>     +                      false, false)
>>>     +INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
>>>     +INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
>>>     +INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
>>>     +INITIALIZE_PASS_END(MachineCopyPropagationPreRegRewrite,
>>>     +                    "machine-cp-prerewrite",
>>>     +                    "Machine Copy Propagation Pre-Register Rewrite
>>>     Pass", false,
>>>     +                    false)
>>>     +
>>>       /// Remove any entry in \p Map where the register is a subregister
>>>     or equal to
>>>       /// a register contained in \p Regs.
>>>       static void removeRegsFromMap(Reg2MIMap &Map, const RegList &Regs,
>>>     @@ -122,6 +248,10 @@ void MachineCopyPropagation::ClobberRegi
>>>       }
>>>
>>>       void MachineCopyPropagation::ReadRegister(unsigned Reg) {
>>>     +  // We don't track MaybeDeadCopies when running
>>> pre-VirtRegRewriter.
>>>     +  if (PreRegRewrite)
>>>     +    return;
>>>     +
>>>         // If 'Reg' is defined by a copy, the copy is no longer a
>>> candidate
>>>         // for elimination.
>>>         for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
>>>     @@ -153,6 +283,46 @@ static bool isNopCopy(const MachineInstr
>>>         return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
>>>       }
>>>
>>>     +/// Return the physical register assigned to \p Reg if it is a
>>>     virtual register,
>>>     +/// otherwise just return the physical reg from the operand itself.
>>>     +///
>>>     +/// If \p SubReg is 0 then return the full physical register
>>>     assigned to the
>>>     +/// virtual register ignoring subregs.  If we aren't tracking
>>>     sub-reg liveness
>>>     +/// then we need to use this to be more conservative with clobbers
>>>     by killing
>>>     +/// all super reg and their sub reg COPYs as well.  This is to
>>>     prevent COPY
>>>     +/// forwarding in cases like the following:
>>>     +///
>>>     +///    %vreg2 = COPY %vreg1:sub1
>>>     +///    %vreg3 = COPY %vreg1:sub0
>>>     +///    ...    = OP1 %vreg2
>>>     +///    ...    = OP2 %vreg3
>>>     +///
>>>     +/// After forward %vreg2 (assuming this is the last use of %vreg1)
>>> and
>>>     +/// VirtRegRewriter adding kill markers we have:
>>>     +///
>>>     +///    %vreg3 = COPY %vreg1:sub0
>>>     +///    ...    = OP1 %vreg1:sub1<kill>
>>>     +///    ...    = OP2 %vreg3
>>>     +///
>>>     +/// If %vreg3 is assigned to a sub-reg of %vreg1, then after
>>>     rewriting we have:
>>>     +///
>>>     +///    ...     = OP1 R0:sub1, R0<imp-use,kill>
>>>     +///    ...     = OP2 R0:sub0
>>>     +///
>>>     +/// and the use of R0 by OP2 will not have a valid definition.
>>>     +unsigned MachineCopyPropagation::getPhysReg(unsigned Reg, unsigned
>>>     SubReg) {
>>>     +
>>>     +  // Physical registers cannot have subregs.
>>>     +  if (!TargetRegisterInfo::isVirtualRegister(Reg))
>>>     +    return Reg;
>>>     +
>>>     +  assert(PreRegRewrite && "Unexpected virtual register
>>> encountered");
>>>     +  Reg = VRM->getPhys(Reg);
>>>     +  if (SubReg && !NoSubRegLiveness)
>>>     +    Reg = TRI->getSubReg(Reg, SubReg);
>>>     +  return Reg;
>>>     +}
>>>     +
>>>       /// Remove instruction \p Copy if there exists a previous copy
>>>     that copies the
>>>       /// register \p Src to the register \p Def; This may happen
>>>     indirectly by
>>>       /// copying the super registers.
>>>     @@ -190,6 +360,325 @@ bool MachineCopyPropagation::eraseIfRedu
>>>         return true;
>>>       }
>>>
>>>     +
>>>     +/// Decide whether we should forward the destination of \param Copy
>>>     to its use
>>>     +/// in \param UseI based on the register class of the Copy
>>>     operands.  Same-class
>>>     +/// COPYs are always accepted by this function, but cross-class
>>>     COPYs are only
>>>     +/// accepted if they are forwarded to another COPY with the operand
>>>     register
>>>     +/// classes reversed.  For example:
>>>     +///
>>>     +///   RegClassA = COPY RegClassB  // Copy parameter
>>>     +///   ...
>>>     +///   RegClassB = COPY RegClassA  // UseI parameter
>>>     +///
>>>     +/// which after forwarding becomes
>>>     +///
>>>     +///   RegClassA = COPY RegClassB
>>>     +///   ...
>>>     +///   RegClassB = COPY RegClassB
>>>     +///
>>>     +/// so we have reduced the number of cross-class COPYs and
>>> potentially
>>>     +/// introduced a no COPY that can be removed.
>>>     +bool MachineCopyPropagation::isForwardableRegClassCopy(
>>>     +    const MachineInstr &Copy, const MachineInstr &UseI) {
>>>     +  auto isCross = [&](const MachineOperand &Dst, const
>>>     MachineOperand &Src) {
>>>     +    unsigned DstReg = Dst.getReg();
>>>     +    unsigned SrcPhysReg = getPhysReg(Src);
>>>     +    const TargetRegisterClass *DstRC;
>>>     +    if (TargetRegisterInfo::isVirtualRegister(DstReg)) {
>>>     +      DstRC = MRI->getRegClass(DstReg);
>>>     +      unsigned DstSubReg = Dst.getSubReg();
>>>     +      if (DstSubReg)
>>>     +        SrcPhysReg = TRI->getMatchingSuperReg(SrcPhysReg,
>>>     DstSubReg, DstRC);
>>>     +    } else
>>>     +      DstRC = TRI->getMinimalPhysRegClass(DstReg);
>>>     +
>>>     +    return !DstRC->contains(SrcPhysReg);
>>>     +  };
>>>     +
>>>     +  const MachineOperand &CopyDst = Copy.getOperand(0);
>>>     +  const MachineOperand &CopySrc = Copy.getOperand(1);
>>>     +
>>>     +  if (!isCross(CopyDst, CopySrc))
>>>     +    return true;
>>>     +
>>>     +  if (!UseI.isCopy())
>>>     +    return false;
>>>     +
>>>     +  assert(getFullPhysReg(UseI.getOperand(1)) ==
>>>     getFullPhysReg(CopyDst));
>>>     +  return !isCross(UseI.getOperand(0), CopySrc);
>>>     +}
>>>     +
>>>     +/// Check that the subregs on the copy source operand (\p CopySrc)
>>>     and the use
>>>     +/// operand to be forwarded to (\p MOUse) are compatible with doing
>>> the
>>>     +/// forwarding.  Also computes the new register and subregister to
>>>     be used in
>>>     +/// the forwarded-to instruction.
>>>     +std::tuple<unsigned, unsigned, bool>
>>>     MachineCopyPropagation::checkUseSubReg(
>>>     +    const MachineOperand &CopySrc, const MachineOperand &MOUse) {
>>>     +  unsigned NewUseReg = CopySrc.getReg();
>>>     +  unsigned NewUseSubReg;
>>>     +
>>>     +  if (TargetRegisterInfo::isPhysicalRegister(NewUseReg)) {
>>>     +    // If MOUse is a virtual reg, we need to apply it to the new
>>>     physical reg
>>>     +    // we're going to replace it with.
>>>     +    if (MOUse.getSubReg())
>>>     +      NewUseReg = TRI->getSubReg(NewUseReg, MOUse.getSubReg());
>>>     +    // If the original use subreg isn't valid on the new src reg,
>>>     we can't
>>>     +    // forward it here.
>>>     +    if (!NewUseReg)
>>>     +      return std::make_tuple(0, 0, false);
>>>     +    NewUseSubReg = 0;
>>>     +  } else {
>>>     +    // %v1 = COPY %v2:sub1
>>>     +    //    USE %v1:sub2
>>>     +    // The new use is %v2:sub1:sub2
>>>     +    NewUseSubReg =
>>>     +        TRI->composeSubRegIndices(CopySrc.getSubReg(),
>>>     MOUse.getSubReg());
>>>     +    // Check that NewUseSubReg is valid on NewUseReg
>>>     +    if (NewUseSubReg &&
>>>     +        !TRI->getSubClassWithSubReg(MRI->getRegClass(NewUseReg),
>>>     NewUseSubReg))
>>>     +      return std::make_tuple(0, 0, false);
>>>     +  }
>>>     +
>>>     +  return std::make_tuple(NewUseReg, NewUseSubReg, true);
>>>     +}
>>>     +
>>>     +/// Check that \p MI does not have implicit uses that overlap with
>>>     it's \p Use
>>>     +/// operand (the register being replaced), since these can
>>> sometimes be
>>>     +/// implicitly tied to other operands.  For example, on AMDGPU:
>>>     +///
>>>     +/// V_MOVRELS_B32_e32 %VGPR2, %M0<imp-use>, %EXEC<imp-use>,
>>>     %VGPR2_VGPR3_VGPR4_VGPR5<imp-use>
>>>     +///
>>>     +/// the %VGPR2 is implicitly tied to the larger reg operand, but we
>>>     have no
>>>     +/// way of knowing we need to update the latter when updating the
>>>     former.
>>>     +bool MachineCopyPropagation::hasImplicitOverlap(const MachineInstr
>>> &MI,
>>>     +                                                const
>>>     MachineOperand &Use) {
>>>     +  if (!TargetRegisterInfo::isPhysicalRegister(Use.getReg()))
>>>     +    return false;
>>>     +
>>>     +  for (const MachineOperand &MIUse : MI.uses())
>>>     +    if (&MIUse != &Use && MIUse.isReg() && MIUse.isImplicit() &&
>>>     +        TRI->regsOverlap(Use.getReg(), MIUse.getReg()))
>>>     +      return true;
>>>     +
>>>     +  return false;
>>>     +}
>>>     +
>>>     +/// Narrow the register class of the forwarded vreg so it matches
>>> any
>>>     +/// instruction constraints.  \p MI is the instruction being
>>>     forwarded to. \p
>>>     +/// MOUse is the operand being replaced in \p MI (which hasn't yet
>>>     been updated
>>>     +/// at the time this function is called).  \p NewUseReg and \p
>>>     NewUseSubReg are
>>>     +/// what the \p MOUse will be changed to after forwarding.
>>>     +///
>>>     +/// If we are forwarding
>>>     +///    A:RCA = COPY B:RCB
>>>     +/// into
>>>     +///    ... = OP A:RCA
>>>     +///
>>>     +/// then we need to narrow the register class of B so that it is a
>>>     subclass
>>>     +/// of RCA so that it meets the instruction register class
>>> constraints.
>>>     +void MachineCopyPropagation::narrowRegClass(const MachineInstr &MI,
>>>     +                                            const MachineOperand
>>>     &MOUse,
>>>     +                                            unsigned NewUseReg,
>>>     +                                            unsigned NewUseSubReg) {
>>>     +  if (!TargetRegisterInfo::isVirtualRegister(NewUseReg))
>>>     +    return;
>>>     +
>>>     +  // Make sure the virtual reg class allows the subreg.
>>>     +  if (NewUseSubReg) {
>>>     +    const TargetRegisterClass *CurUseRC =
>>> MRI->getRegClass(NewUseReg);
>>>     +    const TargetRegisterClass *NewUseRC =
>>>     +        TRI->getSubClassWithSubReg(CurUseRC, NewUseSubReg);
>>>     +    if (CurUseRC != NewUseRC) {
>>>     +      DEBUG(dbgs() << "MCP: Setting regclass of " <<
>>>     PrintReg(NewUseReg, TRI)
>>>     +                   << " to " << TRI->getRegClassName(NewUseRC) <<
>>>     "\n");
>>>     +      MRI->setRegClass(NewUseReg, NewUseRC);
>>>     +    }
>>>     +  }
>>>     +
>>>     +  unsigned MOUseOpNo = &MOUse - &MI.getOperand(0);
>>>     +  const TargetRegisterClass *InstRC =
>>>     +      TII->getRegClass(MI.getDesc(), MOUseOpNo, TRI, *MF);
>>>     +  if (InstRC) {
>>>     +    const TargetRegisterClass *CurUseRC =
>>> MRI->getRegClass(NewUseReg);
>>>     +    if (NewUseSubReg)
>>>     +      InstRC = TRI->getMatchingSuperRegClass(CurUseRC, InstRC,
>>>     NewUseSubReg);
>>>     +    if (!InstRC->hasSubClassEq(CurUseRC)) {
>>>     +      const TargetRegisterClass *NewUseRC =
>>>     +          TRI->getCommonSubClass(InstRC, CurUseRC);
>>>     +      DEBUG(dbgs() << "MCP: Setting regclass of " <<
>>>     PrintReg(NewUseReg, TRI)
>>>     +                   << " to " << TRI->getRegClassName(NewUseRC) <<
>>>     "\n");
>>>     +      MRI->setRegClass(NewUseReg, NewUseRC);
>>>     +    }
>>>     +  }
>>>     +}
>>>     +
>>>     +/// Update the LiveInterval information to reflect the destination
>>>     of \p Copy
>>>     +/// being forwarded to a use in \p UseMI.  \p OrigUseReg is the
>>>     register being
>>>     +/// forwarded through. It should be the destination register of \p
>>>     Copy and has
>>>     +/// already been replaced in \p UseMI at the point this function is
>>>     called.  \p
>>>     +/// NewUseReg and \p NewUseSubReg are the register and subregister
>>>     being
>>>     +/// forwarded.  They should be the source register of the \p Copy
>>>     and should be
>>>     +/// the value of the \p UseMI operand being forwarded at the point
>>>     this function
>>>     +/// is called.
>>>     +void MachineCopyPropagation::updateForwardedCopyLiveInterval(
>>>     +    const MachineInstr &Copy, const MachineInstr &UseMI, unsigned
>>>     OrigUseReg,
>>>     +    unsigned NewUseReg, unsigned NewUseSubReg) {
>>>     +
>>>     +  assert(TRI->isSubRegisterEq(getPhysReg(OrigUseReg, 0),
>>>     +                              getFullPhysReg(Copy.getOperand(0)))
>>> &&
>>>     +         "OrigUseReg mismatch");
>>>     +  assert(TRI->isSubRegisterEq(getFullPhysReg(Copy.getOperand(1)),
>>>     +                              getPhysReg(NewUseReg, 0)) &&
>>>     +         "NewUseReg mismatch");
>>>     +
>>>     +  // Extend live range starting from COPY early-clobber slot, since
>>>     that
>>>     +  // is where the original src live range ends.
>>>     +  SlotIndex CopyUseIdx =
>>>     +      Indexes->getInstructionIndex(Copy).getRegSlot(true /*=EarlyClobber*/);
>>>     +  SlotIndex UseIdx = Indexes->getInstructionIndex(UseMI).getRegSlot();
>>>     +  if (TargetRegisterInfo::isVirtualRegister(NewUseReg)) {
>>>     +    LiveInterval &LI = LIS->getInterval(NewUseReg);
>>>     +    LI.extendInBlock(CopyUseIdx, UseIdx);
>>>     +    LaneBitmask UseMask = TRI->getSubRegIndexLaneMask(NewUseSubReg);
>>>     +    for (auto &S : LI.subranges())
>>>     +      if ((S.LaneMask & UseMask).any() && S.find(CopyUseIdx))
>>>     +        S.extendInBlock(CopyUseIdx, UseIdx);
>>>     +  } else {
>>>     +    assert(NewUseSubReg == 0 && "Unexpected subreg on physical register!");
>>>     +    for (MCRegUnitIterator UI(NewUseReg, TRI); UI.isValid(); ++UI) {
>>>     +      LiveRange &LR = LIS->getRegUnit(*UI);
>>>     +      LR.extendInBlock(CopyUseIdx, UseIdx);
>>>     +    }
>>>     +  }
>>>     +
>>>     +  if (!TargetRegisterInfo::isVirtualRegister(OrigUseReg))
>>>     +    return;
>>>     +
>>>     +  LiveInterval &LI = LIS->getInterval(OrigUseReg);
>>>     +
>>>     +  // Can happen for undef uses.
>>>     +  if (LI.empty())
>>>     +    return;
>>>     +
>>>     +  SlotIndex UseIndex = Indexes->getInstructionIndex(UseMI);
>>>     +  const LiveRange::Segment *UseSeg = LI.getSegmentContaining(UseIndex);
>>>     +
>>>     +  // Only shrink if forwarded use is the end of a segment.
>>>     +  if (UseSeg->end != UseIndex.getRegSlot())
>>>     +    return;
>>>     +
>>>     +  SmallVector<MachineInstr *, 4> DeadInsts;
>>>     +  LIS->shrinkToUses(&LI, &DeadInsts);
>>>     +  if (!DeadInsts.empty()) {
>>>     +    SmallVector<unsigned, 8> NewRegs;
>>>     +    LiveRangeEdit(nullptr, NewRegs, *MF, *LIS, nullptr, this)
>>>     +        .eliminateDeadDefs(DeadInsts);
>>>     +  }
>>>     +}
>>>     +
>>>     +void MachineCopyPropagation::LRE_WillEraseInstruction(MachineInstr
>>>     *MI) {
>>>     +  // Remove this COPY from further consideration for forwarding.
>>>     +  ClobberRegister(getFullPhysReg(MI->getOperand(0)));
>>>     +  Changed = true;
>>>     +}
>>>     +
>>>     +/// Look for available copies whose destination register is used by
>>>     \p MI and
>>>     +/// replace the use in \p MI with the copy's source register.
>>>     +void MachineCopyPropagation::forwardUses(MachineInstr &MI) {
>>>     +  if (AvailCopyMap.empty())
>>>     +    return;
>>>     +
>>>     +  // Look for non-tied explicit vreg uses that have an active COPY
>>>     +  // instruction that defines the physical register allocated to
>>> them.
>>>     +  // Replace the vreg with the source of the active COPY.
>>>     +  for (MachineOperand &MOUse : MI.explicit_uses()) {
>>>     +    if (!MOUse.isReg() || MOUse.isTied())
>>>     +      continue;
>>>     +
>>>     +    unsigned UseReg = MOUse.getReg();
>>>     +    if (!UseReg)
>>>     +      continue;
>>>     +
>>>     +    if (TargetRegisterInfo::isVirtualRegister(UseReg))
>>>     +      UseReg = VRM->getPhys(UseReg);
>>>     +    else if (MI.isCall() || MI.isReturn() || MI.isInlineAsm() ||
>>>     +             MI.hasUnmodeledSideEffects() || MI.isDebugValue() ||
>>>     MI.isKill())
>>>     +      // Some instructions seem to have ABI uses e.g. not marked as
>>>     +      // implicit, which can lead to forwarding them when we
>>>     shouldn't, so
>>>     +      // restrict the types of instructions we forward physical
>>>     regs into.
>>>     +      continue;
>>>     +
>>>     +    // Don't forward COPYs via non-allocatable regs since they can
>>> have
>>>     +    // non-standard semantics.
>>>     +    if (!MRI->isAllocatable(UseReg))
>>>     +      continue;
>>>     +
>>>     +    auto CI = AvailCopyMap.find(UseReg);
>>>     +    if (CI == AvailCopyMap.end())
>>>     +      continue;
>>>     +
>>>     +    MachineInstr &Copy = *CI->second;
>>>     +    MachineOperand &CopyDst = Copy.getOperand(0);
>>>     +    MachineOperand &CopySrc = Copy.getOperand(1);
>>>     +
>>>     +    // Don't forward COPYs that are already NOPs due to register
>>>     assignment.
>>>     +    if (getPhysReg(CopyDst) == getPhysReg(CopySrc))
>>>     +      continue;
>>>     +
>>>     +    // FIXME: Don't handle partial uses of wider COPYs yet.
>>>     +    if (CopyDst.getSubReg() != 0 || UseReg != getPhysReg(CopyDst))
>>>     +      continue;
>>>     +
>>>     +    // Don't forward COPYs of non-allocatable regs unless they are
>>>     constant.
>>>     +    unsigned CopySrcReg = CopySrc.getReg();
>>>     +    if (TargetRegisterInfo::isPhysicalRegister(CopySrcReg) &&
>>>     +        !MRI->isAllocatable(CopySrcReg) &&
>>>     !MRI->isConstantPhysReg(CopySrcReg))
>>>     +      continue;
>>>     +
>>>     +    if (!isForwardableRegClassCopy(Copy, MI))
>>>     +      continue;
>>>     +
>>>     +    unsigned NewUseReg, NewUseSubReg;
>>>     +    bool SubRegOK;
>>>     +    std::tie(NewUseReg, NewUseSubReg, SubRegOK) =
>>>     +        checkUseSubReg(CopySrc, MOUse);
>>>     +    if (!SubRegOK)
>>>     +      continue;
>>>     +
>>>     +    if (hasImplicitOverlap(MI, MOUse))
>>>     +      continue;
>>>     +
>>>     +    DEBUG(dbgs() << "MCP: Replacing "
>>>     +          << PrintReg(MOUse.getReg(), TRI, MOUse.getSubReg())
>>>     +          << "\n     with "
>>>     +          << PrintReg(NewUseReg, TRI, CopySrc.getSubReg())
>>>     +          << "\n     in "
>>>     +          << MI
>>>     +          << "     from "
>>>     +          << Copy);
>>>     +
>>>     +    narrowRegClass(MI, MOUse, NewUseReg, NewUseSubReg);
>>>     +
>>>     +    unsigned OrigUseReg = MOUse.getReg();
>>>     +    MOUse.setReg(NewUseReg);
>>>     +    MOUse.setSubReg(NewUseSubReg);
>>>     +
>>>     +    DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");
>>>     +
>>>     +    if (PreRegRewrite)
>>>     +      updateForwardedCopyLiveInterval(Copy, MI, OrigUseReg,
>>> NewUseReg,
>>>     +                                      NewUseSubReg);
>>>     +    else
>>>     +      for (MachineInstr &KMI :
>>>     +             make_range(Copy.getIterator(),
>>>     std::next(MI.getIterator())))
>>>     +        KMI.clearRegisterKills(NewUseReg, TRI);
>>>     +
>>>     +    ++NumCopyForwards;
>>>     +    Changed = true;
>>>     +  }
>>>     +}
>>>     +
>>>       void MachineCopyPropagation::CopyPropagateBlock(MachineBasicBlock
>>>     &MBB) {
>>>         DEBUG(dbgs() << "MCP: CopyPropagateBlock " << MBB.getName() <<
>>>     "\n");
>>>
>>>     @@ -198,12 +687,8 @@ void MachineCopyPropagation::CopyPropaga
>>>           ++I;
>>>
>>>           if (MI->isCopy()) {
>>>     -      unsigned Def = MI->getOperand(0).getReg();
>>>     -      unsigned Src = MI->getOperand(1).getReg();
>>>     -
>>>     -      assert(!TargetRegisterInfo::isVirtualRegister(Def) &&
>>>     -             !TargetRegisterInfo::isVirtualRegister(Src) &&
>>>     -             "MachineCopyPropagation should be run after register allocation!");
>>>     +      unsigned Def = getPhysReg(MI->getOperand(0));
>>>     +      unsigned Src = getPhysReg(MI->getOperand(1));
>>>
>>>             // The two copies cancel out and the source of the first copy
>>>             // hasn't been overridden, eliminate the second one. e.g.
>>>     @@ -220,8 +705,16 @@ void MachineCopyPropagation::CopyPropaga
>>>             //  %ECX<def> = COPY %EAX
>>>             // =>
>>>             //  %ECX<def> = COPY %EAX
>>>     -      if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI,
>>>     Src, Def))
>>>     -        continue;
>>>     +      if (!PreRegRewrite)
>>>     +        if (eraseIfRedundant(*MI, Def, Src) ||
>>>     eraseIfRedundant(*MI, Src, Def))
>>>     +          continue;
>>>     +
>>>     +      forwardUses(*MI);
>>>     +
>>>     +      // Src may have been changed by forwardUses()
>>>     +      Src = getPhysReg(MI->getOperand(1));
>>>     +      unsigned DefClobber = getFullPhysReg(MI->getOperand(0));
>>>     +      unsigned SrcClobber = getFullPhysReg(MI->getOperand(1));
>>>
>>>             // If Src is defined by a previous copy, the previous copy
>>>     cannot be
>>>             // eliminated.
>>>     @@ -238,7 +731,10 @@ void MachineCopyPropagation::CopyPropaga
>>>             DEBUG(dbgs() << "MCP: Copy is a deletion candidate: ";
>>>     MI->dump());
>>>
>>>             // Copy is now a candidate for deletion.
>>>     -      if (!MRI->isReserved(Def))
>>>     +      // Only look for dead COPYs if we're not running just before
>>>     +      // VirtRegRewriter, since presumably these COPYs will have
>>>     already been
>>>     +      // removed.
>>>     +      if (!PreRegRewrite && !MRI->isReserved(Def))
>>>               MaybeDeadCopies.insert(MI);
>>>
>>>             // If 'Def' is previously source of another copy, then this
>>>     earlier copy's
>>>     @@ -248,11 +744,11 @@ void MachineCopyPropagation::CopyPropaga
>>>             // %xmm2<def> = copy %xmm0
>>>             // ...
>>>             // %xmm2<def> = copy %xmm9
>>>     -      ClobberRegister(Def);
>>>     +      ClobberRegister(DefClobber);
>>>             for (const MachineOperand &MO : MI->implicit_operands()) {
>>>               if (!MO.isReg() || !MO.isDef())
>>>                 continue;
>>>     -        unsigned Reg = MO.getReg();
>>>     +        unsigned Reg = getFullPhysReg(MO);
>>>               if (!Reg)
>>>                 continue;
>>>               ClobberRegister(Reg);
>>>     @@ -267,13 +763,27 @@ void MachineCopyPropagation::CopyPropaga
>>>
>>>             // Remember source that's copied to Def. Once it's
>>>     clobbered, then
>>>             // it's no longer available for copy propagation.
>>>     -      RegList &DestList = SrcMap[Src];
>>>     -      if (!is_contained(DestList, Def))
>>>     -        DestList.push_back(Def);
>>>     +      RegList &DestList = SrcMap[SrcClobber];
>>>     +      if (!is_contained(DestList, DefClobber))
>>>     +        DestList.push_back(DefClobber);
>>>
>>>             continue;
>>>           }
>>>
>>>     +    // Clobber any earlyclobber regs first.
>>>     +    for (const MachineOperand &MO : MI->operands())
>>>     +      if (MO.isReg() && MO.isEarlyClobber()) {
>>>     +        unsigned Reg = getFullPhysReg(MO);
>>>     +        // If we have a tied earlyclobber, that means it is also
>>>     read by this
>>>     +        // instruction, so we need to make sure we don't remove it
>>>     as dead
>>>     +        // later.
>>>     +        if (MO.isTied())
>>>     +          ReadRegister(Reg);
>>>     +        ClobberRegister(Reg);
>>>     +      }
>>>     +
>>>     +    forwardUses(*MI);
>>>     +
>>>           // Not a copy.
>>>           SmallVector<unsigned, 2> Defs;
>>>           const MachineOperand *RegMask = nullptr;
>>>     @@ -282,14 +792,11 @@ void MachineCopyPropagation::CopyPropaga
>>>               RegMask = &MO;
>>>             if (!MO.isReg())
>>>               continue;
>>>     -      unsigned Reg = MO.getReg();
>>>     +      unsigned Reg = getFullPhysReg(MO);
>>>             if (!Reg)
>>>               continue;
>>>
>>>     -      assert(!TargetRegisterInfo::isVirtualRegister(Reg) &&
>>>     -             "MachineCopyPropagation should be run after register allocation!");
>>>     -
>>>     -      if (MO.isDef()) {
>>>     +      if (MO.isDef() && !MO.isEarlyClobber()) {
>>>               Defs.push_back(Reg);
>>>               continue;
>>>             } else if (MO.readsReg())
>>>     @@ -346,6 +853,8 @@ void MachineCopyPropagation::CopyPropaga
>>>         // since we don't want to trust live-in lists.
>>>         if (MBB.succ_empty()) {
>>>           for (MachineInstr *MaybeDead : MaybeDeadCopies) {
>>>     +      DEBUG(dbgs() << "MCP: Removing copy due to no live-out succ: ";
>>>     +            MaybeDead->dump());
>>>             assert(!MRI->isReserved(MaybeDead->getOperand(0).getReg()));
>>>             MaybeDead->eraseFromParent();
>>>             Changed = true;
>>>     @@ -368,10 +877,16 @@ bool MachineCopyPropagation::runOnMachin
>>>         TRI = MF.getSubtarget().getRegisterInfo();
>>>         TII = MF.getSubtarget().getInstrInfo();
>>>         MRI = &MF.getRegInfo();
>>>     +  this->MF = &MF;
>>>     +  if (PreRegRewrite) {
>>>     +    Indexes = &getAnalysis<SlotIndexes>();
>>>     +    LIS = &getAnalysis<LiveIntervals>();
>>>     +    VRM = &getAnalysis<VirtRegMap>();
>>>     +  }
>>>     +  NoSubRegLiveness = !MRI->subRegLivenessEnabled();
>>>
>>>         for (MachineBasicBlock &MBB : MF)
>>>           CopyPropagateBlock(MBB);
>>>
>>>         return Changed;
>>>       }
>>>     -
>>>
>>>     Modified: llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetPassConfig.cpp?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/lib/CodeGen/TargetPassConfig.cpp (original)
>>>     +++ llvm/trunk/lib/CodeGen/TargetPassConfig.cpp Wed Aug 16 13:50:01 2017
>>>     @@ -88,6 +88,8 @@ static cl::opt<bool> DisableCGP("disable
>>>           cl::desc("Disable Codegen Prepare"));
>>>       static cl::opt<bool> DisableCopyProp("disable-copyprop",
>>> cl::Hidden,
>>>           cl::desc("Disable Copy Propagation pass"));
>>>     +static cl::opt<bool> DisableCopyPropPreRegRewrite("disable-copyprop-prerewrite", cl::Hidden,
>>>     +    cl::desc("Disable Copy Propagation Pre-Register Re-write pass"));
>>>       static cl::opt<bool>
>>>     DisablePartialLibcallInlining("disable-partial-libcall-inlining",
>>>           cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));
>>>       static cl::opt<bool> EnableImplicitNullChecks(
>>>     @@ -248,6 +250,9 @@ static IdentifyingPassPtr overridePass(A
>>>         if (StandardID == &MachineCopyPropagationID)
>>>           return applyDisable(TargetID, DisableCopyProp);
>>>
>>>     +  if (StandardID == &MachineCopyPropagationPreRegRewriteID)
>>>     +    return applyDisable(TargetID, DisableCopyPropPreRegRewrite);
>>>     +
>>>         return TargetID;
>>>       }
>>>
>>>     @@ -1059,6 +1064,10 @@ void TargetPassConfig::addOptimizedRegAl
>>>           // Allow targets to change the register assignments before
>>>     rewriting.
>>>           addPreRewrite();
>>>
>>>     +    // Copy propagate to forward register uses and try to eliminate
>>>     COPYs that
>>>     +    // were not coalesced.
>>>     +    addPass(&MachineCopyPropagationPreRegRewriteID);
>>>     +
>>>           // Finally rewrite virtual registers.
>>>           addPass(&VirtRegRewriterID);
>>>
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll Wed Aug 16 13:50:01 2017
>>>     @@ -9,7 +9,8 @@ define i16 @halfword(%struct.a* %ctx, i3
>>>       ; CHECK-LABEL: halfword:
>>>       ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>       ; CHECK: ldrh [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]],
>>>     lsl #1]
>>>     -; CHECK: strh [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #1]
>>>     +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>     +; CHECK: strh [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #1]
>>>         %shr81 = lshr i32 %xor72, 9
>>>         %conv82 = zext i32 %shr81 to i64
>>>         %idxprom83 = and i64 %conv82, 255
>>>     @@ -24,7 +25,8 @@ define i32 @word(%struct.b* %ctx, i32 %x
>>>       ; CHECK-LABEL: word:
>>>       ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>       ; CHECK: ldr [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]],
>>>     lsl #2]
>>>     -; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #2]
>>>     +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>     +; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #2]
>>>         %shr81 = lshr i32 %xor72, 9
>>>         %conv82 = zext i32 %shr81 to i64
>>>         %idxprom83 = and i64 %conv82, 255
>>>     @@ -39,7 +41,8 @@ define i64 @doubleword(%struct.c* %ctx,
>>>       ; CHECK-LABEL: doubleword:
>>>       ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>       ; CHECK: ldr [[REG1:x[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]],
>>>     lsl #3]
>>>     -; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #3]
>>>     +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>     +; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #3]
>>>         %shr81 = lshr i32 %xor72, 9
>>>         %conv82 = zext i32 %shr81 to i64
>>>         %idxprom83 = and i64 %conv82, 255
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll Wed Aug 16 13:50:01 2017
>>>     @@ -8,15 +8,9 @@ define <2 x i64> @bar(<2 x i64> %a, <2 x
>>>       ; CHECK: add.2d        v[[REG:[0-9]+]], v0, v1
>>>       ; CHECK: add   d[[REG3:[0-9]+]], d[[REG]], d1
>>>       ; CHECK: sub   d[[REG2:[0-9]+]], d[[REG]], d1
>>>     -; Without advanced copy optimization, we end up with cross register
>>>     -; banks copies that cannot be coalesced.
>>>     -; CHECK-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
>>>     -; With advanced copy optimization, we end up with just one copy
>>>     -; to insert the computed high part into the V register.
>>>     -; CHECK-OPT-NOT: fmov
>>>     +; CHECK-NOT: fmov
>>>       ; CHECK: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
>>>     -; CHECK-NOOPT: fmov d0, [[COPY_REG3]]
>>>     -; CHECK-OPT-NOT: fmov
>>>     +; CHECK-NOT: fmov
>>>       ; CHECK: ins.d v0[1], [[COPY_REG2]]
>>>       ; CHECK-NEXT: ret
>>>       ;
>>>     @@ -24,11 +18,9 @@ define <2 x i64> @bar(<2 x i64> %a, <2 x
>>>       ; GENERIC: add v[[REG:[0-9]+]].2d, v0.2d, v1.2d
>>>       ; GENERIC: add d[[REG3:[0-9]+]], d[[REG]], d1
>>>       ; GENERIC: sub d[[REG2:[0-9]+]], d[[REG]], d1
>>>     -; GENERIC-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
>>>     -; GENERIC-OPT-NOT: fmov
>>>     +; GENERIC-NOT: fmov
>>>       ; GENERIC: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
>>>     -; GENERIC-NOOPT: fmov d0, [[COPY_REG3]]
>>>     -; GENERIC-OPT-NOT: fmov
>>>     +; GENERIC-NOT: fmov
>>>       ; GENERIC: ins v0.d[1], [[COPY_REG2]]
>>>       ; GENERIC-NEXT: ret
>>>         %add = add <2 x i64> %a, %b
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll Wed Aug 16 13:50:01 2017
>>>     @@ -4,8 +4,10 @@
>>>       define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
>>>       entry:
>>>       ; CHECK-LABEL: t:
>>>     -; CHECK: mov x0, [[REG1:x[0-9]+]]
>>>     -; CHECK: mov x1, [[REG2:x[0-9]+]]
>>>     +; CHECK: mov [[REG2:x[0-9]+]], x3
>>>     +; CHECK: mov [[REG1:x[0-9]+]], x2
>>>     +; CHECK: mov x0, x2
>>>     +; CHECK: mov x1, x3
>>>       ; CHECK: bl _foo
>>>       ; CHECK: mov x0, [[REG1]]
>>>       ; CHECK: mov x1, [[REG2]]
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll Wed Aug 16 13:50:01 2017
>>>     @@ -350,7 +350,7 @@ else:
>>>
>>>       ; CHECK-LABEL: test_phi:
>>>       ; CHECK: mov  x[[PTR:[0-9]+]], x0
>>>     -; CHECK: ldr  h[[AB:[0-9]+]], [x[[PTR]]]
>>>     +; CHECK: ldr  h[[AB:[0-9]+]], [x0]
>>>       ; CHECK: [[LOOP:LBB[0-9_]+]]:
>>>       ; CHECK: mov.16b  v[[R:[0-9]+]], v[[AB]]
>>>       ; CHECK: ldr  h[[AB]], [x[[PTR]]]
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll Wed Aug 16 13:50:01 2017
>>>     @@ -17,6 +17,9 @@ define i32 @test_multiflag(i32 %n, i32 %
>>>         %val = zext i1 %test to i32
>>>       ; CHECK: cset {{[xw][0-9]+}}, ne
>>>
>>>     +; CHECK: mov [[RHSCOPY:w[0-9]+]], [[RHS]]
>>>     +; CHECK: mov [[LHSCOPY:w[0-9]+]], [[LHS]]
>>>     +
>>>         store i32 %val, i32* @var
>>>
>>>         call void @bar()
>>>     @@ -25,7 +28,7 @@ define i32 @test_multiflag(i32 %n, i32 %
>>>         ; Currently, the comparison is emitted again. An MSR/MRS pair
>>>     would also be
>>>         ; acceptable, but assuming the call preserves NZCV is not.
>>>         br i1 %test, label %iftrue, label %iffalse
>>>     -; CHECK: cmp [[LHS]], [[RHS]]
>>>     +; CHECK: cmp [[LHSCOPY]], [[RHSCOPY]]
>>>       ; CHECK: b.eq
>>>
>>>       iftrue:
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll Wed Aug 16 13:50:01 2017
>>>     @@ -8,10 +8,9 @@
>>>       define void @test(%struct1* %fde, i32 %fd, void (i32, i32, i8*)*
>>>     %func, i8* %arg)  {
>>>       ;CHECK-LABEL: test
>>>       entry:
>>>     -; A53: mov [[DATA:w[0-9]+]], w1
>>>       ; A53: str q{{[0-9]+}}, {{.*}}
>>>       ; A53: str q{{[0-9]+}}, {{.*}}
>>>     -; A53: str [[DATA]], {{.*}}
>>>     +; A53: str w1, {{.*}}
>>>
>>>         %0 = bitcast %struct1* %fde to i8*
>>>         tail call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 40, i32
>>>     8, i1 false)
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/neg-imm.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AArch64/neg-imm.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AArch64/neg-imm.ll Wed Aug 16 13:50:01 2017
>>>     @@ -7,8 +7,8 @@ declare void @foo(i32)
>>>       define void @test(i32 %px) {
>>>       ; CHECK_LABEL: test:
>>>       ; CHECK_LABEL: %entry
>>>     -; CHECK: subs
>>>     -; CHECK-NEXT: csel
>>>     +; CHECK: subs [[REG0:w[0-9]+]],
>>>     +; CHECK: csel {{w[0-9]+}}, wzr, [[REG0]]
>>>       entry:
>>>         %sub = add nsw i32 %px, -1
>>>         %cmp = icmp slt i32 %px, 1
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AMDGPU/byval-frame-setup.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/byval-frame-setup.ll?rev=311038&r1=311037&r2=311038&view=diff
>>>     ==============================================================================
>>>     --- llvm/trunk/test/CodeGen/AMDGPU/byval-frame-setup.ll (original)
>>>     +++ llvm/trunk/test/CodeGen/AMDGPU/byval-frame-setup.ll Wed Aug 16 13:50:01 2017
>>>     @@ -127,20 +127,21 @@ entry:
>>>       }
>>>
>>>       ; GCN-LABEL: {{^}}call_void_func_byval_struct_kernel:
>>>     -; GCN: s_mov_b32 s33, s7
>>>     -; GCN: s_add_u32 s32, s33, 0xa00{{$}}
>>>     +; GCN: s_add_u32 s32, s7, 0xa00{{$}}
>>>
>>>       ; GCN-DAG: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
>>>       ; GCN-DAG: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13
>>>     -; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8
>>>     -; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24
>>>     +; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s7 offset:8
>>>     +; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s7 offset:24
>>>     +
>>>     +; GCN: s_mov_b32 s33, s7
>>>
>>>       ; GCN-DAG: s_add_u32 s32, s32, 0x800{{$}}
>>>
>>>     -; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33
>>>     offset:8
>>>     -; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33
>>>     offset:12
>>>     -; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33
>>>     offset:16
>>>     -; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33
>>>     offset:20
>>>     +; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3],
>>>     s{{7|33}} offset:8
>>>     +; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3],
>>>     s{{7|33}} offset:12
>>>     +; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3],
>>>     s{{7|33}} offset:16
>>>     +; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3],
>>>     s{{7|33}} offset:20
>>>
>>>       ; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32
>>>     offset:4{{$}}
>>>       ; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:8
>>>
>>>     Modified: llvm/trunk/test/CodeGen/AMDGPU/call-argument-types.ll
>>>     URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/call-argument-types.ll?rev=311038&r1=311037&r2=311038&view=diff
>>
>>
>
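
For anyone reading the quoted patch without the LLVM sources at hand, here is a
minimal, self-contained sketch of the COPY source forwarding idea that
forwardUses() is built around. It is not the code from r311038. The names Inst,
forwardCopies and AvailCopy are hypothetical, registers are plain integers, and
the sketch ignores subregisters, register classes, implicit operands, live
intervals and all of the legality checks in the patch (hasImplicitOverlap,
isForwardableRegClassCopy, and so on).

```cpp
#include <map>
#include <string>
#include <vector>

// Toy instruction (hypothetical, not an LLVM type): an opcode name, one
// defined register (0 means "no def"), and a list of used registers.
struct Inst {
  std::string Opcode;
  int Def;
  std::vector<int> Uses;
};

// Walk one block top to bottom. Whenever a use reads the destination of a
// still-valid COPY, rewrite the use to read the COPY's source instead.
// Returns the number of uses that were rewritten.
int forwardCopies(std::vector<Inst> &Block) {
  std::map<int, int> AvailCopy; // COPY destination -> COPY source
  int NumForwarded = 0;

  for (Inst &I : Block) {
    // Forward available COPY sources into this instruction's uses.
    for (int &Use : I.Uses) {
      auto It = AvailCopy.find(Use);
      if (It != AvailCopy.end()) {
        Use = It->second;
        ++NumForwarded;
      }
    }

    if (I.Def == 0)
      continue;

    // A new def clobbers every tracked COPY that reads or writes I.Def.
    for (auto It = AvailCopy.begin(); It != AvailCopy.end();) {
      if (It->first == I.Def || It->second == I.Def)
        It = AvailCopy.erase(It);
      else
        ++It;
    }

    // Record this COPY as available, unless it is already a no-op.
    if (I.Opcode == "COPY" && I.Uses.size() == 1 && I.Uses[0] != I.Def)
      AvailCopy[I.Def] = I.Uses[0];
  }
  return NumForwarded;
}
```

Given the block "r2 = COPY r1; r3 = ADD r2, r2; r1 = MOV; r4 = SUB r2", the ADD
is rewritten to read r1 twice, while the SUB keeps reading r2 because the
intervening def of r1 invalidated the COPY. The real pass has to do much more
bookkeeping than this (kill flags, live ranges, register class narrowing),
which is presumably where a miscompile on 32-bit ARM would come from.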