[llvm] r314729 - Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Volkan Keles via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 3 15:34:35 PDT 2017
Hi Geoff,
In MachineCopyPropagation::updateForwardedCopyLiveInterval, OrigUseSeg might be nullptr. LiveRange::getSegmentContaining returns null if there is no live segment that contains the specified index. Could you check if this is expected?
Volkan
> On Oct 3, 2017, at 3:19 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>
> +Volkan who is looking into the ASan failure.
>
>> Le 3 oct. 2017 à 10:22, Geoff Berry <gberry at codeaurora.org> a écrit :
>>
>> Reverted in r314814.
>>
>> Diana, I'm going to do some more arm testing to see if I can find any failures outside of a stage2 build.
>>
>> Quentin, any info you can provide on the failure you're seeing would be appreciated.
>
> Volkan, could you share what you found so far?
>
> Thanks,
> Q
>
>>> On 10/3/2017 12:54 PM, Diana Picus wrote:
>>> Hi Geoff,
>>> I think this is also breaking one of our ARM bots:
>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/2117
>>> I don't see the failure anymore with this reverted. Let me know if I
>>> can help investigate.
>>> Cheers,
>>> Diana
>>> On 3 October 2017 at 18:21, Geoff Berry via llvm-commits
>>> <llvm-commits at lists.llvm.org> wrote:
>>>> Hi Quentin,
>>>>
>>>> Sam's revert you referred to is from 4 weeks ago. My latest re-commit
>>>> doesn't seem to have been reverted. I'll take care of it.
>>>>
>>>>> On 10/3/2017 12:11 PM, Quentin Colombet wrote:
>>>>>
>>>>>
>>>>>> On Oct 3, 2017, at 9:05 AM, Geoff Berry <gberry at codeaurora.org
>>>>>> <mailto:gberry at codeaurora.org>> wrote:
>>>>>>
>>>>>> Hi Quentin,
>>>>>>
>>>>>> Yes, I'll revert shortly. The verifier issue sounds like an issue I
>>>>>> though I had worked around regarding forwarding to undef uses. Let me know
>>>>>> if I can help.
>>>>>
>>>>>
>>>>> Will do.
>>>>>
>>>>> Looks like we are not the only one having issue with the commit.
>>>>> Sam reverted it in r312490 already.
>>>>>
>>>>>>
>>>>>> -Geoff
>>>>>>
>>>>>>> On 10/3/2017 12:01 PM, Quentin Colombet wrote:
>>>>>>>
>>>>>>> Hi Geoff,
>>>>>>> I see the MachineVerifier complaining after this commit for our
>>>>>>> out-of-tree target. More over, address sanitizer is complaining about an
>>>>>>> invalid access.
>>>>>>> ==50770==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008
>>>>>>> (pc 0x000102e7e42e bp 0x700002f37170 sp 0x700002f36f60 T9)
>>>>>>> ==50770==The signal is caused by a READ memory access.
>>>>>>> ==50770==Hint: address points to the zero page.
>>>>>>> #0 0x102e7e42d in (anonymous
>>>>>>> namespace)::MachineCopyPropagation::forwardUses(llvm::MachineInstr&)
>>>>>>> MachineCopyPropagation.cpp:596
>>>>>>> #1 0x102e79c22 in (anonymous
>>>>>>> namespace)::MachineCopyPropagation::runOnMachineFunction(llvm::MachineFunction&)
>>>>>>> MachineCopyPropagation.cpp:806
>>>>>>> The MachineVerifier error is:
>>>>>>> *** Bad machine code: Instruction ending live segment doesn't read the
>>>>>>> register ***
>>>>>>> Could you revert while we investigate?
>>>>>>> Thanks,
>>>>>>> -Quentin
>>>>>>>>
>>>>>>>> On Oct 2, 2017, at 3:01 PM, Geoff Berry via llvm-commits
>>>>>>>> <llvm-commits at lists.llvm.org
>>>>>>>> <mailto:llvm-commits at lists.llvm.org><mailto:llvm-commits at lists.llvm.org>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Author: gberry
>>>>>>>> Date: Mon Oct 2 15:01:37 2017
>>>>>>>> New Revision: 314729
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project?rev=314729&view=rev
>>>>>>>> Log:
>>>>>>>> Re-enable "[MachineCopyPropagation] Extend pass to do COPY source
>>>>>>>> forwarding"
>>>>>>>>
>>>>>>>> Issues addressed since original review:
>>>>>>>> - Avoid bug in regalloc greedy/machine verifier when forwarding to use
>>>>>>>> in an instruction that re-defines the same virtual register.
>>>>>>>> - Fixed bug when forwarding to use in EarlyClobber instruction slot.
>>>>>>>> - Fixed incorrect forwarding to register definitions that showed up in
>>>>>>>> explicit_uses() iterator (e.g. in INLINEASM).
>>>>>>>> - Moved removal of dead instructions found by
>>>>>>>> LiveIntervals::shrinkToUses() outside of loop iterating over
>>>>>>>> instructions to avoid instructions being deleted while pointed to by
>>>>>>>> iterator.
>>>>>>>> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
>>>>>>>> - The pass no longer forwards COPYs to physical register uses, since
>>>>>>>> doing so can break code that implicitly relies on the physical
>>>>>>>> register number of the use.
>>>>>>>> - The pass no longer forwards COPYs to undef uses, since doing so
>>>>>>>> can break the machine verifier by creating LiveRanges that don't
>>>>>>>> end on a use (since the undef operand is not considered a use).
>>>>>>>>
>>>>>>>> [MachineCopyPropagation] Extend pass to do COPY source forwarding
>>>>>>>>
>>>>>>>> This change extends MachineCopyPropagation to do COPY source
>>>>>>>> forwarding.
>>>>>>>>
>>>>>>>> This change also extends the MachineCopyPropagation pass to be able to
>>>>>>>> be run during register allocation, after physical registers have been
>>>>>>>> assigned, but before the virtual registers have been re-written, which
>>>>>>>> allows it to remove virtual register COPY LiveIntervals that become
>>>>>>>> dead
>>>>>>>> through the forwarding of all of their uses.
>>>>>>>>
>>>>>>>> Modified:
>>>>>>>> llvm/trunk/include/llvm/CodeGen/Passes.h
>>>>>>>> llvm/trunk/include/llvm/InitializePasses.h
>>>>>>>> llvm/trunk/lib/CodeGen/CodeGen.cpp
>>>>>>>> llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>>>>>> llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>>>>> llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>>>>>> llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>>>>> llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll
>>>>>>>> llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>>>>>>> llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>>>>>>> llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll
>>>>>>>> llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>>>>>>> llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>>>>>>> llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>>>>>>> llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll
>>>>>>>> llvm/trunk/test/CodeGen/PowerPC/inlineasm-i64-reg.ll
>>>>>>>> llvm/trunk/test/CodeGen/PowerPC/opt-li-add-to-addi.ll
>>>>>>>> llvm/trunk/test/CodeGen/PowerPC/tail-dup-layout.ll
>>>>>>>> llvm/trunk/test/CodeGen/SPARC/atomics.ll
>>>>>>>> llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/arg-copy-elide.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avg.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avx-load-store.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avx512-bugfix-25270.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avx512-calling-conv.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avx512-schedule.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/combine-fcopysign.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/combine-shl.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/divide-by-constant.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/fmaxnum.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/fmf-flags.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/fminnum.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/fp128-i128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/haddsub-2.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/haddsub-undef.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/half.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/inline-asm-fpstack.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/ipra-local-linkage.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/localescape.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/machine-cp.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/mul-i1024.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/mul-i512.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/mul128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/mulvi32.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/pmul.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/powi.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/pr11334.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/pr29112.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/psubus.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/sse1.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/sse3-avx-addsub-2.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/statepoint-live-in.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/statepoint-stack-usage.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vec_minmax_sint.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vec_shift4.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-idiv-sdiv-128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-idiv-udiv-128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-mul.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-rotate-128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-sext.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-shift-ashr-128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vector-zext.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/vselect-minmax.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/x86-shrink-wrap-unwind.ll
>>>>>>>> llvm/trunk/test/CodeGen/X86/x86-shrink-wrapping.ll
>>>>>>>> llvm/trunk/test/DebugInfo/X86/live-debug-variables.ll
>>>>>>>> llvm/trunk/test/DebugInfo/X86/spill-nospill.ll
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/include/llvm/CodeGen/Passes.h
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/Passes.h?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/include/llvm/CodeGen/Passes.h (original)
>>>>>>>> +++ llvm/trunk/include/llvm/CodeGen/Passes.h Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -278,6 +278,11 @@ namespace llvm {
>>>>>>>> /// MachineSinking - This pass performs sinking on machine
>>>>>>>> instructions.
>>>>>>>> extern char &MachineSinkingID;
>>>>>>>>
>>>>>>>> + /// MachineCopyPropagationPreRegRewrite - This pass performs copy
>>>>>>>> propagation
>>>>>>>> + /// on machine instructions after register allocation but before
>>>>>>>> virtual
>>>>>>>> + /// register re-writing..
>>>>>>>> + extern char &MachineCopyPropagationPreRegRewriteID;
>>>>>>>> +
>>>>>>>> /// MachineCopyPropagation - This pass performs copy propagation on
>>>>>>>> /// machine instructions.
>>>>>>>> extern char &MachineCopyPropagationID;
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/include/llvm/InitializePasses.h
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/InitializePasses.h?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/include/llvm/InitializePasses.h (original)
>>>>>>>> +++ llvm/trunk/include/llvm/InitializePasses.h Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -233,6 +233,7 @@ void initializeMachineBranchProbabilityI
>>>>>>>> void initializeMachineCSEPass(PassRegistry&);
>>>>>>>> void initializeMachineCombinerPass(PassRegistry&);
>>>>>>>> void initializeMachineCopyPropagationPass(PassRegistry&);
>>>>>>>> +void initializeMachineCopyPropagationPreRegRewritePass(PassRegistry&);
>>>>>>>> void initializeMachineDominanceFrontierPass(PassRegistry&);
>>>>>>>> void initializeMachineDominatorTreePass(PassRegistry&);
>>>>>>>> void initializeMachineFunctionPrinterPassPass(PassRegistry&);
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/lib/CodeGen/CodeGen.cpp
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/CodeGen.cpp?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/lib/CodeGen/CodeGen.cpp (original)
>>>>>>>> +++ llvm/trunk/lib/CodeGen/CodeGen.cpp Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -53,6 +53,7 @@ void llvm::initializeCodeGen(PassRegistr
>>>>>>>> initializeMachineCSEPass(Registry);
>>>>>>>> initializeMachineCombinerPass(Registry);
>>>>>>>> initializeMachineCopyPropagationPass(Registry);
>>>>>>>> + initializeMachineCopyPropagationPreRegRewritePass(Registry);
>>>>>>>> initializeMachineDominatorTreePass(Registry);
>>>>>>>> initializeMachineFunctionPrinterPassPass(Registry);
>>>>>>>> initializeMachineLICMPass(Registry);
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp (original)
>>>>>>>> +++ llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp Mon Oct 2
>>>>>>>> 15:01:37 2017
>>>>>>>> @@ -7,25 +7,71 @@
>>>>>>>> //
>>>>>>>>
>>>>>>>> //===----------------------------------------------------------------------===//
>>>>>>>> //
>>>>>>>> -// This is an extremely simple MachineInstr-level copy propagation
>>>>>>>> pass.
>>>>>>>> +// This is a simple MachineInstr-level copy forwarding pass. It may
>>>>>>>> be run at
>>>>>>>> +// two places in the codegen pipeline:
>>>>>>>> +// - After register allocation but before virtual registers have
>>>>>>>> been remapped
>>>>>>>> +// to physical registers.
>>>>>>>> +// - After physical register remapping.
>>>>>>>> +//
>>>>>>>> +// The optimizations done vary slightly based on whether virtual
>>>>>>>> registers are
>>>>>>>> +// still present. In both cases, this pass forwards the source of
>>>>>>>> COPYs to the
>>>>>>>> +// users of their destinations when doing so is legal. For example:
>>>>>>>> +//
>>>>>>>> +// %vreg1 = COPY %vreg0
>>>>>>>> +// ...
>>>>>>>> +// ... = OP %vreg1
>>>>>>>> +//
>>>>>>>> +// If
>>>>>>>> +// - the physical register assigned to %vreg0 has not been clobbered
>>>>>>>> by the
>>>>>>>> +// time of the use of %vreg1
>>>>>>>> +// - the register class constraints are satisfied
>>>>>>>> +// - the COPY def is the only value that reaches OP
>>>>>>>> +// then this pass replaces the above with:
>>>>>>>> +//
>>>>>>>> +// %vreg1 = COPY %vreg0
>>>>>>>> +// ...
>>>>>>>> +// ... = OP %vreg0
>>>>>>>> +//
>>>>>>>> +// and updates the relevant state required by VirtRegMap (e.g.
>>>>>>>> LiveIntervals).
>>>>>>>> +// COPYs whose LiveIntervals become dead as a result of this
>>>>>>>> forwarding (i.e. if
>>>>>>>> +// all uses of %vreg1 are changed to %vreg0) are removed.
>>>>>>>> +//
>>>>>>>> +// When being run with only physical registers, this pass will also
>>>>>>>> remove some
>>>>>>>> +// redundant COPYs. For example:
>>>>>>>> +//
>>>>>>>> +// %R1 = COPY %R0
>>>>>>>> +// ... // No clobber of %R1
>>>>>>>> +// %R0 = COPY %R1 <<< Removed
>>>>>>>> +//
>>>>>>>> +// or
>>>>>>>> +//
>>>>>>>> +// %R1 = COPY %R0
>>>>>>>> +// ... // No clobber of %R0
>>>>>>>> +// %R1 = COPY %R0 <<< Removed
>>>>>>>> //
>>>>>>>>
>>>>>>>> //===----------------------------------------------------------------------===//
>>>>>>>>
>>>>>>>> +#include "LiveDebugVariables.h"
>>>>>>>> #include "llvm/ADT/DenseMap.h"
>>>>>>>> #include "llvm/ADT/STLExtras.h"
>>>>>>>> #include "llvm/ADT/SetVector.h"
>>>>>>>> #include "llvm/ADT/SmallVector.h"
>>>>>>>> #include "llvm/ADT/Statistic.h"
>>>>>>>> #include "llvm/ADT/iterator_range.h"
>>>>>>>> +#include "llvm/CodeGen/LiveRangeEdit.h"
>>>>>>>> +#include "llvm/CodeGen/LiveStackAnalysis.h"
>>>>>>>> #include "llvm/CodeGen/MachineBasicBlock.h"
>>>>>>>> #include "llvm/CodeGen/MachineFunction.h"
>>>>>>>> #include "llvm/CodeGen/MachineFunctionPass.h"
>>>>>>>> #include "llvm/CodeGen/MachineInstr.h"
>>>>>>>> #include "llvm/CodeGen/MachineOperand.h"
>>>>>>>> #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>>>>> +#include "llvm/CodeGen/Passes.h"
>>>>>>>> +#include "llvm/CodeGen/VirtRegMap.h"
>>>>>>>> #include "llvm/MC/MCRegisterInfo.h"
>>>>>>>> #include "llvm/Pass.h"
>>>>>>>> #include "llvm/Support/Debug.h"
>>>>>>>> +#include "llvm/Support/DebugCounter.h"
>>>>>>>> #include "llvm/Support/raw_ostream.h"
>>>>>>>> #include "llvm/Target/TargetInstrInfo.h"
>>>>>>>> #include "llvm/Target/TargetRegisterInfo.h"
>>>>>>>> @@ -38,6 +84,9 @@ using namespace llvm;
>>>>>>>> #define DEBUG_TYPE "machine-cp"
>>>>>>>>
>>>>>>>> STATISTIC(NumDeletes, "Number of dead copies deleted");
>>>>>>>> +STATISTIC(NumCopyForwards, "Number of copy uses forwarded");
>>>>>>>> +DEBUG_COUNTER(FwdCounter, "machine-cp-fwd",
>>>>>>>> + "Controls which register COPYs are forwarded");
>>>>>>>>
>>>>>>>> namespace {
>>>>>>>>
>>>>>>>> @@ -45,19 +94,42 @@ using RegList = SmallVector<unsigned, 4>
>>>>>>>> using SourceMap = DenseMap<unsigned, RegList>;
>>>>>>>> using Reg2MIMap = DenseMap<unsigned, MachineInstr *>;
>>>>>>>>
>>>>>>>> - class MachineCopyPropagation : public MachineFunctionPass {
>>>>>>>> + class MachineCopyPropagation : public MachineFunctionPass,
>>>>>>>> + private LiveRangeEdit::Delegate {
>>>>>>>> const TargetRegisterInfo *TRI;
>>>>>>>> const TargetInstrInfo *TII;
>>>>>>>> - const MachineRegisterInfo *MRI;
>>>>>>>> + MachineRegisterInfo *MRI;
>>>>>>>> + MachineFunction *MF;
>>>>>>>> + SlotIndexes *Indexes;
>>>>>>>> + LiveIntervals *LIS;
>>>>>>>> + const VirtRegMap *VRM;
>>>>>>>> + // True if this pass being run before virtual registers are
>>>>>>>> remapped to
>>>>>>>> + // physical ones.
>>>>>>>> + bool PreRegRewrite;
>>>>>>>> + bool NoSubRegLiveness;
>>>>>>>> +
>>>>>>>> + protected:
>>>>>>>> + MachineCopyPropagation(char &ID, bool PreRegRewrite)
>>>>>>>> + : MachineFunctionPass(ID), PreRegRewrite(PreRegRewrite) {}
>>>>>>>>
>>>>>>>> public:
>>>>>>>> static char ID; // Pass identification, replacement for typeid
>>>>>>>>
>>>>>>>> - MachineCopyPropagation() : MachineFunctionPass(ID) {
>>>>>>>> + MachineCopyPropagation() : MachineCopyPropagation(ID, false) {
>>>>>>>>
>>>>>>>> initializeMachineCopyPropagationPass(*PassRegistry::getPassRegistry());
>>>>>>>> }
>>>>>>>>
>>>>>>>> void getAnalysisUsage(AnalysisUsage &AU) const override {
>>>>>>>> + if (PreRegRewrite) {
>>>>>>>> + AU.addRequired<SlotIndexes>();
>>>>>>>> + AU.addPreserved<SlotIndexes>();
>>>>>>>> + AU.addRequired<LiveIntervals>();
>>>>>>>> + AU.addPreserved<LiveIntervals>();
>>>>>>>> + AU.addRequired<VirtRegMap>();
>>>>>>>> + AU.addPreserved<VirtRegMap>();
>>>>>>>> + AU.addPreserved<LiveDebugVariables>();
>>>>>>>> + AU.addPreserved<LiveStacks>();
>>>>>>>> + }
>>>>>>>> AU.setPreservesCFG();
>>>>>>>> MachineFunctionPass::getAnalysisUsage(AU);
>>>>>>>> }
>>>>>>>> @@ -65,6 +137,10 @@ using Reg2MIMap = DenseMap<unsigned, Mac
>>>>>>>> bool runOnMachineFunction(MachineFunction &MF) override;
>>>>>>>>
>>>>>>>> MachineFunctionProperties getRequiredProperties() const override {
>>>>>>>> + if (PreRegRewrite)
>>>>>>>> + return MachineFunctionProperties()
>>>>>>>> + .set(MachineFunctionProperties::Property::NoPHIs)
>>>>>>>> + .set(MachineFunctionProperties::Property::TracksLiveness);
>>>>>>>> return MachineFunctionProperties().set(
>>>>>>>> MachineFunctionProperties::Property::NoVRegs);
>>>>>>>> }
>>>>>>>> @@ -74,9 +150,33 @@ using Reg2MIMap = DenseMap<unsigned, Mac
>>>>>>>> void ReadRegister(unsigned Reg);
>>>>>>>> void CopyPropagateBlock(MachineBasicBlock &MBB);
>>>>>>>> bool eraseIfRedundant(MachineInstr &Copy, unsigned Src, unsigned
>>>>>>>> Def);
>>>>>>>> + unsigned getPhysReg(unsigned Reg, unsigned SubReg);
>>>>>>>> + unsigned getPhysReg(const MachineOperand &Opnd) {
>>>>>>>> + return getPhysReg(Opnd.getReg(), Opnd.getSubReg());
>>>>>>>> + }
>>>>>>>> + unsigned getFullPhysReg(const MachineOperand &Opnd) {
>>>>>>>> + return getPhysReg(Opnd.getReg(), 0);
>>>>>>>> + }
>>>>>>>> + void forwardUses(MachineInstr &MI);
>>>>>>>> + bool isForwardableRegClassCopy(const MachineInstr &Copy,
>>>>>>>> + const MachineInstr &UseI);
>>>>>>>> + std::tuple<unsigned, unsigned, bool>
>>>>>>>> + checkUseSubReg(const MachineOperand &CopySrc, const MachineOperand
>>>>>>>> &MOUse);
>>>>>>>> + bool hasImplicitOverlap(const MachineInstr &MI, const
>>>>>>>> MachineOperand &Use);
>>>>>>>> + void narrowRegClass(const MachineInstr &MI, const MachineOperand
>>>>>>>> &MOUse,
>>>>>>>> + unsigned NewUseReg, unsigned NewUseSubReg);
>>>>>>>> + void updateForwardedCopyLiveInterval(const MachineInstr &Copy,
>>>>>>>> + const MachineInstr &UseMI,
>>>>>>>> + bool UseIsEarlyClobber,
>>>>>>>> + unsigned OrigUseReg,
>>>>>>>> + unsigned NewUseReg,
>>>>>>>> + unsigned NewUseSubReg);
>>>>>>>> + /// LiveRangeEdit callback for eliminateDeadDefs().
>>>>>>>> + void LRE_WillEraseInstruction(MachineInstr *MI) override;
>>>>>>>>
>>>>>>>> /// Candidates for deletion.
>>>>>>>> SmallSetVector<MachineInstr*, 8> MaybeDeadCopies;
>>>>>>>> + SmallVector<MachineInstr*, 8> ShrunkDeadInsts;
>>>>>>>>
>>>>>>>> /// Def -> available copies map.
>>>>>>>> Reg2MIMap AvailCopyMap;
>>>>>>>> @@ -90,6 +190,14 @@ using Reg2MIMap = DenseMap<unsigned, Mac
>>>>>>>> bool Changed;
>>>>>>>> };
>>>>>>>>
>>>>>>>> + class MachineCopyPropagationPreRegRewrite : public
>>>>>>>> MachineCopyPropagation {
>>>>>>>> + public:
>>>>>>>> + static char ID; // Pass identification, replacement for typeid
>>>>>>>> + MachineCopyPropagationPreRegRewrite()
>>>>>>>> + : MachineCopyPropagation(ID, true) {
>>>>>>>> +
>>>>>>>> initializeMachineCopyPropagationPreRegRewritePass(*PassRegistry::getPassRegistry());
>>>>>>>> + }
>>>>>>>> + };
>>>>>>>> } // end anonymous namespace
>>>>>>>>
>>>>>>>> char MachineCopyPropagation::ID = 0;
>>>>>>>> @@ -99,6 +207,29 @@ char &llvm::MachineCopyPropagationID = M
>>>>>>>> INITIALIZE_PASS(MachineCopyPropagation, DEBUG_TYPE,
>>>>>>>> "Machine Copy Propagation Pass", false, false)
>>>>>>>>
>>>>>>>> +/// We have two separate passes that are very similar, the only
>>>>>>>> difference being
>>>>>>>> +/// where they are meant to be run in the pipeline. This is done for
>>>>>>>> several
>>>>>>>> +/// reasons:
>>>>>>>> +/// - the two passes have different dependencies
>>>>>>>> +/// - some targets want to disable the later run of this pass, but not
>>>>>>>> the
>>>>>>>> +/// earlier one (e.g. NVPTX and WebAssembly)
>>>>>>>> +/// - it allows for easier debugging via llc
>>>>>>>> +
>>>>>>>> +char MachineCopyPropagationPreRegRewrite::ID = 0;
>>>>>>>> +char &llvm::MachineCopyPropagationPreRegRewriteID =
>>>>>>>> MachineCopyPropagationPreRegRewrite::ID;
>>>>>>>> +
>>>>>>>> +INITIALIZE_PASS_BEGIN(MachineCopyPropagationPreRegRewrite,
>>>>>>>> + "machine-cp-prerewrite",
>>>>>>>> + "Machine Copy Propagation Pre-Register Rewrite
>>>>>>>> Pass",
>>>>>>>> + false, false)
>>>>>>>> +INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
>>>>>>>> +INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
>>>>>>>> +INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
>>>>>>>> +INITIALIZE_PASS_END(MachineCopyPropagationPreRegRewrite,
>>>>>>>> + "machine-cp-prerewrite",
>>>>>>>> + "Machine Copy Propagation Pre-Register Rewrite
>>>>>>>> Pass", false,
>>>>>>>> + false)
>>>>>>>> +
>>>>>>>> /// Remove any entry in \p Map where the register is a subregister or
>>>>>>>> equal to
>>>>>>>> /// a register contained in \p Regs.
>>>>>>>> static void removeRegsFromMap(Reg2MIMap &Map, const RegList &Regs,
>>>>>>>> @@ -139,6 +270,10 @@ void MachineCopyPropagation::ClobberRegi
>>>>>>>> }
>>>>>>>>
>>>>>>>> void MachineCopyPropagation::ReadRegister(unsigned Reg) {
>>>>>>>> + // We don't track MaybeDeadCopies when running pre-VirtRegRewriter.
>>>>>>>> + if (PreRegRewrite)
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> // If 'Reg' is defined by a copy, the copy is no longer a candidate
>>>>>>>> // for elimination.
>>>>>>>> for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
>>>>>>>> @@ -170,6 +305,46 @@ static bool isNopCopy(const MachineInstr
>>>>>>>> return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/// Return the physical register assigned to \p Reg if it is a virtual
>>>>>>>> register,
>>>>>>>> +/// otherwise just return the physical reg from the operand itself.
>>>>>>>> +///
>>>>>>>> +/// If \p SubReg is 0 then return the full physical register assigned
>>>>>>>> to the
>>>>>>>> +/// virtual register ignoring subregs. If we aren't tracking sub-reg
>>>>>>>> liveness
>>>>>>>> +/// then we need to use this to be more conservative with clobbers by
>>>>>>>> killing
>>>>>>>> +/// all super reg and their sub reg COPYs as well. This is to prevent
>>>>>>>> COPY
>>>>>>>> +/// forwarding in cases like the following:
>>>>>>>> +///
>>>>>>>> +/// %vreg2 = COPY %vreg1:sub1
>>>>>>>> +/// %vreg3 = COPY %vreg1:sub0
>>>>>>>> +/// ... = OP1 %vreg2
>>>>>>>> +/// ... = OP2 %vreg3
>>>>>>>> +///
>>>>>>>> +/// After forward %vreg2 (assuming this is the last use of %vreg1) and
>>>>>>>> +/// VirtRegRewriter adding kill markers we have:
>>>>>>>> +///
>>>>>>>> +/// %vreg3 = COPY %vreg1:sub0
>>>>>>>> +/// ... = OP1 %vreg1:sub1<kill>
>>>>>>>> +/// ... = OP2 %vreg3
>>>>>>>> +///
>>>>>>>> +/// If %vreg3 is assigned to a sub-reg of %vreg1, then after rewriting
>>>>>>>> we have:
>>>>>>>> +///
>>>>>>>> +/// ... = OP1 R0:sub1, R0<imp-use,kill>
>>>>>>>> +/// ... = OP2 R0:sub0
>>>>>>>> +///
>>>>>>>> +/// and the use of R0 by OP2 will not have a valid definition.
>>>>>>>> +unsigned MachineCopyPropagation::getPhysReg(unsigned Reg, unsigned
>>>>>>>> SubReg) {
>>>>>>>> +
>>>>>>>> + // Physical registers cannot have subregs.
>>>>>>>> + if (!TargetRegisterInfo::isVirtualRegister(Reg))
>>>>>>>> + return Reg;
>>>>>>>> +
>>>>>>>> + assert(PreRegRewrite && "Unexpected virtual register encountered");
>>>>>>>> + Reg = VRM->getPhys(Reg);
>>>>>>>> + if (SubReg && !NoSubRegLiveness)
>>>>>>>> + Reg = TRI->getSubReg(Reg, SubReg);
>>>>>>>> + return Reg;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> /// Remove instruction \p Copy if there exists a previous copy that
>>>>>>>> copies the
>>>>>>>> /// register \p Src to the register \p Def; This may happen indirectly
>>>>>>>> by
>>>>>>>> /// copying the super registers.
>>>>>>>> @@ -207,6 +382,397 @@ bool MachineCopyPropagation::eraseIfRedu
>>>>>>>> return true;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +
>>>>>>>> +/// Decide whether we should forward the destination of \param Copy to
>>>>>>>> its use
>>>>>>>> +/// in \param UseI based on the register class of the Copy operands.
>>>>>>>> Same-class
>>>>>>>> +/// COPYs are always accepted by this function, but cross-class COPYs
>>>>>>>> are only
>>>>>>>> +/// accepted if they are forwarded to another COPY with the operand
>>>>>>>> register
>>>>>>>> +/// classes reversed. For example:
>>>>>>>> +///
>>>>>>>> +/// RegClassA = COPY RegClassB // Copy parameter
>>>>>>>> +/// ...
>>>>>>>> +/// RegClassB = COPY RegClassA // UseI parameter
>>>>>>>> +///
>>>>>>>> +/// which after forwarding becomes
>>>>>>>> +///
>>>>>>>> +/// RegClassA = COPY RegClassB
>>>>>>>> +/// ...
>>>>>>>> +/// RegClassB = COPY RegClassB
>>>>>>>> +///
>>>>>>>> +/// so we have reduced the number of cross-class COPYs and potentially
>>>>>>>> +/// introduced a no COPY that can be removed.
>>>>>>>> +bool MachineCopyPropagation::isForwardableRegClassCopy(
>>>>>>>> + const MachineInstr &Copy, const MachineInstr &UseI) {
>>>>>>>> + auto isCross = [&](const MachineOperand &Dst, const MachineOperand
>>>>>>>> &Src) {
>>>>>>>> + unsigned DstReg = Dst.getReg();
>>>>>>>> + unsigned SrcPhysReg = getPhysReg(Src);
>>>>>>>> + const TargetRegisterClass *DstRC;
>>>>>>>> + if (TargetRegisterInfo::isVirtualRegister(DstReg)) {
>>>>>>>> + DstRC = MRI->getRegClass(DstReg);
>>>>>>>> + unsigned DstSubReg = Dst.getSubReg();
>>>>>>>> + if (DstSubReg)
>>>>>>>> + SrcPhysReg = TRI->getMatchingSuperReg(SrcPhysReg, DstSubReg,
>>>>>>>> DstRC);
>>>>>>>> + } else
>>>>>>>> + DstRC = TRI->getMinimalPhysRegClass(DstReg);
>>>>>>>> +
>>>>>>>> + return !DstRC->contains(SrcPhysReg);
>>>>>>>> + };
>>>>>>>> +
>>>>>>>> + const MachineOperand &CopyDst = Copy.getOperand(0);
>>>>>>>> + const MachineOperand &CopySrc = Copy.getOperand(1);
>>>>>>>> +
>>>>>>>> + if (!isCross(CopyDst, CopySrc))
>>>>>>>> + return true;
>>>>>>>> +
>>>>>>>> + if (!UseI.isCopy())
>>>>>>>> + return false;
>>>>>>>> +
>>>>>>>> + assert(getFullPhysReg(UseI.getOperand(1)) ==
>>>>>>>> getFullPhysReg(CopyDst));
>>>>>>>> + return !isCross(UseI.getOperand(0), CopySrc);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/// Check that the subregs on the copy source operand (\p CopySrc) and
>>>>>>>> the use
>>>>>>>> +/// operand to be forwarded to (\p MOUse) are compatible with doing
>>>>>>>> the
>>>>>>>> +/// forwarding. Also computes the new register and subregister to be
>>>>>>>> used in
>>>>>>>> +/// the forwarded-to instruction.
>>>>>>>> +std::tuple<unsigned, unsigned, bool>
>>>>>>>> MachineCopyPropagation::checkUseSubReg(
>>>>>>>> + const MachineOperand &CopySrc, const MachineOperand &MOUse) {
>>>>>>>> + unsigned NewUseReg = CopySrc.getReg();
>>>>>>>> + unsigned NewUseSubReg;
>>>>>>>> +
>>>>>>>> + if (TargetRegisterInfo::isPhysicalRegister(NewUseReg)) {
>>>>>>>> + // If MOUse is a virtual reg, we need to apply it to the new
>>>>>>>> physical reg
>>>>>>>> + // we're going to replace it with.
>>>>>>>> + if (MOUse.getSubReg())
>>>>>>>> + NewUseReg = TRI->getSubReg(NewUseReg, MOUse.getSubReg());
>>>>>>>> + // If the original use subreg isn't valid on the new src reg, we
>>>>>>>> can't
>>>>>>>> + // forward it here.
>>>>>>>> + if (!NewUseReg)
>>>>>>>> + return std::make_tuple(0, 0, false);
>>>>>>>> + NewUseSubReg = 0;
>>>>>>>> + } else {
>>>>>>>> + // %v1 = COPY %v2:sub1
>>>>>>>> + // USE %v1:sub2
>>>>>>>> + // The new use is %v2:sub1:sub2
>>>>>>>> + NewUseSubReg =
>>>>>>>> + TRI->composeSubRegIndices(CopySrc.getSubReg(),
>>>>>>>> MOUse.getSubReg());
>>>>>>>> + // Check that NewUseSubReg is valid on NewUseReg
>>>>>>>> + if (NewUseSubReg &&
>>>>>>>> + !TRI->getSubClassWithSubReg(MRI->getRegClass(NewUseReg),
>>>>>>>> NewUseSubReg))
>>>>>>>> + return std::make_tuple(0, 0, false);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + return std::make_tuple(NewUseReg, NewUseSubReg, true);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/// Check that \p MI does not have implicit uses that overlap with
>>>>>>>> it's \p Use
>>>>>>>> +/// operand (the register being replaced), since these can sometimes
>>>>>>>> be
>>>>>>>> +/// implicitly tied to other operands. For example, on AMDGPU:
>>>>>>>> +///
>>>>>>>> +/// V_MOVRELS_B32_e32 %VGPR2, %M0<imp-use>, %EXEC<imp-use>,
>>>>>>>> %VGPR2_VGPR3_VGPR4_VGPR5<imp-use>
>>>>>>>> +///
>>>>>>>> +/// the %VGPR2 is implicitly tied to the larger reg operand, but we
>>>>>>>> have no
>>>>>>>> +/// way of knowing we need to update the latter when updating the
>>>>>>>> former.
>>>>>>>> +bool MachineCopyPropagation::hasImplicitOverlap(const MachineInstr
>>>>>>>> &MI,
>>>>>>>> + const MachineOperand
>>>>>>>> &Use) {
>>>>>>>> + if (!TargetRegisterInfo::isPhysicalRegister(Use.getReg()))
>>>>>>>> + return false;
>>>>>>>> +
>>>>>>>> + for (const MachineOperand &MIUse : MI.uses())
>>>>>>>> + if (&MIUse != &Use && MIUse.isReg() && MIUse.isImplicit() &&
>>>>>>>> + TRI->regsOverlap(Use.getReg(), MIUse.getReg()))
>>>>>>>> + return true;
>>>>>>>> +
>>>>>>>> + return false;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/// Narrow the register class of the forwarded vreg so it matches any
>>>>>>>> +/// instruction constraints. \p MI is the instruction being forwarded
>>>>>>>> to. \p
>>>>>>>> +/// MOUse is the operand being replaced in \p MI (which hasn't yet
>>>>>>>> been updated
>>>>>>>> +/// at the time this function is called). \p NewUseReg and \p
>>>>>>>> NewUseSubReg are
>>>>>>>> +/// what the \p MOUse will be changed to after forwarding.
>>>>>>>> +///
>>>>>>>> +/// If we are forwarding
>>>>>>>> +/// A:RCA = COPY B:RCB
>>>>>>>> +/// into
>>>>>>>> +/// ... = OP A:RCA
>>>>>>>> +///
>>>>>>>> +/// then we need to narrow the register class of B so that it is a
>>>>>>>> subclass
>>>>>>>> +/// of RCA so that it meets the instruction register class
>>>>>>>> constraints.
>>>>>>>> +void MachineCopyPropagation::narrowRegClass(const MachineInstr &MI,
>>>>>>>> + const MachineOperand
>>>>>>>> &MOUse,
>>>>>>>> + unsigned NewUseReg,
>>>>>>>> + unsigned NewUseSubReg) {
>>>>>>>> + if (!TargetRegisterInfo::isVirtualRegister(NewUseReg))
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + // Make sure the virtual reg class allows the subreg.
>>>>>>>> + if (NewUseSubReg) {
>>>>>>>> + const TargetRegisterClass *CurUseRC = MRI->getRegClass(NewUseReg);
>>>>>>>> + const TargetRegisterClass *NewUseRC =
>>>>>>>> + TRI->getSubClassWithSubReg(CurUseRC, NewUseSubReg);
>>>>>>>> + if (CurUseRC != NewUseRC) {
>>>>>>>> + DEBUG(dbgs() << "MCP: Setting regclass of " <<
>>>>>>>> PrintReg(NewUseReg, TRI)
>>>>>>>> + << " to " << TRI->getRegClassName(NewUseRC) <<
>>>>>>>> "\n");
>>>>>>>> + MRI->setRegClass(NewUseReg, NewUseRC);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + unsigned MOUseOpNo = &MOUse - &MI.getOperand(0);
>>>>>>>> + const TargetRegisterClass *InstRC =
>>>>>>>> + TII->getRegClass(MI.getDesc(), MOUseOpNo, TRI, *MF);
>>>>>>>> + if (InstRC) {
>>>>>>>> + const TargetRegisterClass *CurUseRC = MRI->getRegClass(NewUseReg);
>>>>>>>> + if (NewUseSubReg)
>>>>>>>> + InstRC = TRI->getMatchingSuperRegClass(CurUseRC, InstRC,
>>>>>>>> NewUseSubReg);
>>>>>>>> + if (!InstRC->hasSubClassEq(CurUseRC)) {
>>>>>>>> + const TargetRegisterClass *NewUseRC =
>>>>>>>> + TRI->getCommonSubClass(InstRC, CurUseRC);
>>>>>>>> + DEBUG(dbgs() << "MCP: Setting regclass of " <<
>>>>>>>> PrintReg(NewUseReg, TRI)
>>>>>>>> + << " to " << TRI->getRegClassName(NewUseRC) <<
>>>>>>>> "\n");
>>>>>>>> + MRI->setRegClass(NewUseReg, NewUseRC);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/// Update the LiveInterval information to reflect the destination of
>>>>>>>> \p Copy
>>>>>>>> +/// being forwarded to a use in \p UseMI. \p OrigUseReg is the
>>>>>>>> register being
>>>>>>>> +/// forwarded through. It should be the destination register of \p
>>>>>>>> Copy and has
>>>>>>>> +/// already been replaced in \p UseMI at the point this function is
>>>>>>>> called. \p
>>>>>>>> +/// NewUseReg and \p NewUseSubReg are the register and subregister
>>>>>>>> being
>>>>>>>> +/// forwarded. They should be the source register of the \p Copy and
>>>>>>>> should be
>>>>>>>> +/// the value of the \p UseMI operand being forwarded at the point
>>>>>>>> this function
>>>>>>>> +/// is called. \p UseIsEarlyClobber is true if the use being
>>>>>>>> re-written is in
>>>>>>>> +/// the EarlyClobber slot index (as opposed to the register slot of
>>>>>>>> most
>>>>>>>> +/// register operands).
>>>>>>>> +void MachineCopyPropagation::updateForwardedCopyLiveInterval(
>>>>>>>> + const MachineInstr &Copy, const MachineInstr &UseMI, bool
>>>>>>>> UseIsEarlyClobber,
>>>>>>>> + unsigned OrigUseReg, unsigned NewUseReg, unsigned NewUseSubReg) {
>>>>>>>> +
>>>>>>>> + assert(TRI->isSubRegisterEq(getPhysReg(OrigUseReg, 0),
>>>>>>>> + getFullPhysReg(Copy.getOperand(0))) &&
>>>>>>>> + "OrigUseReg mismatch");
>>>>>>>> + assert(TRI->isSubRegisterEq(getFullPhysReg(Copy.getOperand(1)),
>>>>>>>> + getPhysReg(NewUseReg, 0)) &&
>>>>>>>> + "NewUseReg mismatch");
>>>>>>>> +
>>>>>>>> + // Extend live range starting from COPY early-clobber slot, since
>>>>>>>> that
>>>>>>>> + // is where the original src live range ends.
>>>>>>>> + SlotIndex CopyUseIdx =
>>>>>>>> + Indexes->getInstructionIndex(Copy).getRegSlot(true
>>>>>>>> /*=EarlyClobber*/);
>>>>>>>> + SlotIndex UseEndIdx =
>>>>>>>> +
>>>>>>>> Indexes->getInstructionIndex(UseMI).getRegSlot(UseIsEarlyClobber);
>>>>>>>> + if (TargetRegisterInfo::isVirtualRegister(NewUseReg)) {
>>>>>>>> + LiveInterval &LI = LIS->getInterval(NewUseReg);
>>>>>>>> + LI.extendInBlock(CopyUseIdx, UseEndIdx);
>>>>>>>> + LaneBitmask UseMask = TRI->getSubRegIndexLaneMask(NewUseSubReg);
>>>>>>>> + for (auto &S : LI.subranges())
>>>>>>>> + if ((S.LaneMask & UseMask).any() && S.find(CopyUseIdx))
>>>>>>>> + S.extendInBlock(CopyUseIdx, UseEndIdx);
>>>>>>>> + } else {
>>>>>>>> + assert(NewUseSubReg == 0 && "Unexpected subreg on physical
>>>>>>>> register!");
>>>>>>>> + for (MCRegUnitIterator UI(NewUseReg, TRI); UI.isValid(); ++UI) {
>>>>>>>> + LiveRange &LR = LIS->getRegUnit(*UI);
>>>>>>>> + LR.extendInBlock(CopyUseIdx, UseEndIdx);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + if (!TargetRegisterInfo::isVirtualRegister(OrigUseReg))
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + // Shrink the live-range for the old use reg if the forwarded use
>>>>>>>> was it's
>>>>>>>> + // last use.
>>>>>>>> + LiveInterval &OrigUseLI = LIS->getInterval(OrigUseReg);
>>>>>>>> +
>>>>>>>> + // Can happen for undef uses.
>>>>>>>> + if (OrigUseLI.empty())
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + SlotIndex CopyDefIdx =
>>>>>>>> Indexes->getInstructionIndex(Copy).getRegSlot();
>>>>>>>> + const LiveRange::Segment *OrigUseSeg =
>>>>>>>> + OrigUseLI.getSegmentContaining(CopyDefIdx);
>>>>>>>> +
>>>>>>>> + // Only shrink if forwarded use is the end of a segment.
>>>>>>>> + if (OrigUseSeg->end != UseEndIdx)
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + LIS->shrinkToUses(&OrigUseLI, &ShrunkDeadInsts);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void MachineCopyPropagation::LRE_WillEraseInstruction(MachineInstr
>>>>>>>> *MI) {
>>>>>>>> + Changed = true;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/// Look for available copies whose destination register is used by \p
>>>>>>>> MI and
>>>>>>>> +/// replace the use in \p MI with the copy's source register.
>>>>>>>> +void MachineCopyPropagation::forwardUses(MachineInstr &MI) {
>>>>>>>> + // We can't generally forward uses after virtual registers have been
>>>>>>>> renamed
>>>>>>>> + // because some targets generate code that has implicit dependencies
>>>>>>>> on the
>>>>>>>> + // physical register numbers. For example, in PowerPC, when
>>>>>>>> spilling
>>>>>>>> + // condition code registers, the following code pattern is
>>>>>>>> generated:
>>>>>>>> + //
>>>>>>>> + // %CR7 = COPY %CR0
>>>>>>>> + // %R6 = MFOCRF %CR7
>>>>>>>> + // %R6 = RLWINM %R6, 29, 31, 31
>>>>>>>> + //
>>>>>>>> + // where the shift amount in the RLWINM instruction depends on the
>>>>>>>> source
>>>>>>>> + // register number of the MFOCRF instruction. If we were to forward
>>>>>>>> %CR0 to
>>>>>>>> + // the MFOCRF instruction, the shift amount would no longer be
>>>>>>>> correct.
>>>>>>>> + //
>>>>>>>> + // FIXME: It may be possible to define a target hook that checks the
>>>>>>>> register
>>>>>>>> + // class or user opcode and allows some cases, but prevents cases
>>>>>>>> like the
>>>>>>>> + // above from being broken to enable later register copy forwarding.
>>>>>>>> + if (!PreRegRewrite)
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + if (AvailCopyMap.empty())
>>>>>>>> + return;
>>>>>>>> +
>>>>>>>> + // Look for non-tied explicit vreg uses that have an active COPY
>>>>>>>> + // instruction that defines the physical register allocated to them.
>>>>>>>> + // Replace the vreg with the source of the active COPY.
>>>>>>>> + for (MachineOperand &MOUse : MI.explicit_uses()) {
>>>>>>>> + // Don't forward into undef use operands since doing so can cause
>>>>>>>> problems
>>>>>>>> + // with the machine verifier, since it doesn't treat undef reads
>>>>>>>> as reads,
>>>>>>>> + // so we can end up with a live range that ends on an undef read,
>>>>>>>> leading to
>>>>>>>> + // an error that the live range doesn't end on a read of the live
>>>>>>>> range
>>>>>>>> + // register.
>>>>>>>> + if (!MOUse.isReg() || MOUse.isTied() || MOUse.isUndef() ||
>>>>>>>> MOUse.isDef())
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + unsigned UseReg = MOUse.getReg();
>>>>>>>> + if (!UseReg)
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + // See comment above check for !PreRegRewrite regarding forwarding
>>>>>>>> changing
>>>>>>>> + // physical registers.
>>>>>>>> + if (!TargetRegisterInfo::isVirtualRegister(UseReg))
>>>>>>>> + continue;
>>>>>>>> + else {
>>>>>>>> + // FIXME: Don't forward COPYs to a use that is on an instruction
>>>>>>>> that
>>>>>>>> + // re-defines the same virtual register. This leads to machine
>>>>>>>> + // verification failures because of a bug in the greedy
>>>>>>>> + // allocator/verifier. The bug is that just after greedy
>>>>>>>> regalloc, we can
>>>>>>>> + // end up with code that looks like:
>>>>>>>> + //
>>>>>>>> + // %vreg1<def> = ...
>>>>>>>> + // ...
>>>>>>>> + // ... = %vreg1
>>>>>>>> + // ...
>>>>>>>> + // %vreg1<def> = %vreg1
>>>>>>>> + // ...
>>>>>>>> + //
>>>>>>>> + // verifyLiveInterval() accepts this code as valid since it sees
>>>>>>>> the
>>>>>>>> + // second def as part of the same live interval component
>>>>>>>> because
>>>>>>>> + // ConnectedVNInfoEqClasses::Classify() sees this second def as
>>>>>>>> a
>>>>>>>> + // "two-addr" redefinition, even though the def and source
>>>>>>>> operands are
>>>>>>>> + // not tied.
>>>>>>>> + //
>>>>>>>> + // If we replace just the second use of %vreg1 in the above
>>>>>>>> code, then we
>>>>>>>> + // end up with:
>>>>>>>> + //
>>>>>>>> + // ...
>>>>>>>> + // %vreg1<def> = ...
>>>>>>>> + // ...
>>>>>>>> + // ... = %vreg1
>>>>>>>> + // ...
>>>>>>>> + // %vreg1<def> = *%vreg2*
>>>>>>>> + // ...
>>>>>>>> + //
>>>>>>>> + // verifyLiveInterval() now rejects this code since it sees
>>>>>>>> these two def
>>>>>>>> + // live ranges as being separate components. To get rid of this
>>>>>>>> + // restriction on forwarding, regalloc greedy would need to be
>>>>>>>> fixed to
>>>>>>>> + // avoid generating code like the first snippet above, as well
>>>>>>>> as the
>>>>>>>> + // verifier being fixed to reject such code.
>>>>>>>> + if (llvm::any_of(MI.defs(), [UseReg](MachineOperand &Def) {
>>>>>>>> + if (!Def.isReg() || !Def.isDef() || Def.getReg() !=
>>>>>>>> UseReg)
>>>>>>>> + return false;
>>>>>>>> + // Only tied use/def operands or a subregister def without
>>>>>>>> the undef
>>>>>>>> + // flag should result in connected liveranges.
>>>>>>>> + if (Def.isTied())
>>>>>>>> + return false;
>>>>>>>> + if (Def.getSubReg() != 0 && Def.isUndef())
>>>>>>>> + return false;
>>>>>>>> + return true;
>>>>>>>> + }))
>>>>>>>> + continue;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + UseReg = VRM->getPhys(UseReg);
>>>>>>>> +
>>>>>>>> + // Don't forward COPYs via non-allocatable regs since they can
>>>>>>>> have
>>>>>>>> + // non-standard semantics.
>>>>>>>> + if (!MRI->isAllocatable(UseReg))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + auto CI = AvailCopyMap.find(UseReg);
>>>>>>>> + if (CI == AvailCopyMap.end())
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + MachineInstr &Copy = *CI->second;
>>>>>>>> + MachineOperand &CopyDst = Copy.getOperand(0);
>>>>>>>> + MachineOperand &CopySrc = Copy.getOperand(1);
>>>>>>>> +
>>>>>>>> + // Don't forward COPYs that are already NOPs due to register
>>>>>>>> assignment.
>>>>>>>> + if (getPhysReg(CopyDst) == getPhysReg(CopySrc))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + // FIXME: Don't handle partial uses of wider COPYs yet.
>>>>>>>> + if (CopyDst.getSubReg() != 0 || UseReg != getPhysReg(CopyDst))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + // Don't forward COPYs of non-allocatable regs unless they are
>>>>>>>> constant.
>>>>>>>> + unsigned CopySrcReg = CopySrc.getReg();
>>>>>>>> + if (TargetRegisterInfo::isPhysicalRegister(CopySrcReg) &&
>>>>>>>> + !MRI->isAllocatable(CopySrcReg) &&
>>>>>>>> !MRI->isConstantPhysReg(CopySrcReg))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + if (!isForwardableRegClassCopy(Copy, MI))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + unsigned NewUseReg, NewUseSubReg;
>>>>>>>> + bool SubRegOK;
>>>>>>>> + std::tie(NewUseReg, NewUseSubReg, SubRegOK) =
>>>>>>>> + checkUseSubReg(CopySrc, MOUse);
>>>>>>>> + if (!SubRegOK)
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + if (hasImplicitOverlap(MI, MOUse))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + if (!DebugCounter::shouldExecute(FwdCounter))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + DEBUG(dbgs() << "MCP: Replacing "
>>>>>>>> + << PrintReg(MOUse.getReg(), TRI, MOUse.getSubReg())
>>>>>>>> + << "\n with "
>>>>>>>> + << PrintReg(NewUseReg, TRI, CopySrc.getSubReg())
>>>>>>>> + << "\n in "
>>>>>>>> + << MI
>>>>>>>> + << " from "
>>>>>>>> + << Copy);
>>>>>>>> +
>>>>>>>> + narrowRegClass(MI, MOUse, NewUseReg, NewUseSubReg);
>>>>>>>> +
>>>>>>>> + unsigned OrigUseReg = MOUse.getReg();
>>>>>>>> + MOUse.setReg(NewUseReg);
>>>>>>>> + MOUse.setSubReg(NewUseSubReg);
>>>>>>>> +
>>>>>>>> + DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");
>>>>>>>> +
>>>>>>>> + if (PreRegRewrite)
>>>>>>>> + updateForwardedCopyLiveInterval(Copy, MI,
>>>>>>>> MOUse.isEarlyClobber(),
>>>>>>>> + OrigUseReg, NewUseReg,
>>>>>>>> NewUseSubReg);
>>>>>>>> + else
>>>>>>>> + for (MachineInstr &KMI :
>>>>>>>> + make_range(Copy.getIterator(),
>>>>>>>> std::next(MI.getIterator())))
>>>>>>>> + KMI.clearRegisterKills(NewUseReg, TRI);
>>>>>>>> +
>>>>>>>> + ++NumCopyForwards;
>>>>>>>> + Changed = true;
>>>>>>>> + }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> void MachineCopyPropagation::CopyPropagateBlock(MachineBasicBlock &MBB)
>>>>>>>> {
>>>>>>>> DEBUG(dbgs() << "MCP: CopyPropagateBlock " << MBB.getName() << "\n");
>>>>>>>>
>>>>>>>> @@ -215,12 +781,8 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> ++I;
>>>>>>>>
>>>>>>>> if (MI->isCopy()) {
>>>>>>>> - unsigned Def = MI->getOperand(0).getReg();
>>>>>>>> - unsigned Src = MI->getOperand(1).getReg();
>>>>>>>> -
>>>>>>>> - assert(!TargetRegisterInfo::isVirtualRegister(Def) &&
>>>>>>>> - !TargetRegisterInfo::isVirtualRegister(Src) &&
>>>>>>>> - "MachineCopyPropagation should be run after register
>>>>>>>> allocation!");
>>>>>>>> + unsigned Def = getPhysReg(MI->getOperand(0));
>>>>>>>> + unsigned Src = getPhysReg(MI->getOperand(1));
>>>>>>>>
>>>>>>>> // The two copies cancel out and the source of the first copy
>>>>>>>> // hasn't been overridden, eliminate the second one. e.g.
>>>>>>>> @@ -237,8 +799,16 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> // %ECX<def> = COPY %EAX
>>>>>>>> // =>
>>>>>>>> // %ECX<def> = COPY %EAX
>>>>>>>> - if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI,
>>>>>>>> Src, Def))
>>>>>>>> - continue;
>>>>>>>> + if (!PreRegRewrite)
>>>>>>>> + if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI,
>>>>>>>> Src, Def))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + forwardUses(*MI);
>>>>>>>> +
>>>>>>>> + // Src may have been changed by forwardUses()
>>>>>>>> + Src = getPhysReg(MI->getOperand(1));
>>>>>>>> + unsigned DefClobber = getFullPhysReg(MI->getOperand(0));
>>>>>>>> + unsigned SrcClobber = getFullPhysReg(MI->getOperand(1));
>>>>>>>>
>>>>>>>> // If Src is defined by a previous copy, the previous copy cannot
>>>>>>>> be
>>>>>>>> // eliminated.
>>>>>>>> @@ -255,7 +825,10 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> DEBUG(dbgs() << "MCP: Copy is a deletion candidate: ";
>>>>>>>> MI->dump());
>>>>>>>>
>>>>>>>> // Copy is now a candidate for deletion.
>>>>>>>> - if (!MRI->isReserved(Def))
>>>>>>>> + // Only look for dead COPYs if we're not running just before
>>>>>>>> + // VirtRegRewriter, since presumably these COPYs will have
>>>>>>>> already been
>>>>>>>> + // removed.
>>>>>>>> + if (!PreRegRewrite && !MRI->isReserved(Def))
>>>>>>>> MaybeDeadCopies.insert(MI);
>>>>>>>>
>>>>>>>> // If 'Def' is previously source of another copy, then this
>>>>>>>> earlier copy's
>>>>>>>> @@ -265,11 +838,11 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> // %xmm2<def> = copy %xmm0
>>>>>>>> // ...
>>>>>>>> // %xmm2<def> = copy %xmm9
>>>>>>>> - ClobberRegister(Def);
>>>>>>>> + ClobberRegister(DefClobber);
>>>>>>>> for (const MachineOperand &MO : MI->implicit_operands()) {
>>>>>>>> if (!MO.isReg() || !MO.isDef())
>>>>>>>> continue;
>>>>>>>> - unsigned Reg = MO.getReg();
>>>>>>>> + unsigned Reg = getFullPhysReg(MO);
>>>>>>>> if (!Reg)
>>>>>>>> continue;
>>>>>>>> ClobberRegister(Reg);
>>>>>>>> @@ -284,13 +857,27 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>>
>>>>>>>> // Remember source that's copied to Def. Once it's clobbered,
>>>>>>>> then
>>>>>>>> // it's no longer available for copy propagation.
>>>>>>>> - RegList &DestList = SrcMap[Src];
>>>>>>>> - if (!is_contained(DestList, Def))
>>>>>>>> - DestList.push_back(Def);
>>>>>>>> + RegList &DestList = SrcMap[SrcClobber];
>>>>>>>> + if (!is_contained(DestList, DefClobber))
>>>>>>>> + DestList.push_back(DefClobber);
>>>>>>>>
>>>>>>>> continue;
>>>>>>>> }
>>>>>>>>
>>>>>>>> + // Clobber any earlyclobber regs first.
>>>>>>>> + for (const MachineOperand &MO : MI->operands())
>>>>>>>> + if (MO.isReg() && MO.isEarlyClobber()) {
>>>>>>>> + unsigned Reg = getFullPhysReg(MO);
>>>>>>>> + // If we have a tied earlyclobber, that means it is also read
>>>>>>>> by this
>>>>>>>> + // instruction, so we need to make sure we don't remove it as
>>>>>>>> dead
>>>>>>>> + // later.
>>>>>>>> + if (MO.isTied())
>>>>>>>> + ReadRegister(Reg);
>>>>>>>> + ClobberRegister(Reg);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + forwardUses(*MI);
>>>>>>>> +
>>>>>>>> // Not a copy.
>>>>>>>> SmallVector<unsigned, 2> Defs;
>>>>>>>> const MachineOperand *RegMask = nullptr;
>>>>>>>> @@ -299,14 +886,11 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> RegMask = &MO;
>>>>>>>> if (!MO.isReg())
>>>>>>>> continue;
>>>>>>>> - unsigned Reg = MO.getReg();
>>>>>>>> + unsigned Reg = getFullPhysReg(MO);
>>>>>>>> if (!Reg)
>>>>>>>> continue;
>>>>>>>>
>>>>>>>> - assert(!TargetRegisterInfo::isVirtualRegister(Reg) &&
>>>>>>>> - "MachineCopyPropagation should be run after register
>>>>>>>> allocation!");
>>>>>>>> -
>>>>>>>> - if (MO.isDef()) {
>>>>>>>> + if (MO.isDef() && !MO.isEarlyClobber()) {
>>>>>>>> Defs.push_back(Reg);
>>>>>>>> continue;
>>>>>>>> } else if (MO.readsReg())
>>>>>>>> @@ -358,11 +942,22 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> ClobberRegister(Reg);
>>>>>>>> }
>>>>>>>>
>>>>>>>> + // Remove instructions that were made dead by shrinking live ranges.
>>>>>>>> Do this
>>>>>>>> + // after iterating over instructions to avoid instructions changing
>>>>>>>> while
>>>>>>>> + // iterating.
>>>>>>>> + if (!ShrunkDeadInsts.empty()) {
>>>>>>>> + SmallVector<unsigned, 8> NewRegs;
>>>>>>>> + LiveRangeEdit(nullptr, NewRegs, *MF, *LIS, nullptr, this)
>>>>>>>> + .eliminateDeadDefs(ShrunkDeadInsts);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> // If MBB doesn't have successors, delete the copies whose defs are
>>>>>>>> not used.
>>>>>>>> // If MBB does have successors, then conservative assume the defs are
>>>>>>>> live-out
>>>>>>>> // since we don't want to trust live-in lists.
>>>>>>>> if (MBB.succ_empty()) {
>>>>>>>> for (MachineInstr *MaybeDead : MaybeDeadCopies) {
>>>>>>>> + DEBUG(dbgs() << "MCP: Removing copy due to no live-out succ: ";
>>>>>>>> + MaybeDead->dump());
>>>>>>>> assert(!MRI->isReserved(MaybeDead->getOperand(0).getReg()));
>>>>>>>> MaybeDead->eraseFromParent();
>>>>>>>> Changed = true;
>>>>>>>> @@ -374,6 +969,7 @@ void MachineCopyPropagation::CopyPropaga
>>>>>>>> AvailCopyMap.clear();
>>>>>>>> CopyMap.clear();
>>>>>>>> SrcMap.clear();
>>>>>>>> + ShrunkDeadInsts.clear();
>>>>>>>> }
>>>>>>>>
>>>>>>>> bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF)
>>>>>>>> {
>>>>>>>> @@ -385,6 +981,13 @@ bool MachineCopyPropagation::runOnMachin
>>>>>>>> TRI = MF.getSubtarget().getRegisterInfo();
>>>>>>>> TII = MF.getSubtarget().getInstrInfo();
>>>>>>>> MRI = &MF.getRegInfo();
>>>>>>>> + this->MF = &MF;
>>>>>>>> + if (PreRegRewrite) {
>>>>>>>> + Indexes = &getAnalysis<SlotIndexes>();
>>>>>>>> + LIS = &getAnalysis<LiveIntervals>();
>>>>>>>> + VRM = &getAnalysis<VirtRegMap>();
>>>>>>>> + }
>>>>>>>> + NoSubRegLiveness = !MRI->subRegLivenessEnabled();
>>>>>>>>
>>>>>>>> for (MachineBasicBlock &MBB : MF)
>>>>>>>> CopyPropagateBlock(MBB);
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetPassConfig.cpp?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/lib/CodeGen/TargetPassConfig.cpp (original)
>>>>>>>> +++ llvm/trunk/lib/CodeGen/TargetPassConfig.cpp Mon Oct 2 15:01:37
>>>>>>>> 2017
>>>>>>>> @@ -88,6 +88,8 @@ static cl::opt<bool> DisableCGP("disable
>>>>>>>> cl::desc("Disable Codegen Prepare"));
>>>>>>>> static cl::opt<bool> DisableCopyProp("disable-copyprop", cl::Hidden,
>>>>>>>> cl::desc("Disable Copy Propagation pass"));
>>>>>>>> +static cl::opt<bool>
>>>>>>>> DisableCopyPropPreRegRewrite("disable-copyprop-prerewrite", cl::Hidden,
>>>>>>>> + cl::desc("Disable Copy Propagation Pre-Register Re-write pass"));
>>>>>>>> static cl::opt<bool>
>>>>>>>> DisablePartialLibcallInlining("disable-partial-libcall-inlining",
>>>>>>>> cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));
>>>>>>>> static cl::opt<bool> EnableImplicitNullChecks(
>>>>>>>> @@ -252,6 +254,9 @@ static IdentifyingPassPtr overridePass(A
>>>>>>>> if (StandardID == &MachineCopyPropagationID)
>>>>>>>> return applyDisable(TargetID, DisableCopyProp);
>>>>>>>>
>>>>>>>> + if (StandardID == &MachineCopyPropagationPreRegRewriteID)
>>>>>>>> + return applyDisable(TargetID, DisableCopyPropPreRegRewrite);
>>>>>>>> +
>>>>>>>> return TargetID;
>>>>>>>> }
>>>>>>>>
>>>>>>>> @@ -1064,6 +1069,10 @@ void TargetPassConfig::addOptimizedRegAl
>>>>>>>> // Allow targets to change the register assignments before
>>>>>>>> rewriting.
>>>>>>>> addPreRewrite();
>>>>>>>>
>>>>>>>> + // Copy propagate to forward register uses and try to eliminate
>>>>>>>> COPYs that
>>>>>>>> + // were not coalesced.
>>>>>>>> + addPass(&MachineCopyPropagationPreRegRewriteID);
>>>>>>>> +
>>>>>>>> // Finally rewrite virtual registers.
>>>>>>>> addPass(&VirtRegRewriterID);
>>>>>>>>
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll Mon Oct 2
>>>>>>>> 15:01:37 2017
>>>>>>>> @@ -9,7 +9,8 @@ define i16 @halfword(%struct.a* %ctx, i3
>>>>>>>> ; CHECK-LABEL: halfword:
>>>>>>>> ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>>>>>> ; CHECK: ldrh [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl
>>>>>>>> #1]
>>>>>>>> -; CHECK: strh [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #1]
>>>>>>>> +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>>>>>> +; CHECK: strh [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #1]
>>>>>>>> %shr81 = lshr i32 %xor72, 9
>>>>>>>> %conv82 = zext i32 %shr81 to i64
>>>>>>>> %idxprom83 = and i64 %conv82, 255
>>>>>>>> @@ -24,7 +25,8 @@ define i32 @word(%struct.b* %ctx, i32 %x
>>>>>>>> ; CHECK-LABEL: word:
>>>>>>>> ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>>>>>> ; CHECK: ldr [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl
>>>>>>>> #2]
>>>>>>>> -; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #2]
>>>>>>>> +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>>>>>> +; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #2]
>>>>>>>> %shr81 = lshr i32 %xor72, 9
>>>>>>>> %conv82 = zext i32 %shr81 to i64
>>>>>>>> %idxprom83 = and i64 %conv82, 255
>>>>>>>> @@ -39,7 +41,8 @@ define i64 @doubleword(%struct.c* %ctx,
>>>>>>>> ; CHECK-LABEL: doubleword:
>>>>>>>> ; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
>>>>>>>> ; CHECK: ldr [[REG1:x[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl
>>>>>>>> #3]
>>>>>>>> -; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #3]
>>>>>>>> +; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
>>>>>>>> +; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #3]
>>>>>>>> %shr81 = lshr i32 %xor72, 9
>>>>>>>> %conv82 = zext i32 %shr81 to i64
>>>>>>>> %idxprom83 = and i64 %conv82, 255
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll Mon Oct 2
>>>>>>>> 15:01:37 2017
>>>>>>>> @@ -8,15 +8,9 @@ define <2 x i64> @bar(<2 x i64> %a, <2 x
>>>>>>>> ; CHECK: add.2dv[[REG:[0-9]+]], v0, v1
>>>>>>>> ; CHECK: addd[[REG3:[0-9]+]], d[[REG]], d1
>>>>>>>> ; CHECK: subd[[REG2:[0-9]+]], d[[REG]], d1
>>>>>>>> -; Without advanced copy optimization, we end up with cross register
>>>>>>>> -; banks copies that cannot be coalesced.
>>>>>>>> -; CHECK-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
>>>>>>>> -; With advanced copy optimization, we end up with just one copy
>>>>>>>> -; to insert the computed high part into the V register.
>>>>>>>> -; CHECK-OPT-NOT: fmov
>>>>>>>> +; CHECK-NOT: fmov
>>>>>>>> ; CHECK: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
>>>>>>>> -; CHECK-NOOPT: fmov d0, [[COPY_REG3]]
>>>>>>>> -; CHECK-OPT-NOT: fmov
>>>>>>>> +; CHECK-NOT: fmov
>>>>>>>> ; CHECK: ins.d v0[1], [[COPY_REG2]]
>>>>>>>> ; CHECK-NEXT: ret
>>>>>>>> ;
>>>>>>>> @@ -24,11 +18,9 @@ define <2 x i64> @bar(<2 x i64> %a, <2 x
>>>>>>>> ; GENERIC: addv[[REG:[0-9]+]].2d, v0.2d, v1.2d
>>>>>>>> ; GENERIC: addd[[REG3:[0-9]+]], d[[REG]], d1
>>>>>>>> ; GENERIC: subd[[REG2:[0-9]+]], d[[REG]], d1
>>>>>>>> -; GENERIC-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
>>>>>>>> -; GENERIC-OPT-NOT: fmov
>>>>>>>> +; GENERIC-NOT: fmov
>>>>>>>> ; GENERIC: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
>>>>>>>> -; GENERIC-NOOPT: fmov d0, [[COPY_REG3]]
>>>>>>>> -; GENERIC-OPT-NOT: fmov
>>>>>>>> +; GENERIC-NOT: fmov
>>>>>>>> ; GENERIC: ins v0.d[1], [[COPY_REG2]]
>>>>>>>> ; GENERIC-NEXT: ret
>>>>>>>> %add = add <2 x i64> %a, %b
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>>>>> (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll Mon Oct
>>>>>>>> 2 15:01:37 2017
>>>>>>>> @@ -4,8 +4,10 @@
>>>>>>>> define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
>>>>>>>> entry:
>>>>>>>> ; CHECK-LABEL: t:
>>>>>>>> -; CHECK: mov x0, [[REG1:x[0-9]+]]
>>>>>>>> -; CHECK: mov x1, [[REG2:x[0-9]+]]
>>>>>>>> +; CHECK: mov [[REG2:x[0-9]+]], x3
>>>>>>>> +; CHECK: mov [[REG1:x[0-9]+]], x2
>>>>>>>> +; CHECK: mov x0, x2
>>>>>>>> +; CHECK: mov x1, x3
>>>>>>>> ; CHECK: bl _foo
>>>>>>>> ; CHECK: mov x0, [[REG1]]
>>>>>>>> ; CHECK: mov x1, [[REG2]]
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll Mon Oct 2
>>>>>>>> 15:01:37 2017
>>>>>>>> @@ -489,7 +489,7 @@ else:
>>>>>>>>
>>>>>>>> ; CHECK-COMMON-LABEL: test_phi:
>>>>>>>> ; CHECK-COMMON: mov x[[PTR:[0-9]+]], x0
>>>>>>>> -; CHECK-COMMON: ldr h[[AB:[0-9]+]], [x[[PTR]]]
>>>>>>>> +; CHECK-COMMON: ldr h[[AB:[0-9]+]], [x0]
>>>>>>>> ; CHECK-COMMON: [[LOOP:LBB[0-9_]+]]:
>>>>>>>> ; CHECK-COMMON: mov.16b v[[R:[0-9]+]], v[[AB]]
>>>>>>>> ; CHECK-COMMON: ldr h[[AB]], [x[[PTR]]]
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll Mon Oct 2
>>>>>>>> 15:01:37 2017
>>>>>>>> @@ -17,6 +17,9 @@ define i32 @test_multiflag(i32 %n, i32 %
>>>>>>>> %val = zext i1 %test to i32
>>>>>>>> ; CHECK: cset {{[xw][0-9]+}}, ne
>>>>>>>>
>>>>>>>> +; CHECK: mov [[RHSCOPY:w[0-9]+]], [[RHS]]
>>>>>>>> +; CHECK: mov [[LHSCOPY:w[0-9]+]], [[LHS]]
>>>>>>>> +
>>>>>>>> store i32 %val, i32* @var
>>>>>>>>
>>>>>>>> call void @bar()
>>>>>>>> @@ -25,7 +28,7 @@ define i32 @test_multiflag(i32 %n, i32 %
>>>>>>>> ; Currently, the comparison is emitted again. An MSR/MRS pair would
>>>>>>>> also be
>>>>>>>> ; acceptable, but assuming the call preserves NZCV is not.
>>>>>>>> br i1 %test, label %iftrue, label %iffalse
>>>>>>>> -; CHECK: cmp [[LHS]], [[RHS]]
>>>>>>>> +; CHECK: cmp [[LHSCOPY]], [[RHSCOPY]]
>>>>>>>> ; CHECK: b.eq
>>>>>>>>
>>>>>>>> iftrue:
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>>>>> (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll Mon Oct
>>>>>>>> 2 15:01:37 2017
>>>>>>>> @@ -8,10 +8,9 @@
>>>>>>>> define void @test(%struct1* %fde, i32 %fd, void (i32, i32, i8*)* %func,
>>>>>>>> i8* %arg) {
>>>>>>>> ;CHECK-LABEL: test
>>>>>>>> entry:
>>>>>>>> -; A53: mov [[DATA:w[0-9]+]], w1
>>>>>>>> ; A53: str q{{[0-9]+}}, {{.*}}
>>>>>>>> ; A53: str q{{[0-9]+}}, {{.*}}
>>>>>>>> -; A53: str [[DATA]], {{.*}}
>>>>>>>> +; A53: str w1, {{.*}}
>>>>>>>>
>>>>>>>> %0 = bitcast %struct1* %fde to i8*
>>>>>>>> tail call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 40, i32 8, i1
>>>>>>>> false)
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/neg-imm.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/neg-imm.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/neg-imm.ll Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -7,8 +7,8 @@ declare void @foo(i32)
>>>>>>>> define void @test(i32 %px) {
>>>>>>>> ; CHECK_LABEL: test:
>>>>>>>> ; CHECK_LABEL: %entry
>>>>>>>> -; CHECK: subs
>>>>>>>> -; CHECK-NEXT: csel
>>>>>>>> +; CHECK: subs [[REG0:w[0-9]+]],
>>>>>>>> +; CHECK: csel {{w[0-9]+}}, wzr, [[REG0]]
>>>>>>>> entry:
>>>>>>>> %sub = add nsw i32 %px, -1
>>>>>>>> %cmp = icmp slt i32 %px, 1
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>>>>> (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll Mon
>>>>>>>> Oct 2 15:01:37 2017
>>>>>>>> @@ -547,16 +547,16 @@ define void @func_use_every_sgpr_input_c
>>>>>>>> ; GCN: s_mov_b32 s5, s32
>>>>>>>> ; GCN: s_add_u32 s32, s32, 0x300
>>>>>>>>
>>>>>>>> -; GCN-DAG: s_mov_b32 [[SAVE_X:s[0-9]+]], s14
>>>>>>>> -; GCN-DAG: s_mov_b32 [[SAVE_Y:s[0-9]+]], s15
>>>>>>>> -; GCN-DAG: s_mov_b32 [[SAVE_Z:s[0-9]+]], s16
>>>>>>>> +; GCN-DAG: s_mov_b32 [[SAVE_X:s[0-57-9][0-9]*]], s14
>>>>>>>> +; GCN-DAG: s_mov_b32 [[SAVE_Y:s[0-68-9][0-9]*]], s15
>>>>>>>> +; GCN-DAG: s_mov_b32 [[SAVE_Z:s[0-79][0-9]*]], s16
>>>>>>>> ; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[6:7]
>>>>>>>> ; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[8:9]
>>>>>>>> ; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[10:11]
>>>>>>>>
>>>>>>>> -; GCN-DAG: s_mov_b32 s6, [[SAVE_X]]
>>>>>>>> -; GCN-DAG: s_mov_b32 s7, [[SAVE_Y]]
>>>>>>>> -; GCN-DAG: s_mov_b32 s8, [[SAVE_Z]]
>>>>>>>> +; GCN-DAG: s_mov_b32 s6, s14
>>>>>>>> +; GCN-DAG: s_mov_b32 s7, s15
>>>>>>>> +; GCN-DAG: s_mov_b32 s8, s16
>>>>>>>> ; GCN: s_swappc_b64
>>>>>>>>
>>>>>>>> ; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/mad-mix.ll Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -51,7 +51,7 @@ define float @v_mad_mix_f32_f16hi_f16hi_
>>>>>>>>
>>>>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32:
>>>>>>>> ; GFX9: v_mov_b32_e32 v3, v1
>>>>>>>> -; GFX9-NEXT: v_mad_mix_f32 v1, v0, v3, v2 op_sel:[1,1,1]
>>>>>>>> +; GFX9-NEXT: v_mad_mix_f32 v1, v0, v1, v2 op_sel:[1,1,1]
>>>>>>>> ; GFX9-NEXT: v_mad_mix_f32 v0, v0, v3, v2
>>>>>>>>
>>>>>>>> ; CIVI: v_mac_f32
>>>>>>>> @@ -66,7 +66,7 @@ define <2 x float> @v_mad_mix_v2f32(<2 x
>>>>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_shuffle:
>>>>>>>> ; GCN: s_waitcnt
>>>>>>>> ; GFX9-NEXT: v_mov_b32_e32 v3, v1
>>>>>>>> -; GFX9-NEXT: v_mad_mix_f32 v1, v0, v3, v2 op_sel:[0,1,1]
>>>>>>>> +; GFX9-NEXT: v_mad_mix_f32 v1, v0, v1, v2 op_sel:[0,1,1]
>>>>>>>> ; GFX9-NEXT: v_mad_mix_f32 v0, v0, v3, v2 op_sel:[1,0,1]
>>>>>>>> ; GFX9-NEXT: s_setpc_b64
>>>>>>>>
>>>>>>>> @@ -246,7 +246,7 @@ define float @v_mad_mix_f32_f16lo_f16lo_
>>>>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_f32imm1:
>>>>>>>> ; GFX9: v_mov_b32_e32 v2, v1
>>>>>>>> ; GFX9: v_mov_b32_e32 v3, 1.0
>>>>>>>> -; GFX9: v_mad_mix_f32 v1, v0, v2, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>>>>> ; encoding
>>>>>>>> +; GFX9: v_mad_mix_f32 v1, v0, v1, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>>>>> ; encoding
>>>>>>>> ; GFX9: v_mad_mix_f32 v0, v0, v2, v3 op_sel_hi:[1,1,0] ; encoding
>>>>>>>> define <2 x float> @v_mad_mix_v2f32_f32imm1(<2 x half> %src0, <2 x
>>>>>>>> half> %src1) #0 {
>>>>>>>> %src0.ext = fpext <2 x half> %src0 to <2 x float>
>>>>>>>> @@ -258,7 +258,7 @@ define <2 x float> @v_mad_mix_v2f32_f32i
>>>>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_cvtf16imminv2pi:
>>>>>>>> ; GFX9: v_mov_b32_e32 v2, v1
>>>>>>>> ; GFX9: v_mov_b32_e32 v3, 0x3e230000
>>>>>>>> -; GFX9: v_mad_mix_f32 v1, v0, v2, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>>>>> ; encoding
>>>>>>>> +; GFX9: v_mad_mix_f32 v1, v0, v1, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>>>>> ; encoding
>>>>>>>> ; GFX9: v_mad_mix_f32 v0, v0, v2, v3 op_sel_hi:[1,1,0] ; encoding
>>>>>>>> define <2 x float> @v_mad_mix_v2f32_cvtf16imminv2pi(<2 x half> %src0,
>>>>>>>> <2 x half> %src1) #0 {
>>>>>>>> %src0.ext = fpext <2 x half> %src0 to <2 x float>
>>>>>>>> @@ -271,7 +271,7 @@ define <2 x float> @v_mad_mix_v2f32_cvtf
>>>>>>>> ; GCN-LABEL: {{^}}v_mad_mix_v2f32_f32imminv2pi:
>>>>>>>> ; GFX9: v_mov_b32_e32 v2, v1
>>>>>>>> ; GFX9: v_mov_b32_e32 v3, 0.15915494
>>>>>>>> -; GFX9: v_mad_mix_f32 v1, v0, v2, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>>>>> ; encoding
>>>>>>>> +; GFX9: v_mad_mix_f32 v1, v0, v1, v3 op_sel:[1,1,0] op_sel_hi:[1,1,0]
>>>>>>>> ; encoding
>>>>>>>> ; GFX9: v_mad_mix_f32 v0, v0, v2, v3 op_sel_hi:[1,1,0] ; encoding
>>>>>>>> define <2 x float> @v_mad_mix_v2f32_f32imminv2pi(<2 x half> %src0, <2 x
>>>>>>>> half> %src1) #0 {
>>>>>>>> %src0.ext = fpext <2 x half> %src0 to <2 x float>
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/ret.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/AMDGPU/ret.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/AMDGPU/ret.ll Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -2,10 +2,10 @@
>>>>>>>> ; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s |
>>>>>>>> FileCheck -check-prefix=GCN %s
>>>>>>>>
>>>>>>>> ; GCN-LABEL: {{^}}vgpr:
>>>>>>>> -; GCN: v_mov_b32_e32 v1, v0
>>>>>>>> -; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
>>>>>>>> -; GCN-DAG: exp mrt0 v1, v1, v1, v1 done vm
>>>>>>>> +; GCN-DAG: v_mov_b32_e32 v1, v0
>>>>>>>> +; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
>>>>>>>> ; GCN: s_waitcnt expcnt(0)
>>>>>>>> +; GCN: v_add_f32_e32 v0, 1.0, v0
>>>>>>>> ; GCN-NOT: s_endpgm
>>>>>>>> define amdgpu_vs { float, float } @vgpr([9 x <16 x i8>] addrspace(2)*
>>>>>>>> byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
>>>>>>>> bb:
>>>>>>>> @@ -204,13 +204,13 @@ bb:
>>>>>>>> }
>>>>>>>>
>>>>>>>> ; GCN-LABEL: {{^}}both:
>>>>>>>> -; GCN: v_mov_b32_e32 v1, v0
>>>>>>>> -; GCN-DAG: exp mrt0 v1, v1, v1, v1 done vm
>>>>>>>> -; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
>>>>>>>> -; GCN-DAG: s_add_i32 s0, s3, 2
>>>>>>>> +; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
>>>>>>>> +; GCN-DAG: v_mov_b32_e32 v1, v0
>>>>>>>> ; GCN-DAG: s_mov_b32 s1, s2
>>>>>>>> -; GCN: s_mov_b32 s2, s3
>>>>>>>> ; GCN: s_waitcnt expcnt(0)
>>>>>>>> +; GCN: v_add_f32_e32 v0, 1.0, v0
>>>>>>>> +; GCN-DAG: s_add_i32 s0, s3, 2
>>>>>>>> +; GCN-DAG: s_mov_b32 s2, s3
>>>>>>>> ; GCN-NOT: s_endpgm
>>>>>>>> define amdgpu_vs { float, i32, float, i32, i32 } @both([9 x <16 x i8>]
>>>>>>>> addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0
>>>>>>>> {
>>>>>>>> bb:
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/atomic-op.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/ARM/atomic-op.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/ARM/atomic-op.ll Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -287,7 +287,8 @@ define i32 @test_cmpxchg_fail_order(i32
>>>>>>>>
>>>>>>>> %pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst monotonic
>>>>>>>> %oldval = extractvalue { i32, i1 } %pair, 0
>>>>>>>> -; CHECK-ARMV7: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
>>>>>>>> +; CHECK-ARMV7: mov r[[ADDR:[0-9]+]], r0
>>>>>>>> +; CHECK-ARMV7: ldrex [[OLDVAL:r[0-9]+]], [r0]
>>>>>>>> ; CHECK-ARMV7: cmp [[OLDVAL]], r1
>>>>>>>> ; CHECK-ARMV7: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]
>>>>>>>> ; CHECK-ARMV7: dmb ish
>>>>>>>> @@ -305,7 +306,8 @@ define i32 @test_cmpxchg_fail_order(i32
>>>>>>>> ; CHECK-ARMV7: dmb ish
>>>>>>>> ; CHECK-ARMV7: bx lr
>>>>>>>>
>>>>>>>> -; CHECK-T2: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]
>>>>>>>> +; CHECK-T2: mov r[[ADDR:[0-9]+]], r0
>>>>>>>> +; CHECK-T2: ldrex [[OLDVAL:r[0-9]+]], [r0]
>>>>>>>> ; CHECK-T2: cmp [[OLDVAL]], r1
>>>>>>>> ; CHECK-T2: bne [[FAIL_BB:\.?LBB.*]]
>>>>>>>> ; CHECK-T2: dmb ish
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll Mon Oct 2
>>>>>>>> 15:01:37 2017
>>>>>>>> @@ -39,7 +39,7 @@ define i32 @sadd_overflow(i32 %a, i32 %b
>>>>>>>> ; ARM: movvc r[[R1]], #0
>>>>>>>>
>>>>>>>> ; THUMBV6: mov r[[R2:[0-9]+]], r[[R0:[0-9]+]]
>>>>>>>> - ; THUMBV6: adds r[[R3:[0-9]+]], r[[R2]], r[[R1:[0-9]+]]
>>>>>>>> + ; THUMBV6: adds r[[R3:[0-9]+]], r[[R0]], r[[R1:[0-9]+]]
>>>>>>>> ; THUMBV6: movs r[[R0]], #0
>>>>>>>> ; THUMBV6: movs r[[R1]], #1
>>>>>>>> ; THUMBV6: cmp r[[R3]], r[[R2]]
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/swifterror.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/ARM/swifterror.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/ARM/swifterror.ll Mon Oct 2 15:01:37 2017
>>>>>>>> @@ -182,7 +182,7 @@ define float @foo_loop(%swift_error** sw
>>>>>>>> ; CHECK-APPLE: beq
>>>>>>>> ; CHECK-APPLE: mov r0, #16
>>>>>>>> ; CHECK-APPLE: malloc
>>>>>>>> -; CHECK-APPLE: strb r{{.*}}, [{{.*}}[[ID]], #8]
>>>>>>>> +; CHECK-APPLE: strb r{{.*}}, [r0, #8]
>>>>>>>> ; CHECK-APPLE: ble
>>>>>>>> ; CHECK-APPLE: mov r8, [[ID]]
>>>>>>>>
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll Mon Oct 2 15:01:37
>>>>>>>> 2017
>>>>>>>> @@ -165,7 +165,7 @@ entry:
>>>>>>>> ; MMR3: subu16 $5, $[[T19]], $[[T20]]
>>>>>>>>
>>>>>>>> ; MMR6: move $[[T0:[0-9]+]], $7
>>>>>>>> -; MMR6: sw $[[T0]], 8($sp)
>>>>>>>> +; MMR6: sw $7, 8($sp)
>>>>>>>> ; MMR6: move $[[T1:[0-9]+]], $5
>>>>>>>> ; MMR6: sw $4, 12($sp)
>>>>>>>> ; MMR6: lw $[[T2:[0-9]+]], 48($sp)
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll?rev=314729&r1=314728&r2=314729&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll (original)
>>>>>>>> +++ llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll Mon Oct 2 15:01:37
>>>>>>>> 2017
>>>>>>>> @@ -14,7 +14,8 @@ define double @foo3(double %a) nounwind
>>>>>>>> ret double %r
>>>>>>>>
>>>>>>>> ; CHECK: @foo3
>>>>>>>> -; CHECK: xsnmsubadp [[REG:[0-9]+]], {{[0-9]+}}, [[REG]]
>>>>>>>> +; CHECK: fmr [[REG:[0-9]+]], [[REG2:[0-9]+]]
>>>>>>>> +; CHECK: xsnmsubadp [[REG]], {{[0-9]+}}, [[REG2]]
>>>>>>>> ; CHECK: xsmaddmdp
>>>>>>>> ; CHECK: xsmaddadp
>>>>>>>> }
>>>>>>>>
>>>>>>>> Modified: llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll
>>>>>>>>
>>>>>>>> URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll?rev=314729&r1=314728&r2=314729&view=diff
>>
>> --
>> Geoff Berry
>> Employee of Qualcomm Datacenter Technologies, Inc.
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
More information about the llvm-commits
mailing list