[PATCH] D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding

Quentin Colombet via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 2 13:12:59 PST 2018


I am fine with whatever global switch we want as long as the default is conservatively correct :).

> On Feb 2, 2018, at 1:10 PM, gberry at codeaurora.org wrote:
> 
> In the case of AMDGPU there were just one or two opcode base classes in the td files that needed to have the hasExtraSrcRegAllocReq attribute added, so it wasn’t a big deal.
> I don’t think flipping the default value of this opcode property makes sense, as that would involve changing all of the opcodes (or their base classes) in all of the in-tree targets (other than AMDGPU).
> How about a target property that says “all opcodes are hasExtraSrcRegAllocReq”? 
>  
> -- 
> Geoff Berry
> Employee of Qualcomm Datacenter Technologies, Inc.
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
>  
> From: qcolombet at apple.com [mailto:qcolombet at apple.com] 
> Sent: Friday, February 2, 2018 3:55 PM
> To: Geoff Berry <gberry at codeaurora.org>
> Cc: reviews+D41835+public+9c1dec7fb6e75ce0 at reviews.llvm.org; Geoff Berry via Phabricator <reviews at reviews.llvm.org>; javed.absar at arm.com; Matthias Braun <matze at braunis.de>; Jonas Paulsson <paulsson at linux.vnet.ibm.com>; tstellar at redhat.com; Matt Arsenault <Matthew.Arsenault at amd.com>; junbuml at codeaurora.org; marina.yatsina at intel.com; wei.ding2 at amd.com; kannan.narayanan at amd.com; nhaehnle at gmail.com; Nemanja Ivanovic <nemanja.i.ibm at gmail.com>; llvm-commits <llvm-commits at lists.llvm.org>; tpr.llvm at botech.co.uk; escha at apple.com
> Subject: Re: [PATCH] D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding
>  
>  
> 
> 
>> On Feb 2, 2018, at 12:21 PM, escha at apple.com <mailto:escha at apple.com> wrote:
>>  
>> I mean, in that case we are likely to have to mark every single opcode (all 12,000 or so) with this requirement. At that point we might as well just opt out of the pass, I think? At least, it feels like a gross hack that papers over the fact that LLVM has changed how register classes work such that our entire approach is no longer valid.
>>  
>> Also, it seems very weird to make this constraint-violating behavior *opt-out*. Maybe it should be opt-in, i.e. put doesNotHaveExtraSrcRegAllocReq on all instructions it’s okay for?
>  
> That sounds like a better approach to me.
> After talking with escha, I agree that TableGen is not necessarily expressive enough to model all the constraints that need to be met and I would err on the safe side of being opt-in instead of opt-out.
> 
> 
>>  
>> —escha
>> 
>> 
>>> On Feb 2, 2018, at 12:17 PM, gberry at codeaurora.org <mailto:gberry at codeaurora.org> wrote:
>>>  
>>> Hi escha, 
>>>  
>>> This sounds very similar to the issues this change caused on AMDGPU.  The way we worked around them for those effected targets was to mark the relevant opcodes with hasExtraSrcRegAllocReq, which will prevent the MachineOperands from being marked as isRenamable, which will in turn prevent MachineCopyPropagation from changing them.  Is this a reasonable fix for your targets?
>>>  
>>> -- 
>>> Geoff Berry
>>> Employee of Qualcomm Datacenter Technologies, Inc.
>>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
>>>  
>>> From: fglaser at apple.com <mailto:fglaser at apple.com> [mailto:fglaser at apple.com <mailto:fglaser at apple.com>] On Behalf Of escha at apple.com <mailto:escha at apple.com>
>>> Sent: Friday, February 2, 2018 2:56 PM
>>> To: reviews+D41835+public+9c1dec7fb6e75ce0 at reviews.llvm.org <mailto:reviews+D41835+public+9c1dec7fb6e75ce0 at reviews.llvm.org>; Geoff Berry via Phabricator <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>>
>>> Cc: gberry at codeaurora.org <mailto:gberry at codeaurora.org>; qcolombet at apple.com <mailto:qcolombet at apple.com>; javed.absar at arm.com <mailto:javed.absar at arm.com>; matze at braunis.de <mailto:matze at braunis.de>; paulsson at linux.vnet.ibm.com <mailto:paulsson at linux.vnet.ibm.com>; tstellar at redhat.com <mailto:tstellar at redhat.com>; Matthew.Arsenault at amd.com <mailto:Matthew.Arsenault at amd.com>; junbuml at codeaurora.org <mailto:junbuml at codeaurora.org>; marina.yatsina at intel.com <mailto:marina.yatsina at intel.com>; wei.ding2 at amd.com <mailto:wei.ding2 at amd.com>; kannan.narayanan at amd.com <mailto:kannan.narayanan at amd.com>; nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>; nemanja.i.ibm at gmail.com <mailto:nemanja.i.ibm at gmail.com>; llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>; tpr.llvm at botech.co.uk <mailto:tpr.llvm at botech.co.uk>
>>> Subject: Re: [PATCH] D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding
>>>  
>>> FYI, this completely broke our (out-of-tree) backends in more ways than i can count. In particular, it seems to break the ability for us to enforce constraints on register classes at all: we heavily rely on this for a wide variety of instructions whose behavior cannot be fully expressed in tablegen, or really, anywhere that we need to say “X argument of Y instruction should be constrained to class Z”.
>>>  
>>> Here is one particular example that illustrates this (but what breaks is potentially far more than this).
>>>  
>>> Suppose you have a machine in which instructions can have at most 2 arguments in the category {immediates, special registers}. We limit immediates to 2 in instruction selection, and then run a small pass after instruction selection that identifies any instructions with more than 2 {immediates, special registers} and marks the special register arguments as “no SR”. For example:
>>>  
>>> %6:gpr32 = COPY $sr12
>>> 
>>> 
>>> 
>>> becomes
>>>  
>>> %6:gpr32nosr = COPY $sr12
>>>  
>>> reasonably, this should mean the COPY cannot be folded into the instruction that uses it, because the argument is a “gpr32nosr”, which prohibits special registers.
>>>  
>>> however, with this patch…
>>>  
>>> # *** IR Dump After Greedy Register Allocator ***:
>>> %6:gpr32nosr = COPY $sr12
>>> %12.sub2:gpr32tup3 = /* some instruction */ 0, 0, %6, 0, 4294967312, 0, 0
>>> # *** IR Dump After Virtual Register Rewriter ***:
>>> renamable $r2 = COPY $sr12
>>> renamable $r2 = /* some instruction */ 0, 0, %6, 0, 4294967312, 0, 0
>>> […]
>>> # *** IR Dump After Machine Copy Propagation Pass ***:
>>> renamable $r2 = /* some instruction */ 0, 0, $sr12, 0, 4294967312, 0, 0
>>>  
>>> in the second step the information that “%6” cannot be an SR is lost due to register allocation, and then copy propagation happily plows through and violates the constraint we set earlier.
>>>  
>>> this is the simplest example, but this entire patch makes me extremely nervous because we rely on register classes heavily in our backend and if the compiler is now free to violate our constraints, all bets are off.
>>>  
>>> —escha
>>> 
>>> 
>>> 
>>>> On Feb 1, 2018, at 10:56 AM, Geoff Berry via Phabricator via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
>>>>  
>>>> This revision was automatically updated to reflect the committed changes.
>>>> Closed by commit rL323991: [MachineCopyPropagation] Extend pass to do COPY source forwarding (authored by gberry, committed by ).
>>>> 
>>>> Changed prior to commit:
>>>>  https://reviews.llvm.org/D41835?vs=131323&id=132429#toc <https://reviews.llvm.org/D41835?vs=131323&id=132429#toc>
>>>> 
>>>> Repository:
>>>>  rL LLVM
>>>> 
>>>> https://reviews.llvm.org/D41835 <https://reviews.llvm.org/D41835>
>>>> 
>>>> Files:
>>>>  llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>>>  llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>>>  llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/copyprop.mir
>>>>  llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/ldst-opt.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>>>  llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>>>  llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>>>  llvm/trunk/test/CodeGen/AMDGPU/fix-vgpr-copies.mir
>>>>  llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll
>>>>  llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>>>  llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>>>  llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll
>>>>  llvm/trunk/test/CodeGen/ARM/select-imm.ll
>>>>  llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/ashr.ll
>>>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/lshr.ll
>>>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/shl.ll
>>>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>>>  llvm/trunk/test/CodeGen/PowerPC/MCSE-caller-preserved-reg.ll
>>>>  llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>>>  llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll
>>>>  llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll
>>>>  llvm/trunk/test/CodeGen/PowerPC/opt-li-add-to-addi.ll
>>>>  llvm/trunk/test/CodeGen/PowerPC/tail-dup-layout.ll
>>>>  llvm/trunk/test/CodeGen/SPARC/32abi.ll
>>>>  llvm/trunk/test/CodeGen/SPARC/atomics.ll
>>>>  llvm/trunk/test/CodeGen/SystemZ/vec-sub-01.ll
>>>>  llvm/trunk/test/CodeGen/Thumb/pr35836.ll
>>>>  llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll
>>>>  llvm/trunk/test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
>>>>  llvm/trunk/test/CodeGen/X86/arg-copy-elide.ll
>>>>  llvm/trunk/test/CodeGen/X86/avx-load-store.ll
>>>>  llvm/trunk/test/CodeGen/X86/avx512-bugfix-25270.ll
>>>>  llvm/trunk/test/CodeGen/X86/avx512-calling-conv.ll
>>>>  llvm/trunk/test/CodeGen/X86/avx512-regcall-NoMask.ll
>>>>  llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-fast-isel.ll
>>>>  llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll
>>>>  llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>>>>  llvm/trunk/test/CodeGen/X86/combine-fcopysign.ll
>>>>  llvm/trunk/test/CodeGen/X86/combine-shl.ll
>>>>  llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>>>>  llvm/trunk/test/CodeGen/X86/divide-by-constant.ll
>>>>  llvm/trunk/test/CodeGen/X86/fmaxnum.ll
>>>>  llvm/trunk/test/CodeGen/X86/fmf-flags.ll
>>>>  llvm/trunk/test/CodeGen/X86/fminnum.ll
>>>>  llvm/trunk/test/CodeGen/X86/fp128-i128.ll
>>>>  llvm/trunk/test/CodeGen/X86/h-registers-1.ll
>>>>  llvm/trunk/test/CodeGen/X86/haddsub-2.ll
>>>>  llvm/trunk/test/CodeGen/X86/haddsub-3.ll
>>>>  llvm/trunk/test/CodeGen/X86/haddsub-undef.ll
>>>>  llvm/trunk/test/CodeGen/X86/half.ll
>>>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-smax.ll
>>>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-smin.ll
>>>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll
>>>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll
>>>>  llvm/trunk/test/CodeGen/X86/inline-asm-fpstack.ll
>>>>  llvm/trunk/test/CodeGen/X86/ipra-local-linkage.ll
>>>>  llvm/trunk/test/CodeGen/X86/localescape.ll
>>>>  llvm/trunk/test/CodeGen/X86/machine-cp.ll
>>>>  llvm/trunk/test/CodeGen/X86/mul-i1024.ll
>>>>  llvm/trunk/test/CodeGen/X86/mul-i256.ll
>>>>  llvm/trunk/test/CodeGen/X86/mul-i512.ll
>>>>  llvm/trunk/test/CodeGen/X86/mul128.ll
>>>>  llvm/trunk/test/CodeGen/X86/mulvi32.ll
>>>>  llvm/trunk/test/CodeGen/X86/pmul.ll
>>>>  llvm/trunk/test/CodeGen/X86/powi.ll
>>>>  llvm/trunk/test/CodeGen/X86/pr11334.ll
>>>>  llvm/trunk/test/CodeGen/X86/pr29112.ll
>>>>  llvm/trunk/test/CodeGen/X86/pr34080-2.ll
>>>>  llvm/trunk/test/CodeGen/X86/psubus.ll
>>>>  llvm/trunk/test/CodeGen/X86/retpoline-external.ll
>>>>  llvm/trunk/test/CodeGen/X86/retpoline.ll
>>>>  llvm/trunk/test/CodeGen/X86/sad.ll
>>>>  llvm/trunk/test/CodeGen/X86/safestack.ll
>>>>  llvm/trunk/test/CodeGen/X86/safestack_inline.ll
>>>>  llvm/trunk/test/CodeGen/X86/scalar_widen_div.ll
>>>>  llvm/trunk/test/CodeGen/X86/select.ll
>>>>  llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll
>>>>  llvm/trunk/test/CodeGen/X86/slow-pmulld.ll
>>>>  llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>>>  llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll
>>>>  llvm/trunk/test/CodeGen/X86/sse1.ll
>>>>  llvm/trunk/test/CodeGen/X86/sse3-avx-addsub-2.ll
>>>>  llvm/trunk/test/CodeGen/X86/statepoint-live-in.ll
>>>>  llvm/trunk/test/CodeGen/X86/statepoint-stack-usage.ll
>>>>  llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>>>>  llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>>>>  llvm/trunk/test/CodeGen/X86/vec_minmax_sint.ll
>>>>  llvm/trunk/test/CodeGen/X86/vec_shift4.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-idiv-sdiv-128.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-idiv-udiv-128.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-mul.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-rotate-128.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-sext.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-shift-ashr-128.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
>>>>  llvm/trunk/test/CodeGen/X86/vector-zext.ll
>>>>  llvm/trunk/test/CodeGen/X86/vselect-minmax.ll
>>>>  llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>>>>  llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>>>>  llvm/trunk/test/CodeGen/X86/win64_frame.ll
>>>>  llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
>>>>  llvm/trunk/test/CodeGen/X86/x86-shrink-wrap-unwind.ll
>>>>  llvm/trunk/test/CodeGen/X86/x86-shrink-wrapping.ll
>>>>  llvm/trunk/test/DebugInfo/COFF/fpo-shrink-wrap.ll
>>>>  llvm/trunk/test/DebugInfo/X86/spill-nospill.ll
>>>> 
>>>> <D41835.132429.patch>_______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180202/b936a305/attachment.html>


More information about the llvm-commits mailing list