[PATCH] D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding

via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 2 13:33:15 PST 2018


oh no problem, you’re not being difficult! it’s a subtle issue and this part of the compilation pipeline is honestly a mess.

the problem here is that after regalloc, *the constraints are lost*, because you no longer have vregs and vreg classes. so for example, while you might have a “gpr32nosr” operand before alloc, now you just have “r2”. at least, i think that’s my understanding of it, going by matthias and quentin’s explanations?

—escha

> On Feb 2, 2018, at 1:28 PM, Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:
> 
> Sure, but won't getRegClassConstraint() just return a nullptr in which case the copy source forwarding can't happen?
> 
> Sorry, I'm really not trying to be difficult, just to understand the issue more thoroughly.
> 
> On Fri, Feb 2, 2018 at 4:23 PM, <escha at apple.com <mailto:escha at apple.com>> wrote:
> The problem is that MCInstrDesc can only represent a static set of operands and register classes. In practice, targets may need to apply extra constraints to a given instruction, constraints which are not constant per-opcode.
> 
> An extreme example is our sample instructions, which are entirely manually constructed: the tablegen definitions literally define the inputs as “unknown_tuple” and we assign the actual classes in instruction selection based on the parameters. This is necessary because the number of possible permutations of sample instructions is obscenely prohibitive and it would be fairly unreasonable to have a separate opcode for each one.
> 
> —escha
> 
> 
>> On Feb 2, 2018, at 1:18 PM, Nemanja Ivanovic <nemanja.i.ibm at gmail.com <mailto:nemanja.i.ibm at gmail.com>> wrote:
>> 
>> Forgive me if I am misinterpreting what the suggestion is, but while I agree that a pass with the potential to break things should be opt-in rather than opt-out, I'm curious about a couple of things:
>> 
>> 1. The granularity of the opt-in/opt-out. Seems that as Escha pointed out, opting out for every instruction that could receive a result of a copy can be prohibitive. But equivalently, opting in for every instruction can be quite daunting as well. Would it make more sense for the pass itself to be opt-in only and then instructions themselves to be opt-out? That way targets that can just use the pass as-is can just go ahead and opt-in (and fix any instructions that need to opt out) and targets for which it would be prohibitive to do this per-instruction opt-out would simply just not opt-in to the pass.
>> 
>> 2. Seems like the issue that Escha describes can be handled in the pass by ensuring that the source is forwarded only if it is a register that is in the register class for the user of the copy. Isn't everything required for this available from MCInstrDesc and TargetRegisterClass? Or maybe I'm misinterpreting what the problem actually is...
>> 
>> On Fri, Feb 2, 2018 at 4:10 PM, <gberry at codeaurora.org <mailto:gberry at codeaurora.org>> wrote:
>> In the case of AMDGPU there were just one or two opcode base classes in the td files that needed to have the hasExtraSrcRegAllocReq attribute added, so it wasn’t a big deal.
>> 
>> I don’t think flipping the default value of this opcode property makes sense, as that would involve changing all of the opcodes (or their base classes) in all of the in-tree targets (other than AMDGPU).
>> 
>> How about a target property that says “all opcodes are hasExtraSrcRegAllocReq”?
>> 
>>  
>> 
>> --
>> 
>> Geoff Berry
>> 
>> Employee of Qualcomm Datacenter Technologies, Inc.
>> 
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
>> 
>>  
>> 
>> From: qcolombet at apple.com <mailto:qcolombet at apple.com> [mailto:qcolombet at apple.com <mailto:qcolombet at apple.com>] 
>> Sent: Friday, February 2, 2018 3:55 PM
>> To: Geoff Berry <gberry at codeaurora.org <mailto:gberry at codeaurora.org>>
>> Cc: reviews+D41835+public+9c1dec7fb6e75ce0 at reviews.llvm.org <mailto:reviews%2BD41835%2Bpublic%2B9c1dec7fb6e75ce0 at reviews.llvm.org>; Geoff Berry via Phabricator <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>>; javed.absar at arm.com <mailto:javed.absar at arm.com>; Matthias Braun <matze at braunis.de <mailto:matze at braunis.de>>; Jonas Paulsson <paulsson at linux.vnet.ibm.com <mailto:paulsson at linux.vnet.ibm.com>>; tstellar at redhat.com <mailto:tstellar at redhat.com>; Matt Arsenault <Matthew.Arsenault at amd.com <mailto:Matthew.Arsenault at amd.com>>; junbuml at codeaurora.org <mailto:junbuml at codeaurora.org>; marina.yatsina at intel.com <mailto:marina.yatsina at intel.com>; wei.ding2 at amd.com <mailto:wei.ding2 at amd.com>; kannan.narayanan at amd.com <mailto:kannan.narayanan at amd.com>; nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>; Nemanja Ivanovic <nemanja.i.ibm at gmail.com <mailto:nemanja.i.ibm at gmail.com>>; llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>>; tpr.llvm at botech.co.uk <mailto:tpr.llvm at botech.co.uk>; escha at apple.com <mailto:escha at apple.com>
>> 
>> Subject: Re: [PATCH] D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding
>> 
>>  
>> 
>>  
>> 
>> 
>> 
>> 
>> On Feb 2, 2018, at 12:21 PM, escha at apple.com <mailto:escha at apple.com> wrote:
>> 
>>  
>> 
>> I mean, in that case we are likely to have to mark every single opcode (all 12,000 or so) with this requirement. At that point we might as well just opt out of the pass, I think? At least, it feels like a gross hack that papers over the fact that LLVM has changed how register classes work such that our entire approach is no longer valid.
>> 
>>  
>> 
>> Also, it seems very weird to make this constraint-violating behavior *opt-out*. Maybe it should be opt-in, i.e. put doesNotHaveExtraSrcRegAllocReq on all instructions it’s okay for?
>> 
>>  
>> 
>> That sounds like a better approach to me.
>> 
>> After talking with escha, I agree that TableGen is not necessarily expressive enough to model all the constraints that need to be met and I would err on the safe side of being opt-in instead of opt-out.
>> 
>> 
>> 
>> 
>>  
>> 
>> —escha
>> 
>> 
>> 
>> 
>> On Feb 2, 2018, at 12:17 PM, gberry at codeaurora.org <mailto:gberry at codeaurora.org> wrote:
>> 
>>  
>> 
>> Hi escha, 
>> 
>>  
>> 
>> This sounds very similar to the issues this change caused on AMDGPU.  The way we worked around them for those effected targets was to mark the relevant opcodes with hasExtraSrcRegAllocReq, which will prevent the MachineOperands from being marked as isRenamable, which will in turn prevent MachineCopyPropagation from changing them.  Is this a reasonable fix for your targets?
>> 
>>  
>> 
>> -- 
>> 
>> Geoff Berry
>> 
>> Employee of Qualcomm Datacenter Technologies, Inc.
>> 
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
>> 
>>  
>> 
>> From: fglaser at apple.com <mailto:fglaser at apple.com> [mailto:fglaser at apple.com <mailto:fglaser at apple.com>] On Behalf Of escha at apple.com <mailto:escha at apple.com>
>> Sent: Friday, February 2, 2018 2:56 PM
>> To: reviews+D41835+public+9c1dec7fb6e75ce0 at reviews.llvm.org <mailto:reviews+D41835+public+9c1dec7fb6e75ce0 at reviews.llvm.org>; Geoff Berry via Phabricator <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>>
>> Cc: gberry at codeaurora.org <mailto:gberry at codeaurora.org>; qcolombet at apple.com <mailto:qcolombet at apple.com>; javed.absar at arm.com <mailto:javed.absar at arm.com>; matze at braunis.de <mailto:matze at braunis.de>; paulsson at linux.vnet.ibm.com <mailto:paulsson at linux.vnet.ibm.com>; tstellar at redhat.com <mailto:tstellar at redhat.com>; Matthew.Arsenault at amd.com <mailto:Matthew.Arsenault at amd.com>; junbuml at codeaurora.org <mailto:junbuml at codeaurora.org>; marina.yatsina at intel.com <mailto:marina.yatsina at intel.com>; wei.ding2 at amd.com <mailto:wei.ding2 at amd.com>; kannan.narayanan at amd.com <mailto:kannan.narayanan at amd.com>; nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>; nemanja.i.ibm at gmail.com <mailto:nemanja.i.ibm at gmail.com>; llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>; tpr.llvm at botech.co.uk <mailto:tpr.llvm at botech.co.uk>
>> Subject: Re: [PATCH] D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding
>> 
>>  
>> 
>> FYI, this completely broke our (out-of-tree) backends in more ways than i can count. In particular, it seems to break the ability for us to enforce constraints on register classes at all: we heavily rely on this for a wide variety of instructions whose behavior cannot be fully expressed in tablegen, or really, anywhere that we need to say “X argument of Y instruction should be constrained to class Z”.
>> 
>>  
>> 
>> Here is one particular example that illustrates this (but what breaks is potentially far more than this).
>> 
>>  
>> 
>> Suppose you have a machine in which instructions can have at most 2 arguments in the category {immediates, special registers}. We limit immediates to 2 in instruction selection, and then run a small pass after instruction selection that identifies any instructions with more than 2 {immediates, special registers} and marks the special register arguments as “no SR”. For example:
>> 
>>  
>> 
>> %6:gpr32 = COPY $sr12
>> 
>> 
>> 
>> 
>> 
>> becomes
>> 
>>  
>> 
>> %6:gpr32nosr = COPY $sr12
>> 
>>  
>> 
>> reasonably, this should mean the COPY cannot be folded into the instruction that uses it, because the argument is a “gpr32nosr”, which prohibits special registers.
>> 
>>  
>> 
>> however, with this patch…
>> 
>>  
>> 
>> # *** IR Dump After Greedy Register Allocator ***:
>> 
>> %6:gpr32nosr = COPY $sr12
>> 
>> %12.sub2:gpr32tup3 = /* some instruction */ 0, 0, %6, 0, 4294967312, 0, 0
>> 
>> # *** IR Dump After Virtual Register Rewriter ***:
>> 
>> renamable $r2 = COPY $sr12
>> 
>> renamable $r2 = /* some instruction */ 0, 0, %6, 0, 4294967312, 0, 0
>> 
>> […]
>> 
>> # *** IR Dump After Machine Copy Propagation Pass ***:
>> 
>> renamable $r2 = /* some instruction */ 0, 0, $sr12, 0, 4294967312, 0, 0
>> 
>>  
>> 
>> in the second step the information that “%6” cannot be an SR is lost due to register allocation, and then copy propagation happily plows through and violates the constraint we set earlier.
>> 
>>  
>> 
>> this is the simplest example, but this entire patch makes me extremely nervous because we rely on register classes heavily in our backend and if the compiler is now free to violate our constraints, all bets are off.
>> 
>>  
>> 
>> —escha
>> 
>> 
>> 
>> 
>> 
>> On Feb 1, 2018, at 10:56 AM, Geoff Berry via Phabricator via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
>> 
>>  
>> 
>> This revision was automatically updated to reflect the committed changes.
>> Closed by commit rL323991: [MachineCopyPropagation] Extend pass to do COPY source forwarding (authored by gberry, committed by ).
>> 
>> Changed prior to commit:
>>  https://reviews.llvm.org/D41835?vs=131323&id=132429#toc <https://reviews.llvm.org/D41835?vs=131323&id=132429#toc>
>> 
>> Repository:
>>  rL LLVM
>> 
>> https://reviews.llvm.org/D41835 <https://reviews.llvm.org/D41835>
>> 
>> Files:
>>  llvm/trunk/lib/CodeGen/MachineCopyPropagation.cpp
>>  llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
>>  llvm/trunk/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
>>  llvm/trunk/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
>>  llvm/trunk/test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
>>  llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll
>>  llvm/trunk/test/CodeGen/AArch64/copyprop.mir
>>  llvm/trunk/test/CodeGen/AArch64/f16-instructions.ll
>>  llvm/trunk/test/CodeGen/AArch64/flags-multiuse.ll
>>  llvm/trunk/test/CodeGen/AArch64/ldst-opt.ll
>>  llvm/trunk/test/CodeGen/AArch64/merge-store-dependency.ll
>>  llvm/trunk/test/CodeGen/AArch64/neg-imm.ll
>>  llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll
>>  llvm/trunk/test/CodeGen/AMDGPU/fix-vgpr-copies.mir
>>  llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll
>>  llvm/trunk/test/CodeGen/AMDGPU/ret.ll
>>  llvm/trunk/test/CodeGen/ARM/atomic-op.ll
>>  llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll
>>  llvm/trunk/test/CodeGen/ARM/select-imm.ll
>>  llvm/trunk/test/CodeGen/ARM/swifterror.ll
>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/ashr.ll
>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/lshr.ll
>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/shl.ll
>>  llvm/trunk/test/CodeGen/Mips/llvm-ir/sub.ll
>>  llvm/trunk/test/CodeGen/PowerPC/MCSE-caller-preserved-reg.ll
>>  llvm/trunk/test/CodeGen/PowerPC/fma-mutate.ll
>>  llvm/trunk/test/CodeGen/PowerPC/gpr-vsr-spill.ll
>>  llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll
>>  llvm/trunk/test/CodeGen/PowerPC/opt-li-add-to-addi.ll
>>  llvm/trunk/test/CodeGen/PowerPC/tail-dup-layout.ll
>>  llvm/trunk/test/CodeGen/SPARC/32abi.ll
>>  llvm/trunk/test/CodeGen/SPARC/atomics.ll
>>  llvm/trunk/test/CodeGen/SystemZ/vec-sub-01.ll
>>  llvm/trunk/test/CodeGen/Thumb/pr35836.ll
>>  llvm/trunk/test/CodeGen/Thumb/thumb-shrink-wrapping.ll
>>  llvm/trunk/test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
>>  llvm/trunk/test/CodeGen/X86/arg-copy-elide.ll
>>  llvm/trunk/test/CodeGen/X86/avx-load-store.ll
>>  llvm/trunk/test/CodeGen/X86/avx512-bugfix-25270.ll
>>  llvm/trunk/test/CodeGen/X86/avx512-calling-conv.ll
>>  llvm/trunk/test/CodeGen/X86/avx512-regcall-NoMask.ll
>>  llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-fast-isel.ll
>>  llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll
>>  llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>>  llvm/trunk/test/CodeGen/X86/combine-fcopysign.ll
>>  llvm/trunk/test/CodeGen/X86/combine-shl.ll
>>  llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>>  llvm/trunk/test/CodeGen/X86/divide-by-constant.ll
>>  llvm/trunk/test/CodeGen/X86/fmaxnum.ll
>>  llvm/trunk/test/CodeGen/X86/fmf-flags.ll
>>  llvm/trunk/test/CodeGen/X86/fminnum.ll
>>  llvm/trunk/test/CodeGen/X86/fp128-i128.ll
>>  llvm/trunk/test/CodeGen/X86/h-registers-1.ll
>>  llvm/trunk/test/CodeGen/X86/haddsub-2.ll
>>  llvm/trunk/test/CodeGen/X86/haddsub-3.ll
>>  llvm/trunk/test/CodeGen/X86/haddsub-undef.ll
>>  llvm/trunk/test/CodeGen/X86/half.ll
>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-smax.ll
>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-smin.ll
>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll
>>  llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll
>>  llvm/trunk/test/CodeGen/X86/inline-asm-fpstack.ll
>>  llvm/trunk/test/CodeGen/X86/ipra-local-linkage.ll
>>  llvm/trunk/test/CodeGen/X86/localescape.ll
>>  llvm/trunk/test/CodeGen/X86/machine-cp.ll
>>  llvm/trunk/test/CodeGen/X86/mul-i1024.ll
>>  llvm/trunk/test/CodeGen/X86/mul-i256.ll
>>  llvm/trunk/test/CodeGen/X86/mul-i512.ll
>>  llvm/trunk/test/CodeGen/X86/mul128.ll
>>  llvm/trunk/test/CodeGen/X86/mulvi32.ll
>>  llvm/trunk/test/CodeGen/X86/pmul.ll
>>  llvm/trunk/test/CodeGen/X86/powi.ll
>>  llvm/trunk/test/CodeGen/X86/pr11334.ll
>>  llvm/trunk/test/CodeGen/X86/pr29112.ll
>>  llvm/trunk/test/CodeGen/X86/pr34080-2.ll
>>  llvm/trunk/test/CodeGen/X86/psubus.ll
>>  llvm/trunk/test/CodeGen/X86/retpoline-external.ll
>>  llvm/trunk/test/CodeGen/X86/retpoline.ll
>>  llvm/trunk/test/CodeGen/X86/sad.ll
>>  llvm/trunk/test/CodeGen/X86/safestack.ll
>>  llvm/trunk/test/CodeGen/X86/safestack_inline.ll
>>  llvm/trunk/test/CodeGen/X86/scalar_widen_div.ll
>>  llvm/trunk/test/CodeGen/X86/select.ll
>>  llvm/trunk/test/CodeGen/X86/shrink-wrap-chkstk.ll
>>  llvm/trunk/test/CodeGen/X86/slow-pmulld.ll
>>  llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll
>>  llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll
>>  llvm/trunk/test/CodeGen/X86/sse1.ll
>>  llvm/trunk/test/CodeGen/X86/sse3-avx-addsub-2.ll
>>  llvm/trunk/test/CodeGen/X86/statepoint-live-in.ll
>>  llvm/trunk/test/CodeGen/X86/statepoint-stack-usage.ll
>>  llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>>  llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>>  llvm/trunk/test/CodeGen/X86/vec_minmax_sint.ll
>>  llvm/trunk/test/CodeGen/X86/vec_shift4.ll
>>  llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>  llvm/trunk/test/CodeGen/X86/vector-idiv-sdiv-128.ll
>>  llvm/trunk/test/CodeGen/X86/vector-idiv-udiv-128.ll
>>  llvm/trunk/test/CodeGen/X86/vector-mul.ll
>>  llvm/trunk/test/CodeGen/X86/vector-rotate-128.ll
>>  llvm/trunk/test/CodeGen/X86/vector-sext.ll
>>  llvm/trunk/test/CodeGen/X86/vector-shift-ashr-128.ll
>>  llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
>>  llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
>>  llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>>  llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
>>  llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
>>  llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
>>  llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
>>  llvm/trunk/test/CodeGen/X86/vector-zext.ll
>>  llvm/trunk/test/CodeGen/X86/vselect-minmax.ll
>>  llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>>  llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>>  llvm/trunk/test/CodeGen/X86/win64_frame.ll
>>  llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
>>  llvm/trunk/test/CodeGen/X86/x86-shrink-wrap-unwind.ll
>>  llvm/trunk/test/CodeGen/X86/x86-shrink-wrapping.ll
>>  llvm/trunk/test/DebugInfo/COFF/fpo-shrink-wrap.ll
>>  llvm/trunk/test/DebugInfo/X86/spill-nospill.ll
>> 
>> <D41835.132429.patch>_______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>  
>> 
>>  
>> 
>> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180202/d0b505a7/attachment.html>


More information about the llvm-commits mailing list