[llvm] r287792 - [X86] Allow folding of stack reloads when loading a subreg of the spilled reg

Fri Nov 25 14:03:29 PST 2016

On 11/25/2016 06:55 AM, Michael Kuperstein wrote:
> So, I had this exact discussion with Matthias on the review thread, 
> and on IRC.
Ah, glad to know it got discussed.  That was my primary concern. I'll 
share my 2cts below, but that's just for the record, not because I'm 
asking for changes.
>
> The problem isn't in-tree targets, it's out-of-tree targets. Now, 
> generally speaking, breaking out-of-tree targets is fine, but in this 
> case, I think it's a particularly nasty kind of break - it's a change 
> that silently relaxes an API invariant. And the way the breakage would 
> manifest is by creating nasty-to-debug miscompiles.
> I'd really rather not be *that* hostile to downstream.
Honestly, this really feels like the wrong tradeoff to me.  We shouldn't 
be taking code complexity upstream to prevent possible problems in 
downstream out of tree backends.  We should give notice of potentially 
breaking changes (llvm-dev email, release notes, etc..), but the 
maintenance responsibility for the out of tree code should lie on the 
out of tree users.  Beyond the obvious goal of avoiding confusing 
complexity upstream, this is one of our main incentive mechanisms for 
out of tree users to follow ToT and eventually become upstream 
contributors.

One possible middle ground would be to offer the callback (with the safe 
default) for a limited migration period.  Explicitly document the 
callback as only being present in one release, update the release 
documents to clearly state the required migration, and remove the 
callback one day after the next release has landed.  This would give a 
softer migration period without accumulating technical complexity long 
term.
>
> We could make the break less silent by changing the foldMemoryOperand 
> API in a way that'll break in compile time, but it's really not clear 
> it's worth it.
>
> On Thu, Nov 24, 2016 at 7:50 PM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
>     On 11/23/2016 10:33 AM, Michael Kuperstein via llvm-commits wrote:
>
>         Author: mkuper
>         Date: Wed Nov 23 12:33:49 2016
>         New Revision: 287792
>
>         URL: http://llvm.org/viewvc/llvm-project?rev=287792&view=rev
>         <http://llvm.org/viewvc/llvm-project?rev=287792&view=rev>
>         Log:
>         [X86] Allow folding of stack reloads when loading a subreg of
>         the spilled reg
>
>         We did not support subregs in
>         InlineSpiller:foldMemoryOperand() because targets
>         may not deal with them correctly.
>
>         This adds a target hook to let the spiller know that a target
>         can handle
>         subregs, and actually enables it for x86 for the case of stack
>         slot reloads.
>         This fixes PR30832.
>
>     This feels like a weird design.  If I remember correctly,
>     foldMemoryOperand is allowed to do nothing if it doesn't know how
>     to fold.  Given this, why not just update the in tree targets to
>     check for a sub-reg load and bail out?  Why do we need yet another
>     target hook?
>
>
>
>         Differential Revision: https://reviews.llvm.org/D26521
>         <https://reviews.llvm.org/D26521>
>
>         Modified:
>              llvm/trunk/include/llvm/Target/TargetInstrInfo.h
>              llvm/trunk/lib/CodeGen/InlineSpiller.cpp
>              llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp
>              llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
>              llvm/trunk/lib/Target/X86/X86InstrInfo.h
>              llvm/trunk/test/CodeGen/X86/partial-fold32.ll
>              llvm/trunk/test/CodeGen/X86/partial-fold64.ll
>              llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>
>         Modified: llvm/trunk/include/llvm/Target/TargetInstrInfo.h
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/include/llvm/Target/TargetInstrInfo.h (original)
>         +++ llvm/trunk/include/llvm/Target/TargetInstrInfo.h Wed Nov
>         23 12:33:49 2016
>         @@ -817,6 +817,20 @@ public:
>             /// anything was changed.
>             virtual bool expandPostRAPseudo(MachineInstr &MI) const {
>         return false; }
>           +  /// Check whether the target can fold a load that feeds a
>         subreg operand
>         +  /// (or a subreg operand that feeds a store).
>         +  /// For example, X86 may want to return true if it can fold
>         +  /// movl (%esp), %eax
>         +  /// subb, %al, ...
>         +  /// Into:
>         +  /// subb (%esp), ...
>         +  ///
>         +  /// Ideally, we'd like the target implementation of
>         foldMemoryOperand() to
>         +  /// reject subregs - but since this behavior used to be
>         enforced in the
>         +  /// target-independent code, moving this responsibility to
>         the targets
>         +  /// has the potential of causing nasty silent breakage in
>         out-of-tree targets.
>         +  virtual bool isSubregFoldable() const { return false; }
>         +
>             /// Attempt to fold a load or store of the specified stack
>             /// slot into the specified machine instruction for the
>         specified operand(s).
>             /// If this is possible, a new instruction is returned
>         with the specified
>
>         Modified: llvm/trunk/lib/CodeGen/InlineSpiller.cpp
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/InlineSpiller.cpp?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/InlineSpiller.cpp?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/lib/CodeGen/InlineSpiller.cpp (original)
>         +++ llvm/trunk/lib/CodeGen/InlineSpiller.cpp Wed Nov 23
>         12:33:49 2016
>         @@ -739,9 +739,12 @@ foldMemoryOperand(ArrayRef<std::pair<Mac
>             bool WasCopy = MI->isCopy();
>             unsigned ImpReg = 0;
>           -  bool SpillSubRegs = (MI->getOpcode() ==
>         TargetOpcode::STATEPOINT ||
>         -                       MI->getOpcode() ==
>         TargetOpcode::PATCHPOINT ||
>         -                       MI->getOpcode() ==
>         TargetOpcode::STACKMAP);
>         +  // Spill subregs if the target allows it.
>         +  // We always want to spill subregs for stackmap/patchpoint
>         pseudos.
>         +  bool SpillSubRegs = TII.isSubregFoldable() ||
>         +                      MI->getOpcode() ==
>         TargetOpcode::STATEPOINT ||
>         +                      MI->getOpcode() ==
>         TargetOpcode::PATCHPOINT ||
>         +                      MI->getOpcode() == TargetOpcode::STACKMAP;
>               // TargetInstrInfo::foldMemoryOperand only expects
>         explicit, non-tied
>             // operands.
>         @@ -754,7 +757,7 @@ foldMemoryOperand(ArrayRef<std::pair<Mac
>                 ImpReg = MO.getReg();
>                 continue;
>               }
>         -    // FIXME: Teach targets to deal with subregs.
>         +
>               if (!SpillSubRegs && MO.getSubReg())
>                 return false;
>               // We cannot fold a load instruction into a def.
>
>         Modified: llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp (original)
>         +++ llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp Wed Nov 23
>         12:33:49 2016
>         @@ -515,6 +515,31 @@ MachineInstr *TargetInstrInfo::foldMemor
>             assert(MBB && "foldMemoryOperand needs an inserted
>         instruction");
>             MachineFunction &MF = *MBB->getParent();
>           +  // If we're not folding a load into a subreg, the size of
>         the load is the
>         +  // size of the spill slot. But if we are, we need to figure
>         out what the
>         +  // actual load size is.
>         +  int64_t MemSize = 0;
>         +  const MachineFrameInfo &MFI = MF.getFrameInfo();
>         +  const TargetRegisterInfo *TRI =
>         MF.getSubtarget().getRegisterInfo();
>         +
>         +  if (Flags & MachineMemOperand::MOStore) {
>         +    MemSize = MFI.getObjectSize(FI);
>         +  } else {
>         +    for (unsigned Idx : Ops) {
>         +      int64_t OpSize = MFI.getObjectSize(FI);
>         +
>         +      if (auto SubReg = MI.getOperand(Idx).getSubReg()) {
>         +        unsigned SubRegSize = TRI->getSubRegIdxSize(SubReg);
>         +        if (SubRegSize > 0 && !(SubRegSize % 8))
>         +          OpSize = SubRegSize / 8;
>         +      }
>         +
>         +      MemSize = std::max(MemSize, OpSize);
>         +    }
>         +  }
>         +
>         +  assert(MemSize && "Did not expect a zero-sized stack slot");
>         +
>             MachineInstr *NewMI = nullptr;
>               if (MI.getOpcode() == TargetOpcode::STACKMAP ||
>         @@ -538,10 +563,9 @@ MachineInstr *TargetInstrInfo::foldMemor
>               assert((!(Flags & MachineMemOperand::MOLoad) ||
>                       NewMI->mayLoad()) &&
>                      "Folded a use to a non-load!");
>         -    const MachineFrameInfo &MFI = MF.getFrameInfo();
>               assert(MFI.getObjectOffset(FI) != -1);
>               MachineMemOperand *MMO = MF.getMachineMemOperand(
>         -        MachinePointerInfo::getFixedStack(MF, FI), Flags,
>         MFI.getObjectSize(FI),
>         +        MachinePointerInfo::getFixedStack(MF, FI), Flags,
>         MemSize,
>                   MFI.getObjectAlignment(FI));
>               NewMI->addMemOperand(MF, MMO);
>           @@ -558,7 +582,6 @@ MachineInstr *TargetInstrInfo::foldMemor
>               const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
>             MachineBasicBlock::iterator Pos = MI;
>         -  const TargetRegisterInfo *TRI =
>         MF.getSubtarget().getRegisterInfo();
>               if (Flags == MachineMemOperand::MOStore)
>               storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(),
>         FI, RC, TRI);
>
>         Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/lib/Target/X86/X86InstrInfo.cpp (original)
>         +++ llvm/trunk/lib/Target/X86/X86InstrInfo.cpp Wed Nov 23
>         12:33:49 2016
>         @@ -6843,6 +6843,14 @@ X86InstrInfo::foldMemoryOperandImpl(Mach
>             if (!MF.getFunction()->optForSize() &&
>         hasPartialRegUpdate(MI.getOpcode()))
>               return nullptr;
>           +  // Don't fold subreg spills, or reloads that use a high
>         subreg.
>         +  for (auto Op : Ops) {
>         +    MachineOperand &MO = MI.getOperand(Op);
>         +    auto SubReg = MO.getSubReg();
>         +    if (SubReg && (MO.isDef() || SubReg == X86::sub_8bit_hi))
>         +      return nullptr;
>         +  }
>         +
>             const MachineFrameInfo &MFI = MF.getFrameInfo();
>             unsigned Size = MFI.getObjectSize(FrameIndex);
>             unsigned Alignment = MFI.getObjectAlignment(FrameIndex);
>         @@ -6967,6 +6975,14 @@ MachineInstr *X86InstrInfo::foldMemoryOp
>               MachineFunction &MF, MachineInstr &MI,
>         ArrayRef<unsigned> Ops,
>               MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
>               LiveIntervals *LIS) const {
>         +
>         +  // TODO: Support the case where LoadMI loads a wide
>         register, but MI
>         +  // only uses a subreg.
>         +  for (auto Op : Ops) {
>         +    if (MI.getOperand(Op).getSubReg())
>         +      return nullptr;
>         +  }
>         +
>             // If loading from a FrameIndex, fold directly from the
>         FrameIndex.
>             unsigned NumOps = LoadMI.getDesc().getNumOperands();
>             int FrameIndex;
>
>         Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.h
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/lib/Target/X86/X86InstrInfo.h (original)
>         +++ llvm/trunk/lib/Target/X86/X86InstrInfo.h Wed Nov 23
>         12:33:49 2016
>         @@ -378,6 +378,10 @@ public:
>               bool expandPostRAPseudo(MachineInstr &MI) const override;
>           +  /// Check whether the target can fold a load that feeds a
>         subreg operand
>         +  /// (or a subreg operand that feeds a store).
>         +  bool isSubregFoldable() const override { return true; }
>         +
>             /// foldMemoryOperand - If this target supports it, fold a
>         load or store of
>             /// the specified stack slot into the specified machine
>         instruction for the
>             /// specified operand(s).  If this is possible, the target
>         should perform the
>
>         Modified: llvm/trunk/test/CodeGen/X86/partial-fold32.ll
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold32.ll?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold32.ll?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/test/CodeGen/X86/partial-fold32.ll (original)
>         +++ llvm/trunk/test/CodeGen/X86/partial-fold32.ll Wed Nov 23
>         12:33:49 2016
>         @@ -3,8 +3,7 @@
>           define fastcc i8 @fold32to8(i32 %add, i8 %spill) {
>           ; CHECK-LABEL: fold32to8:
>           ; CHECK:    movl %ecx, (%esp) # 4-byte Spill
>         -; CHECK:    movl (%esp), %eax # 4-byte Reload
>         -; CHECK:    subb %al, %dl
>         +; CHECK:    subb (%esp), %dl  # 1-byte Folded Reload
>           entry:
>             tail call void asm sideeffect "",
>         "~{eax},~{ebx},~{ecx},~{edi},~{esi},~{ebp},~{dirflag},~{fpsr},~{flags}"()
>             %trunc = trunc i32 %add to i8
>
>         Modified: llvm/trunk/test/CodeGen/X86/partial-fold64.ll
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold64.ll?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold64.ll?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/test/CodeGen/X86/partial-fold64.ll (original)
>         +++ llvm/trunk/test/CodeGen/X86/partial-fold64.ll Wed Nov 23
>         12:33:49 2016
>         @@ -3,8 +3,7 @@
>           define i32 @fold64to32(i64 %add, i32 %spill) {
>           ; CHECK-LABEL: fold64to32:
>           ; CHECK:    movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
>         -; CHECK:    movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
>         -; CHECK:    subl %eax, %esi
>         +; CHECK:    subl -{{[0-9]+}}(%rsp), %esi # 4-byte Folded Reload
>           entry:
>             tail call void asm sideeffect "",
>         "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
>             %trunc = trunc i64 %add to i32
>         @@ -15,8 +14,7 @@ entry:
>           define i8 @fold64to8(i64 %add, i8 %spill) {
>           ; CHECK-LABEL: fold64to8:
>           ; CHECK:    movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
>         -; CHECK:    movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
>         -; CHECK:    subb %al, %sil
>         +; CHECK:    subb -{{[0-9]+}}(%rsp), %sil # 1-byte Folded Reload
>           entry:
>             tail call void asm sideeffect "",
>         "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
>             %trunc = trunc i64 %add to i8
>
>         Modified: llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll?rev=287792&r1=287791&r2=287792&view=diff
>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll?rev=287792&r1=287791&r2=287792&view=diff>
>         ==============================================================================
>         --- llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>         (original)
>         +++ llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll Wed
>         Nov 23 12:33:49 2016
>         @@ -4788,9 +4788,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x
>           ; AVX1-NEXT:    orl %ebx, %r14d
>           ; AVX1-NEXT:    shlq $32, %r14
>           ; AVX1-NEXT:    orq %r15, %r14
>         -; AVX1-NEXT:    vmovupd (%rsp), %ymm0 # 32-byte Reload
>         -; AVX1-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>         -; AVX1-NEXT:    vzeroupper
>         +; AVX1-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded
>         Reload
>         +; AVX1-NEXT:    # xmm0 = mem[1,0]
>           ; AVX1-NEXT:    callq __truncdfhf2
>           ; AVX1-NEXT:    movw %ax, %bx
>           ; AVX1-NEXT:    shll $16, %ebx
>         @@ -4856,9 +4855,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x
>           ; AVX2-NEXT:    orl %ebx, %r14d
>           ; AVX2-NEXT:    shlq $32, %r14
>           ; AVX2-NEXT:    orq %r15, %r14
>         -; AVX2-NEXT:    vmovupd (%rsp), %ymm0 # 32-byte Reload
>         -; AVX2-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>         -; AVX2-NEXT:    vzeroupper
>         +; AVX2-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded
>         Reload
>         +; AVX2-NEXT:    # xmm0 = mem[1,0]
>           ; AVX2-NEXT:    callq __truncdfhf2
>           ; AVX2-NEXT:    movw %ax, %bx
>           ; AVX2-NEXT:    shll $16, %ebx
>         @@ -5585,9 +5583,8 @@ define void @store_cvt_8f64_to_8i16(<8 x
>           ; AVX1-NEXT:    vzeroupper
>           ; AVX1-NEXT:    callq __truncdfhf2
>           ; AVX1-NEXT:    movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
>         -; AVX1-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>         -; AVX1-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>         -; AVX1-NEXT:    vzeroupper
>         +; AVX1-NEXT:    vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 #
>         16-byte Folded Reload
>         +; AVX1-NEXT:    # xmm0 = mem[1,0]
>           ; AVX1-NEXT:    callq __truncdfhf2
>           ; AVX1-NEXT:    movl %eax, %r12d
>           ; AVX1-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>         @@ -5654,9 +5651,8 @@ define void @store_cvt_8f64_to_8i16(<8 x
>           ; AVX2-NEXT:    vzeroupper
>           ; AVX2-NEXT:    callq __truncdfhf2
>           ; AVX2-NEXT:    movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
>         -; AVX2-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>         -; AVX2-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>         -; AVX2-NEXT:    vzeroupper
>         +; AVX2-NEXT:    vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 #
>         16-byte Folded Reload
>         +; AVX2-NEXT:    # xmm0 = mem[1,0]
>           ; AVX2-NEXT:    callq __truncdfhf2
>           ; AVX2-NEXT:    movl %eax, %r12d
>           ; AVX2-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>
>
>         _______________________________________________
>         llvm-commits mailing list
>         llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161125/5d25f4c7/attachment.html>