Hi Michael,

On Nov 23, 2016, at 6:33 PM, Michael Kuperstein via llvm-commits <llvm-commits@lists.llvm.org> wrote:

> Author: mkuper
> Date: Wed Nov 23 12:33:49 2016
> New Revision: 287792
>
> URL: http://llvm.org/viewvc/llvm-project?rev=287792&view=rev
> Log:
> [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
>
> We did not support subregs in InlineSpiller::foldMemoryOperand() because
> targets may not deal with them correctly.
>
> This adds a target hook to let the spiller know that a target can handle
> subregs, and actually enables it for x86 for the case of stack slot reloads.
> This fixes PR30832.
>
> Differential Revision: https://reviews.llvm.org/D26521
>
> Modified:
>     llvm/trunk/include/llvm/Target/TargetInstrInfo.h
>     llvm/trunk/lib/CodeGen/InlineSpiller.cpp
>     llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp
>     llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
>     llvm/trunk/lib/Target/X86/X86InstrInfo.h
>     llvm/trunk/test/CodeGen/X86/partial-fold32.ll
>     llvm/trunk/test/CodeGen/X86/partial-fold64.ll
>     llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>
> Modified: llvm/trunk/include/llvm/Target/TargetInstrInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Target/TargetInstrInfo.h (original)
> +++ llvm/trunk/include/llvm/Target/TargetInstrInfo.h Wed Nov 23 12:33:49 2016
> @@ -817,6 +817,20 @@ public:
>    /// anything was changed.
>    virtual bool expandPostRAPseudo(MachineInstr &MI) const { return false; }
>
> +  /// Check whether the target can fold a load that feeds a subreg operand
> +  /// (or a subreg operand that feeds a store).
> +  /// For example, X86 may want to return true if it can fold
> +  ///   movl (%esp), %eax
> +  ///   subb %al, ...
> +  /// into:
> +  ///   subb (%esp), ...
> +  ///
> +  /// Ideally, we'd like the target implementation of foldMemoryOperand() to
> +  /// reject subregs - but since this behavior used to be enforced in the
> +  /// target-independent code, moving this responsibility to the targets
> +  /// has the potential of causing nasty silent breakage in out-of-tree targets.
> +  virtual bool isSubregFoldable() const { return false; }
> +
>    /// Attempt to fold a load or store of the specified stack
>    /// slot into the specified machine instruction for the specified operand(s).
>    /// If this is possible, a new instruction is returned with the specified
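The conservative default here seems right for out-of-tree targets. For anyone following along: a target that can handle subreg folds opts in with a one-line override, as X86 does further down in this commit. A sketch for a hypothetical target (the MyTarget names are made up):

  class MyTargetInstrInfo : public MyTargetGenInstrInfo {
  public:
    // Tell the spiller and the target-independent foldMemoryOperand() that
    // this target's foldMemoryOperandImpl() copes with subreg operands.
    bool isSubregFoldable() const override { return true; }
  };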
> Modified: llvm/trunk/lib/CodeGen/InlineSpiller.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/InlineSpiller.cpp?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/InlineSpiller.cpp (original)
> +++ llvm/trunk/lib/CodeGen/InlineSpiller.cpp Wed Nov 23 12:33:49 2016
> @@ -739,9 +739,12 @@ foldMemoryOperand(ArrayRef<std::pair<Mac
>    bool WasCopy = MI->isCopy();
>    unsigned ImpReg = 0;
>
> -  bool SpillSubRegs = (MI->getOpcode() == TargetOpcode::STATEPOINT ||
> -                       MI->getOpcode() == TargetOpcode::PATCHPOINT ||
> -                       MI->getOpcode() == TargetOpcode::STACKMAP);
> +  // Spill subregs if the target allows it.
> +  // We always want to spill subregs for stackmap/patchpoint pseudos.
> +  bool SpillSubRegs = TII.isSubregFoldable() ||
> +                      MI->getOpcode() == TargetOpcode::STATEPOINT ||
> +                      MI->getOpcode() == TargetOpcode::PATCHPOINT ||
> +                      MI->getOpcode() == TargetOpcode::STACKMAP;
>
>    // TargetInstrInfo::foldMemoryOperand only expects explicit, non-tied
>    // operands.
> @@ -754,7 +757,7 @@ foldMemoryOperand(ArrayRef<std::pair<Mac
>        ImpReg = MO.getReg();
>        continue;
>      }
> -    // FIXME: Teach targets to deal with subregs.
> +
>      if (!SpillSubRegs && MO.getSubReg())
>        return false;
>      // We cannot fold a load instruction into a def.
>
> Modified: llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp (original)
> +++ llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp Wed Nov 23 12:33:49 2016
> @@ -515,6 +515,31 @@ MachineInstr *TargetInstrInfo::foldMemor
>    assert(MBB && "foldMemoryOperand needs an inserted instruction");
>    MachineFunction &MF = *MBB->getParent();
>
> +  // If we're not folding a load into a subreg, the size of the load is the
> +  // size of the spill slot. But if we are, we need to figure out what the
> +  // actual load size is.
> +  int64_t MemSize = 0;
> +  const MachineFrameInfo &MFI = MF.getFrameInfo();
> +  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
> +
I think there is a missing check here. If Ops is empty, the loop below never
executes, so MemSize stays 0 and the compiler hits the assertion. What about:

+  if ((Flags & MachineMemOperand::MOStore) || Ops.empty()) {
+    MemSize = MFI.getObjectSize(FI);
+  } else {

Can you please look into this?
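To make it concrete, here is the whole computation with that guard folded in - just a sketch, untested; everything apart from the Ops.empty() check is as committed:

  int64_t MemSize = 0;
  if ((Flags & MachineMemOperand::MOStore) || Ops.empty()) {
    // Stores (and an empty Ops list) fall back to the full spill-slot size,
    // so MemSize can no longer be left at 0.
    MemSize = MFI.getObjectSize(FI);
  } else {
    for (unsigned Idx : Ops) {
      int64_t OpSize = MFI.getObjectSize(FI);
      if (auto SubReg = MI.getOperand(Idx).getSubReg()) {
        // getSubRegIdxSize() reports the subreg width in bits.
        unsigned SubRegSize = TRI->getSubRegIdxSize(SubReg);
        if (SubRegSize > 0 && !(SubRegSize % 8))
          OpSize = SubRegSize / 8;
      }
      MemSize = std::max(MemSize, OpSize);
    }
  }
  assert(MemSize && "Did not expect a zero-sized stack slot");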
> +  if (Flags & MachineMemOperand::MOStore) {
> +    MemSize = MFI.getObjectSize(FI);
> +  } else {
> +    for (unsigned Idx : Ops) {
> +      int64_t OpSize = MFI.getObjectSize(FI);
> +
> +      if (auto SubReg = MI.getOperand(Idx).getSubReg()) {
> +        unsigned SubRegSize = TRI->getSubRegIdxSize(SubReg);
> +        if (SubRegSize > 0 && !(SubRegSize % 8))
> +          OpSize = SubRegSize / 8;
> +      }
> +
> +      MemSize = std::max(MemSize, OpSize);
> +    }
> +  }
> +
> +  assert(MemSize && "Did not expect a zero-sized stack slot");
> +
>    MachineInstr *NewMI = nullptr;
>
>    if (MI.getOpcode() == TargetOpcode::STACKMAP ||
> @@ -538,10 +563,9 @@ MachineInstr *TargetInstrInfo::foldMemor
>    assert((!(Flags & MachineMemOperand::MOLoad) ||
>            NewMI->mayLoad()) &&
>           "Folded a use to a non-load!");
> -  const MachineFrameInfo &MFI = MF.getFrameInfo();
>    assert(MFI.getObjectOffset(FI) != -1);
>    MachineMemOperand *MMO = MF.getMachineMemOperand(
> -      MachinePointerInfo::getFixedStack(MF, FI), Flags, MFI.getObjectSize(FI),
> +      MachinePointerInfo::getFixedStack(MF, FI), Flags, MemSize,
>        MFI.getObjectAlignment(FI));
>    NewMI->addMemOperand(MF, MMO);
>
> @@ -558,7 +582,6 @@ MachineInstr *TargetInstrInfo::foldMemor
>
>    const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
>    MachineBasicBlock::iterator Pos = MI;
> -  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
>
>    if (Flags == MachineMemOperand::MOStore)
>      storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI);
>
> Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86InstrInfo.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86InstrInfo.cpp Wed Nov 23 12:33:49 2016
> @@ -6843,6 +6843,14 @@ X86InstrInfo::foldMemoryOperandImpl(Mach
>    if (!MF.getFunction()->optForSize() && hasPartialRegUpdate(MI.getOpcode()))
>      return nullptr;
>
> +  // Don't fold subreg spills, or reloads that use a high subreg.
> +  for (auto Op : Ops) {
> +    MachineOperand &MO = MI.getOperand(Op);
> +    auto SubReg = MO.getSubReg();
> +    if (SubReg && (MO.isDef() || SubReg == X86::sub_8bit_hi))
> +      return nullptr;
> +  }
> +
>    const MachineFrameInfo &MFI = MF.getFrameInfo();
>    unsigned Size = MFI.getObjectSize(FrameIndex);
>    unsigned Alignment = MFI.getObjectAlignment(FrameIndex);
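The sub_8bit_hi bail-out above makes sense to me: the fold always addresses the spill slot at its base, but on little-endian x86 a high-8-bit subreg such as %ah sits at byte offset 1 inside the slot, so folding it would presumably need an offset-adjusted address along the lines of

  subb 1(%esp), %dl    # hypothetical sub_8bit_hi folded reload

which foldMemoryOperandImpl() doesn't attempt - hence returning nullptr instead.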
> @@ -6967,6 +6975,14 @@ MachineInstr *X86InstrInfo::foldMemoryOp
>      MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
>      MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
>      LiveIntervals *LIS) const {
> +
> +  // TODO: Support the case where LoadMI loads a wide register, but MI
> +  // only uses a subreg.
> +  for (auto Op : Ops) {
> +    if (MI.getOperand(Op).getSubReg())
> +      return nullptr;
> +  }
> +
>    // If loading from a FrameIndex, fold directly from the FrameIndex.
>    unsigned NumOps = LoadMI.getDesc().getNumOperands();
>    int FrameIndex;
>
> Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86InstrInfo.h (original)
> +++ llvm/trunk/lib/Target/X86/X86InstrInfo.h Wed Nov 23 12:33:49 2016
> @@ -378,6 +378,10 @@ public:
>
>    bool expandPostRAPseudo(MachineInstr &MI) const override;
>
> +  /// Check whether the target can fold a load that feeds a subreg operand
> +  /// (or a subreg operand that feeds a store).
> +  bool isSubregFoldable() const override { return true; }
> +
>    /// foldMemoryOperand - If this target supports it, fold a load or store of
>    /// the specified stack slot into the specified machine instruction for the
>    /// specified operand(s). If this is possible, the target should perform the
>
> Modified: llvm/trunk/test/CodeGen/X86/partial-fold32.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold32.ll?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/partial-fold32.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/partial-fold32.ll Wed Nov 23 12:33:49 2016
> @@ -3,8 +3,7 @@
>  define fastcc i8 @fold32to8(i32 %add, i8 %spill) {
>  ; CHECK-LABEL: fold32to8:
>  ; CHECK: movl %ecx, (%esp) # 4-byte Spill
> -; CHECK: movl (%esp), %eax # 4-byte Reload
> -; CHECK: subb %al, %dl
> +; CHECK: subb (%esp), %dl # 1-byte Folded Reload
>  entry:
>    tail call void asm sideeffect "", "~{eax},~{ebx},~{ecx},~{edi},~{esi},~{ebp},~{dirflag},~{fpsr},~{flags}"()
>    %trunc = trunc i32 %add to i8
>
> Modified: llvm/trunk/test/CodeGen/X86/partial-fold64.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold64.ll?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/partial-fold64.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/partial-fold64.ll Wed Nov 23 12:33:49 2016
> @@ -3,8 +3,7 @@
>  define i32 @fold64to32(i64 %add, i32 %spill) {
>  ; CHECK-LABEL: fold64to32:
>  ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
> -; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
> -; CHECK: subl %eax, %esi
> +; CHECK: subl -{{[0-9]+}}(%rsp), %esi # 4-byte Folded Reload
>  entry:
>    tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
>    %trunc = trunc i64 %add to i32
> @@ -15,8 +14,7 @@ entry:
>  define i8 @fold64to8(i64 %add, i8 %spill) {
>  ; CHECK-LABEL: fold64to8:
>  ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
> -; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
> -; CHECK: subb %al, %sil
> +; CHECK: subb -{{[0-9]+}}(%rsp), %sil # 1-byte Folded Reload
>  entry:
>    tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
>    %trunc = trunc i64 %add to i8
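The partial-fold tests above now verify that the reload is folded at the subreg's width (a 1- or 4-byte folded reload out of a wider slot), and the vector-half-conversions changes below exercise the same MemSize logic for vectors - the folded vpermilpd reads only the low 16 bytes of the 32-byte ymm spill slot:

  vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload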
> Modified: llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll?rev=287792&r1=287791&r2=287792&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll Wed Nov 23 12:33:49 2016
> @@ -4788,9 +4788,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x
>  ; AVX1-NEXT:    orl %ebx, %r14d
>  ; AVX1-NEXT:    shlq $32, %r14
>  ; AVX1-NEXT:    orq %r15, %r14
> -; AVX1-NEXT:    vmovupd (%rsp), %ymm0 # 32-byte Reload
> -; AVX1-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
> -; AVX1-NEXT:    vzeroupper
> +; AVX1-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
> +; AVX1-NEXT:    # xmm0 = mem[1,0]
>  ; AVX1-NEXT:    callq __truncdfhf2
>  ; AVX1-NEXT:    movw %ax, %bx
>  ; AVX1-NEXT:    shll $16, %ebx
> @@ -4856,9 +4855,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x
>  ; AVX2-NEXT:    orl %ebx, %r14d
>  ; AVX2-NEXT:    shlq $32, %r14
>  ; AVX2-NEXT:    orq %r15, %r14
> -; AVX2-NEXT:    vmovupd (%rsp), %ymm0 # 32-byte Reload
> -; AVX2-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
> -; AVX2-NEXT:    vzeroupper
> +; AVX2-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
> +; AVX2-NEXT:    # xmm0 = mem[1,0]
>  ; AVX2-NEXT:    callq __truncdfhf2
>  ; AVX2-NEXT:    movw %ax, %bx
>  ; AVX2-NEXT:    shll $16, %ebx
> @@ -5585,9 +5583,8 @@ define void @store_cvt_8f64_to_8i16(<8 x
>  ; AVX1-NEXT:    vzeroupper
>  ; AVX1-NEXT:    callq __truncdfhf2
>  ; AVX1-NEXT:    movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
> -; AVX1-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
> -; AVX1-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
> -; AVX1-NEXT:    vzeroupper
> +; AVX1-NEXT:    vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload
> +; AVX1-NEXT:    # xmm0 = mem[1,0]
>  ; AVX1-NEXT:    callq __truncdfhf2
>  ; AVX1-NEXT:    movl %eax, %r12d
>  ; AVX1-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
> @@ -5654,9 +5651,8 @@ define void @store_cvt_8f64_to_8i16(<8 x
>  ; AVX2-NEXT:    vzeroupper
>  ; AVX2-NEXT:    callq __truncdfhf2
>  ; AVX2-NEXT:    movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
> -; AVX2-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
> -; AVX2-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
> -; AVX2-NEXT:    vzeroupper
> +; AVX2-NEXT:    vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload
> +; AVX2-NEXT:    # xmm0 = mem[1,0]
>  ; AVX2-NEXT:    callq __truncdfhf2
>  ; AVX2-NEXT:    movl %eax, %r12d
>  ; AVX2-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload

Thank you,
Volkan