<div dir="ltr">Can you file a PR with a reduced reproducer?<div><br><div>I would really like to understand what's going on, not just blindly add a check. I really don't think we should be getting there with an empty Ops, so I'd prefer to fix the issue at its origin - or see a case where having an empty Ops is ok. </div></div><div><br></div><div>(Also, having a test for the change would be nice)</div><div><br></div><div>Thanks,</div><div> Michael</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 25, 2016 at 7:20 AM, Volkan Keles <span dir="ltr">&lt;<a href="mailto:vkeles@apple.com" target="_blank">vkeles@apple.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On Nov 25, 2016, at 2:50 PM, Michael Kuperstein &lt;<a href="mailto:mkuper@google.com" target="_blank">mkuper@google.com</a>&gt; wrote:</div><br class="m_-9130038467432654211Apple-interchange-newline"><div><div dir="ltr">Hi Volkan,<div><br></div><div>I'll take a look once I'm back from vacation (mid next week), but I don't think we should be passing an empty Ops here.</div></div></div></blockquote><div><br></div></span><div>InlineSpiller::<wbr>foldMemoryOperand(…) doesn’t check if FoldOps is empty, so it is possible. 
I think we should fix this in either TargetInstrInfo::<wbr>foldMemoryOperand or InlineSpiller::<wbr>foldMemoryOperand.</div><span class=""><div> </div><blockquote type="cite"><div><div dir="ltr"><div>Do you have a test that this fails with?</div></div></div></blockquote><div><br></div></span><div>This change breaks one of our internal tests.</div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>Thanks,</div><div> Michael</div></div></div></blockquote><div><br></div>Thank you,</div><div>Volkan</div><div><div class="h5"><div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 24, 2016 at 7:45 AM, Volkan Keles <span dir="ltr">&lt;<a href="mailto:vkeles@apple.com" target="_blank">vkeles@apple.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Michael,<div><br><div><div><div class="m_-9130038467432654211h5"><blockquote type="cite"><div>On Nov 23, 2016, at 6:33 PM, Michael Kuperstein via llvm-commits &lt;<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>&gt; wrote:</div><br class="m_-9130038467432654211m_2580039044826008080Apple-interchange-newline"><div><div>Author: mkuper<br>Date: Wed Nov 23 12:33:49 2016<br>New Revision: 287792<br><br>URL: <a href="http://llvm.org/viewvc/llvm-project?rev=287792&view=rev" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject?rev=287792&view=rev</a><br>Log:<br>[X86] Allow folding of stack reloads when loading a subreg of the spilled reg<br><br>We did not support subregs in InlineSpiller:foldMemoryOperan<wbr>d() because targets<br>may not deal with them correctly.<br><br>This adds a target hook to let the spiller know that a target can handle<br>subregs, and actually enables it for x86 for the case of stack slot reloads.<br>This fixes PR30832.<br><br>Differential Revision: <a 
href="https://reviews.llvm.org/D26521" target="_blank">https://reviews.llvm.org/D2652<wbr>1</a><br><br>Modified:<br> llvm/trunk/include/llvm/Tar<wbr>get/TargetInstrInfo.h<br> llvm/trunk/lib/CodeGen/Inli<wbr>neSpiller.cpp<br> llvm/trunk/lib/CodeGen/Targ<wbr>etInstrInfo.cpp<br> llvm/trunk/lib/Target/X86/X<wbr>86InstrInfo.cpp<br> llvm/trunk/lib/Target/X86/X<wbr>86InstrInfo.h<br> llvm/trunk/test/CodeGen/X86<wbr>/partial-fold32.ll<br> llvm/trunk/test/CodeGen/X86<wbr>/partial-fold64.ll<br> llvm/trunk/test/CodeGen/X86<wbr>/vector-half-conversions.ll<br><br>Modified: llvm/trunk/include/llvm/Target<wbr>/TargetInstrInfo.h<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/include/llvm/<wbr>Target/TargetInstrInfo.h?rev=<wbr>287792&r1=287791&r2=287792&<wbr>view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/include/llvm/Target<wbr>/TargetInstrInfo.h (original)<br>+++ llvm/trunk/include/llvm/Target<wbr>/TargetInstrInfo.h Wed Nov 23 12:33:49 2016<br>@@ -817,6 +817,20 @@ public:<br> /// anything was changed.<br> virtual bool expandPostRAPseudo(MachineInst<wbr>r &MI) const { return false; }<br><br>+ /// Check whether the target can fold a load that feeds a subreg operand<br>+ /// (or a subreg operand that feeds a store).<br>+ /// For example, X86 may want to return true if it can fold<br>+ /// movl (%esp), %eax<br>+ /// subb, %al, ...<br>+ /// Into:<br>+ /// subb (%esp), ...<br>+ ///<br>+ /// Ideally, we'd like the target implementation of foldMemoryOperand() to<br>+ /// reject subregs - but since this behavior used to be enforced in the<br>+ /// target-independent code, moving this responsibility to the targets<br>+ /// has the potential of causing nasty silent breakage in out-of-tree targets.<br>+ virtual bool isSubregFoldable() const { return false; 
}<br>+<br> /// Attempt to fold a load or store of the specified stack<br> /// slot into the specified machine instruction for the specified operand(s).<br> /// If this is possible, a new instruction is returned with the specified<br><br>Modified: llvm/trunk/lib/CodeGen/InlineS<wbr>piller.cpp<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/InlineSpiller.cpp?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/CodeGen/<wbr>InlineSpiller.cpp?rev=287792&<wbr>r1=287791&r2=287792&view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/lib/CodeGen/InlineS<wbr>piller.cpp (original)<br>+++ llvm/trunk/lib/CodeGen/InlineS<wbr>piller.cpp Wed Nov 23 12:33:49 2016<br>@@ -739,9 +739,12 @@ foldMemoryOperand(ArrayRef&lt;std<wbr>::pair&lt;Mac<br> bool WasCopy = MI->isCopy();<br> unsigned ImpReg = 0;<br><br>- bool SpillSubRegs = (MI->getOpcode() == TargetOpcode::STATEPOINT ||<br>- MI->getO<wbr>pcode() == TargetOpcode::PATCHPOINT ||<br>- MI->getO<wbr>pcode() == TargetOpcode::STACKMAP);<br>+ // Spill subregs if the target allows it.<br>+ // We always want to spill subregs for stackmap/patchpoint pseudos.<br>+ bool SpillSubRegs = TII.isSubregFoldable() ||<br>+ MI->getOp<wbr>code() == TargetOpcode::STATEPOINT ||<br>+ MI->getOp<wbr>code() == TargetOpcode::PATCHPOINT ||<br>+ MI->getOp<wbr>code() == TargetOpcode::STACKMAP;<br><br> // TargetInstrInfo::foldMemoryOpe<wbr>rand only expects explicit, non-tied<br> // operands.<br>@@ -754,7 +757,7 @@ foldMemoryOperand(ArrayRef&lt;std<wbr>::pair&lt;Mac<br> ImpReg = MO.getReg();<br> continue;<br> }<br>- // FIXME: Teach targets to deal with subregs.<br>+<br> if (!SpillSubRegs && MO.getSubReg())<br> return false;<br> // We cannot fold a load instruction into a def.<br><br>Modified: llvm/trunk/lib/CodeGen/TargetI<wbr>nstrInfo.cpp<br>URL: <a 
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/CodeGen/<wbr>TargetInstrInfo.cpp?rev=<wbr>287792&r1=287791&r2=287792&<wbr>view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/lib/CodeGen/TargetI<wbr>nstrInfo.cpp (original)<br>+++ llvm/trunk/lib/CodeGen/TargetI<wbr>nstrInfo.cpp Wed Nov 23 12:33:49 2016<br>@@ -515,6 +515,31 @@ MachineInstr *TargetInstrInfo::foldMemor<br> assert(MBB && "foldMemoryOperand needs an inserted instruction");<br> MachineFunction &MF = *MBB->getParent();<br><br>+ // If we're not folding a load into a subreg, the size of the load is the<br>+ // size of the spill slot. But if we are, we need to figure out what the<br>+ // actual load size is.<br>+ int64_t MemSize = 0;<br>+ const MachineFrameInfo &MFI = MF.getFrameInfo();<br>+ const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterI<wbr>nfo();<br>+<br></div></div></blockquote><div><br></div></div></div><div>I think there is a missing check here. 
If the Ops is empty, MemSize will be 0 and the compiler will hit the assertion below.</div><div>What about:</div><div>+ if ((Flags & MachineMemOperand::MOStore) || Ops.empty()) {<span><br>+ MemSize = MFI.getObjectSize(FI);<br>+ } else {</span></div><div><br></div><div>Can you please look into this?</div><div><div class="m_-9130038467432654211h5"><br><blockquote type="cite"><div><div>+ if (Flags & MachineMemOperand::MOStore) {<br>+ MemSize = MFI.getObjectSize(FI);<br>+ } else {<br>+ for (unsigned Idx : Ops) {<br>+ int64_t OpSize = MFI.getObjectSize(FI);<br>+<br>+ if (auto SubReg = MI.getOperand(Idx).getSubReg()<wbr>) {<br>+ unsigned SubRegSize = TRI->getSubRegIdxSize(SubReg);<br>+ if (SubRegSize > 0 && !(SubRegSize % 8))<br>+ OpSize = SubRegSize / 8;<br>+ }<br>+<br>+ MemSize = std::max(MemSize, OpSize);<br>+ }<br>+ }<br>+<br>+ assert(MemSize && "Did not expect a zero-sized stack slot");<br>+<br> MachineInstr *NewMI = nullptr;<br><br> if (MI.getOpcode() == TargetOpcode::STACKMAP ||<br>@@ -538,10 +563,9 @@ MachineInstr *TargetInstrInfo::foldMemor<br> assert((!(Flags & MachineMemOperand::MOLoad) ||<br> NewMI->mayLoad()) &&<br> "Folded a use to a non-load!");<br>- const MachineFrameInfo &MFI = MF.getFrameInfo();<br> assert(MFI.getObjectOffset<wbr>(FI) != -1);<br> MachineMemOperand *MMO = MF.getMachineMemOperand(<br>- MachinePointerInfo::get<wbr>FixedStack(MF, FI), Flags, MFI.getObjectSize(FI),<br>+ MachinePointerInfo::get<wbr>FixedStack(MF, FI), Flags, MemSize,<br> MFI.getObjectAlignment<wbr>(FI));<br> NewMI->addMemOperand(MF, MMO);<br><br>@@ -558,7 +582,6 @@ MachineInstr *TargetInstrInfo::foldMemor<br><br> const MachineOperand &MO = MI.getOperand(1 - Ops[0]);<br> MachineBasicBlock::iterator Pos = MI;<br>- const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterI<wbr>nfo();<br><br> if (Flags == MachineMemOperand::MOStore)<br> storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI);<br><br>Modified: 
llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/X86InstrInfo.cpp?rev=287792&<wbr>r1=287791&r2=287792&view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp (original)<br>+++ llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp Wed Nov 23 12:33:49 2016<br>@@ -6843,6 +6843,14 @@ X86InstrInfo::foldMemoryOperan<wbr>dImpl(Mach<br> if (!MF.getFunction()->optForSize<wbr>() && hasPartialRegUpdate(MI.getOpco<wbr>de()))<br> return nullptr;<br><br>+ // Don't fold subreg spills, or reloads that use a high subreg.<br>+ for (auto Op : Ops) {<br>+ MachineOperand &MO = MI.getOperand(Op);<br>+ auto SubReg = MO.getSubReg();<br>+ if (SubReg && (MO.isDef() || SubReg == X86::sub_8bit_hi))<br>+ return nullptr;<br>+ }<br>+<br> const MachineFrameInfo &MFI = MF.getFrameInfo();<br> unsigned Size = MFI.getObjectSize(FrameIndex);<br> unsigned Alignment = MFI.getObjectAlignment(FrameIn<wbr>dex);<br>@@ -6967,6 +6975,14 @@ MachineInstr *X86InstrInfo::foldMemoryOp<br> MachineFunction &MF, MachineInstr &MI, ArrayRef&lt;unsigned&gt; Ops,<br> MachineBasicBlock::iterato<wbr>r InsertPt, MachineInstr &LoadMI,<br> LiveIntervals *LIS) const {<br>+<br>+ // TODO: Support the case where LoadMI loads a wide register, but MI<br>+ // only uses a subreg.<br>+ for (auto Op : Ops) {<br>+ if (MI.getOperand(Op).getSubReg()<wbr>)<br>+ return nullptr;<br>+ }<br>+<br> // If loading from a FrameIndex, fold directly from the FrameIndex.<br> unsigned NumOps = LoadMI.getDesc().getNumOperand<wbr>s();<br> int FrameIndex;<br><br>Modified: llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.h<br>URL: <a 
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/X86InstrInfo.h?rev=287792&r1<wbr>=287791&r2=287792&view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.h (original)<br>+++ llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.h Wed Nov 23 12:33:49 2016<br>@@ -378,6 +378,10 @@ public:<br><br> bool expandPostRAPseudo(MachineInst<wbr>r &MI) const override;<br><br>+ /// Check whether the target can fold a load that feeds a subreg operand<br>+ /// (or a subreg operand that feeds a store).<br>+ bool isSubregFoldable() const override { return true; }<br>+<br> /// foldMemoryOperand - If this target supports it, fold a load or store of<br> /// the specified stack slot into the specified machine instruction for the<br> /// specified operand(s). 
If this is possible, the target should perform the<br><br>Modified: llvm/trunk/test/CodeGen/X86/pa<wbr>rtial-fold32.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold32.ll?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/partial-fold32.ll?rev=<wbr>287792&r1=287791&r2=287792&<wbr>view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/test/CodeGen/X86/pa<wbr>rtial-fold32.ll (original)<br>+++ llvm/trunk/test/CodeGen/X86/pa<wbr>rtial-fold32.ll Wed Nov 23 12:33:49 2016<br>@@ -3,8 +3,7 @@<br> define fastcc i8 @fold32to8(i32 %add, i8 %spill) {<br> ; CHECK-LABEL: fold32to8:<br> ; CHECK: movl %ecx, (%esp) # 4-byte Spill<br>-; CHECK: movl (%esp), %eax # 4-byte Reload<br>-; CHECK: subb %al, %dl<br>+; CHECK: subb (%esp), %dl # 1-byte Folded Reload<br> entry:<br> tail call void asm sideeffect "", "~{eax},~{ebx},~{ecx},~{edi},~<wbr>{esi},~{ebp},~{dirflag},~{fpsr<wbr>},~{flags}"()<br> %trunc = trunc i32 %add to i8<br><br>Modified: llvm/trunk/test/CodeGen/X86/pa<wbr>rtial-fold64.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold64.ll?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/partial-fold64.ll?rev=<wbr>287792&r1=287791&r2=287792&<wbr>view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/test/CodeGen/X86/pa<wbr>rtial-fold64.ll (original)<br>+++ llvm/trunk/test/CodeGen/X86/pa<wbr>rtial-fold64.ll Wed Nov 23 12:33:49 2016<br>@@ -3,8 +3,7 @@<br> define i32 @fold64to32(i64 %add, i32 %spill) {<br> ; CHECK-LABEL: fold64to32:<br> ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill<br>-; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload<br>-; CHECK: subl %eax, %esi<br>+; CHECK: subl 
-{{[0-9]+}}(%rsp), %esi # 4-byte Folded Reload<br> entry:<br> tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~<wbr>{rdi},~{rbp},~{r8},~{r9},~{r10<wbr>},~{r11},~{r12},~{r13},~{r14},<wbr>~{r15},~{dirflag},~{fpsr},~{<wbr>flags}"()<br> %trunc = trunc i64 %add to i32<br>@@ -15,8 +14,7 @@ entry:<br> define i8 @fold64to8(i64 %add, i8 %spill) {<br> ; CHECK-LABEL: fold64to8:<br> ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill<br>-; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload<br>-; CHECK: subb %al, %sil<br>+; CHECK: subb -{{[0-9]+}}(%rsp), %sil # 1-byte Folded Reload<br> entry:<br> tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~<wbr>{rdi},~{rbp},~{r8},~{r9},~{r10<wbr>},~{r11},~{r12},~{r13},~{r14},<wbr>~{r15},~{dirflag},~{fpsr},~{<wbr>flags}"()<br> %trunc = trunc i64 %add to i8<br><br>Modified: llvm/trunk/test/CodeGen/X86/ve<wbr>ctor-half-conversions.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll?rev=287792&r1=287791&r2=287792&view=diff" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/vector-half-conversions.<wbr>ll?rev=287792&r1=287791&r2=<wbr>287792&view=diff</a><br>==============================<wbr>==============================<wbr>==================<br>--- llvm/trunk/test/CodeGen/X86/ve<wbr>ctor-half-conversions.ll (original)<br>+++ llvm/trunk/test/CodeGen/X86/ve<wbr>ctor-half-conversions.ll Wed Nov 23 12:33:49 2016<br>@@ -4788,9 +4788,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x<br> ; AVX1-NEXT: orl %ebx, %r14d<br> ; AVX1-NEXT: shlq $32, %r14<br> ; AVX1-NEXT: orq %r15, %r14<br>-; AVX1-NEXT: vmovupd (%rsp), %ymm0 # 32-byte Reload<br>-; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]<br>-; AVX1-NEXT: vzeroupper<br>+; AVX1-NEXT: vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload<br>+; AVX1-NEXT: # xmm0 = mem[1,0]<br> ; AVX1-NEXT: callq __truncdfhf2<br> ; AVX1-NEXT: movw %ax, %bx<br> ; AVX1-NEXT: shll $16, %ebx<br>@@ 
-4856,9 +4855,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x<br> ; AVX2-NEXT: orl %ebx, %r14d<br> ; AVX2-NEXT: shlq $32, %r14<br> ; AVX2-NEXT: orq %r15, %r14<br>-; AVX2-NEXT: vmovupd (%rsp), %ymm0 # 32-byte Reload<br>-; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]<br>-; AVX2-NEXT: vzeroupper<br>+; AVX2-NEXT: vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload<br>+; AVX2-NEXT: # xmm0 = mem[1,0]<br> ; AVX2-NEXT: callq __truncdfhf2<br> ; AVX2-NEXT: movw %ax, %bx<br> ; AVX2-NEXT: shll $16, %ebx<br>@@ -5585,9 +5583,8 @@ define void @store_cvt_8f64_to_8i16(<8 x<br> ; AVX1-NEXT: vzeroupper<br> ; AVX1-NEXT: callq __truncdfhf2<br> ; AVX1-NEXT: movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill<br>-; AVX1-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload<br>-; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]<br>-; AVX1-NEXT: vzeroupper<br>+; AVX1-NEXT: vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload<br>+; AVX1-NEXT: # xmm0 = mem[1,0]<br> ; AVX1-NEXT: callq __truncdfhf2<br> ; AVX1-NEXT: movl %eax, %r12d<br> ; AVX1-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload<br>@@ -5654,9 +5651,8 @@ define void @store_cvt_8f64_to_8i16(<8 x<br> ; AVX2-NEXT: vzeroupper<br> ; AVX2-NEXT: callq __truncdfhf2<br> ; AVX2-NEXT: movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill<br>-; AVX2-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload<br>-; AVX2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]<br>-; AVX2-NEXT: vzeroupper<br>+; AVX2-NEXT: vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload<br>+; AVX2-NEXT: # xmm0 = mem[1,0]<br> ; AVX2-NEXT: callq __truncdfhf2<br> ; AVX2-NEXT: movl %eax, %r12d<br> ; AVX2-NEXT: vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload<br><br><br>______________________________<wbr>_________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" 
target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br></div></div></blockquote><br></div></div></div><div>Thank you,</div><div>Volkan</div><br></div></div></blockquote></div><br></div>
</div></blockquote></div><br></div></div></div></blockquote></div><br></div>