<div class="moz-cite-prefix">On 11/25/2016 06:55 AM, Michael
Kuperstein wrote:<br>
</div>
> So, I had this exact discussion with Matthias on the review thread,
> and on IRC.
Ah, glad to know it got discussed. That was my primary concern. I'll
share my two cents below, but that's just for the record, not because
I'm asking for changes.
> The problem isn't in-tree targets, it's out-of-tree targets. Now,
> generally speaking, breaking out-of-tree targets is fine, but in this
> case, I think it's a particularly nasty kind of break - it's a change
> that silently relaxes an API invariant. And the way the breakage would
> manifest is by creating nasty-to-debug miscompiles.
> I'd really rather not be *that* hostile to downstream.
Honestly, this really feels like the wrong tradeoff to me. We shouldn't
be taking on code complexity upstream to prevent possible problems in
downstream, out-of-tree backends. We should give notice of potentially
breaking changes (an llvm-dev email, release notes, etc.), but the
maintenance responsibility for out-of-tree code should lie with the
out-of-tree users. Beyond the obvious goal of avoiding confusing
complexity upstream, this is one of our main incentive mechanisms for
out-of-tree users to follow ToT and eventually become upstream
contributors.

One possible middle ground would be to offer the callback (with the
safe default) for a limited migration period: explicitly document the
callback as being present for only one release, update the release
notes to clearly state the required migration, and remove the callback
the day after the next release has shipped. This would give a softer
migration path without accumulating technical complexity long term.
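
To sketch what I mean (hypothetical doc comment wording; the hook as
committed carries no such expiry note):

// Sketch only: the same hook, but documented as transitional and
// scheduled for removal, so no long-term complexity accumulates.
class TargetInstrInfo {
public:
  /// NOTE: Transitional hook, present for one release only. It defaults
  /// to the old, conservative behavior (no subreg folding). Out-of-tree
  /// targets should audit their foldMemoryOperandImpl() for subreg
  /// operands, override this to return true once ready, and expect both
  /// the hook and the conservative default to be removed right after
  /// the next release ships.
  virtual bool isSubregFoldable() const { return false; }
};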
> We could make the break less silent by changing the foldMemoryOperand
> API in a way that'll break in compile time, but it's really not clear
> it's worth it.
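
For the record, I'd picture the compile-time break as a signature change
on the hook targets override, roughly the sketch below (the extra
parameter is hypothetical, not a concrete proposal):

// Hypothetical: adding a parameter to the virtual hook means every
// out-of-tree override written against the old signature either fails
// to compile (if it's marked 'override') or silently stops overriding
// and falls back to this conservative default, which is a missed fold
// rather than a miscompile.
virtual MachineInstr *
foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
                      ArrayRef<unsigned> Ops, bool OpsMayHaveSubRegs,
                      MachineBasicBlock::iterator InsertPt, int FrameIndex,
                      LiveIntervals *LIS = nullptr) const {
  return nullptr; // decline to fold
}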
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Nov 24, 2016 at 7:50 PM, Philip
Reames <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">On 11/23/2016 10:33 AM, Michael Kuperstein via
llvm-commits wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: mkuper<br>
Date: Wed Nov 23 12:33:49 2016<br>
New Revision: 287792<br>
<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project?rev=287792&view=rev"
rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject?rev=287792&view=rev</a><br>
Log:<br>
[X86] Allow folding of stack reloads when loading a
subreg of the spilled reg<br>
<br>
We did not support subregs in
InlineSpiller:foldMemoryOperan<wbr>d() because targets<br>
may not deal with them correctly.<br>
<br>
This adds a target hook to let the spiller know that a
target can handle<br>
subregs, and actually enables it for x86 for the case of
stack slot reloads.<br>
This fixes PR30832.<br>
</blockquote>
>> This feels like a weird design. If I remember correctly,
>> foldMemoryOperand is allowed to do nothing if it doesn't know how to
>> fold. Given this, why not just update the in-tree targets to check
>> for a subreg load and bail out? Why do we need yet another target
>> hook?
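
(To be concrete, the bail-out I had in mind is a couple of lines per
target, roughly the sketch below, with a hypothetical target name; the
X86 hunk further down ends up doing essentially this check:)

MachineInstr *MyTargetInstrInfo::foldMemoryOperandImpl(
    MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
    MachineBasicBlock::iterator InsertPt, int FrameIndex,
    LiveIntervals *LIS) const {
  // Decline to fold if any requested operand carries a subreg index;
  // returning nullptr tells the caller that nothing was folded.
  for (unsigned Op : Ops)
    if (MI.getOperand(Op).getSubReg())
      return nullptr;
  // ... existing target-specific folding below ...
  return nullptr;
}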
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Differential Revision: <a moz-do-not-send="true"
href="https://reviews.llvm.org/D26521"
rel="noreferrer" target="_blank">https://reviews.llvm.org/D2652<wbr>1</a><br>
<br>
Modified:<br>
llvm/trunk/include/llvm/Targe<wbr>t/TargetInstrInfo.h<br>
llvm/trunk/lib/CodeGen/Inline<wbr>Spiller.cpp<br>
llvm/trunk/lib/CodeGen/Target<wbr>InstrInfo.cpp<br>
llvm/trunk/lib/Target/X86/X86<wbr>InstrInfo.cpp<br>
llvm/trunk/lib/Target/X86/X86<wbr>InstrInfo.h<br>
llvm/trunk/test/CodeGen/X86/p<wbr>artial-fold32.ll<br>
llvm/trunk/test/CodeGen/X86/p<wbr>artial-fold64.ll<br>
llvm/trunk/test/CodeGen/X86/v<wbr>ector-half-conversions.ll<br>
<br>
Modified: llvm/trunk/include/llvm/Target<wbr>/TargetInstrInfo.h<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff"
rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/include/llvm/<wbr>Target/TargetInstrInfo.h?rev=<wbr>287792&r1=287791&r2=287792&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/include/llvm/Target<wbr>/TargetInstrInfo.h
(original)<br>
+++ llvm/trunk/include/llvm/Target<wbr>/TargetInstrInfo.h
Wed Nov 23 12:33:49 2016<br>
@@ -817,6 +817,20 @@ public:<br>
/// anything was changed.<br>
virtual bool expandPostRAPseudo(MachineInst<wbr>r
&MI) const { return false; }<br>
+ /// Check whether the target can fold a load that
feeds a subreg operand<br>
+ /// (or a subreg operand that feeds a store).<br>
+ /// For example, X86 may want to return true if it
can fold<br>
+ /// movl (%esp), %eax<br>
+ /// subb, %al, ...<br>
+ /// Into:<br>
+ /// subb (%esp), ...<br>
+ ///<br>
+ /// Ideally, we'd like the target implementation of
foldMemoryOperand() to<br>
+ /// reject subregs - but since this behavior used
to be enforced in the<br>
+ /// target-independent code, moving this
responsibility to the targets<br>
+ /// has the potential of causing nasty silent
breakage in out-of-tree targets.<br>
+ virtual bool isSubregFoldable() const { return
false; }<br>
+<br>
/// Attempt to fold a load or store of the
specified stack<br>
/// slot into the specified machine instruction
for the specified operand(s).<br>
/// If this is possible, a new instruction is
returned with the specified<br>
<br>
>>>
>>> Modified: llvm/trunk/lib/CodeGen/InlineSpiller.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/InlineSpiller.cpp?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/InlineSpiller.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/InlineSpiller.cpp Wed Nov 23 12:33:49 2016
>>> @@ -739,9 +739,12 @@ foldMemoryOperand(ArrayRef<std::pair<Mac
>>>    bool WasCopy = MI->isCopy();
>>>    unsigned ImpReg = 0;
>>>
>>> -  bool SpillSubRegs = (MI->getOpcode() == TargetOpcode::STATEPOINT ||
>>> -                       MI->getOpcode() == TargetOpcode::PATCHPOINT ||
>>> -                       MI->getOpcode() == TargetOpcode::STACKMAP);
>>> +  // Spill subregs if the target allows it.
>>> +  // We always want to spill subregs for stackmap/patchpoint pseudos.
>>> +  bool SpillSubRegs = TII.isSubregFoldable() ||
>>> +                      MI->getOpcode() == TargetOpcode::STATEPOINT ||
>>> +                      MI->getOpcode() == TargetOpcode::PATCHPOINT ||
>>> +                      MI->getOpcode() == TargetOpcode::STACKMAP;
>>>
>>>    // TargetInstrInfo::foldMemoryOperand only expects explicit, non-tied
>>>    // operands.
>>> @@ -754,7 +757,7 @@ foldMemoryOperand(ArrayRef<std::pair<Mac
>>>        ImpReg = MO.getReg();
>>>        continue;
>>>      }
>>> -    // FIXME: Teach targets to deal with subregs.
>>> +
>>>      if (!SpillSubRegs && MO.getSubReg())
>>>        return false;
>>>      // We cannot fold a load instruction into a def.
>>>
>>> Modified: llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/TargetInstrInfo.cpp Wed Nov 23 12:33:49 2016
>>> @@ -515,6 +515,31 @@ MachineInstr *TargetInstrInfo::foldMemor
>>>    assert(MBB && "foldMemoryOperand needs an inserted instruction");
>>>    MachineFunction &MF = *MBB->getParent();
>>>
>>> +  // If we're not folding a load into a subreg, the size of the load is the
>>> +  // size of the spill slot. But if we are, we need to figure out what the
>>> +  // actual load size is.
>>> +  int64_t MemSize = 0;
>>> +  const MachineFrameInfo &MFI = MF.getFrameInfo();
>>> +  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
>>> +
>>> +  if (Flags & MachineMemOperand::MOStore) {
>>> +    MemSize = MFI.getObjectSize(FI);
>>> +  } else {
>>> +    for (unsigned Idx : Ops) {
>>> +      int64_t OpSize = MFI.getObjectSize(FI);
>>> +
>>> +      if (auto SubReg = MI.getOperand(Idx).getSubReg()) {
>>> +        unsigned SubRegSize = TRI->getSubRegIdxSize(SubReg);
>>> +        if (SubRegSize > 0 && !(SubRegSize % 8))
>>> +          OpSize = SubRegSize / 8;
>>> +      }
>>> +
>>> +      MemSize = std::max(MemSize, OpSize);
>>> +    }
>>> +  }
>>> +
>>> +  assert(MemSize && "Did not expect a zero-sized stack slot");
>>> +
>>>    MachineInstr *NewMI = nullptr;
>>>
>>>    if (MI.getOpcode() == TargetOpcode::STACKMAP ||
>>> @@ -538,10 +563,9 @@ MachineInstr *TargetInstrInfo::foldMemor
>>>      assert((!(Flags & MachineMemOperand::MOLoad) ||
>>>              NewMI->mayLoad()) &&
>>>             "Folded a use to a non-load!");
>>> -    const MachineFrameInfo &MFI = MF.getFrameInfo();
>>>      assert(MFI.getObjectOffset(FI) != -1);
>>>      MachineMemOperand *MMO = MF.getMachineMemOperand(
>>> -        MachinePointerInfo::getFixedStack(MF, FI), Flags, MFI.getObjectSize(FI),
>>> +        MachinePointerInfo::getFixedStack(MF, FI), Flags, MemSize,
>>>          MFI.getObjectAlignment(FI));
>>>      NewMI->addMemOperand(MF, MMO);
>>> @@ -558,7 +582,6 @@ MachineInstr *TargetInstrInfo::foldMemor
>>>    const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
>>>    MachineBasicBlock::iterator Pos = MI;
>>> -  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
>>>
>>>    if (Flags == MachineMemOperand::MOStore)
>>>      storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI);
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86InstrInfo.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86InstrInfo.cpp Wed Nov 23 12:33:49 2016
>>> @@ -6843,6 +6843,14 @@ X86InstrInfo::foldMemoryOperandImpl(Mach
>>>    if (!MF.getFunction()->optForSize() && hasPartialRegUpdate(MI.getOpcode()))
>>>      return nullptr;
>>>
>>> +  // Don't fold subreg spills, or reloads that use a high subreg.
>>> +  for (auto Op : Ops) {
>>> +    MachineOperand &MO = MI.getOperand(Op);
>>> +    auto SubReg = MO.getSubReg();
>>> +    if (SubReg && (MO.isDef() || SubReg == X86::sub_8bit_hi))
>>> +      return nullptr;
>>> +  }
>>> +
>>>    const MachineFrameInfo &MFI = MF.getFrameInfo();
>>>    unsigned Size = MFI.getObjectSize(FrameIndex);
>>>    unsigned Alignment = MFI.getObjectAlignment(FrameIndex);
>>> @@ -6967,6 +6975,14 @@ MachineInstr *X86InstrInfo::foldMemoryOp
>>>      MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
>>>      MachineBasicBlock::iterator InsertPt, MachineInstr &LoadMI,
>>>      LiveIntervals *LIS) const {
>>> +
>>> +  // TODO: Support the case where LoadMI loads a wide register, but MI
>>> +  // only uses a subreg.
>>> +  for (auto Op : Ops) {
>>> +    if (MI.getOperand(Op).getSubReg())
>>> +      return nullptr;
>>> +  }
>>> +
>>>    // If loading from a FrameIndex, fold directly from the FrameIndex.
>>>    unsigned NumOps = LoadMI.getDesc().getNumOperands();
>>>    int FrameIndex;
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.h?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86InstrInfo.h (original)
>>> +++ llvm/trunk/lib/Target/X86/X86InstrInfo.h Wed Nov 23 12:33:49 2016
>>> @@ -378,6 +378,10 @@ public:
>>>    bool expandPostRAPseudo(MachineInstr &MI) const override;
>>>
>>> +  /// Check whether the target can fold a load that feeds a subreg operand
>>> +  /// (or a subreg operand that feeds a store).
>>> +  bool isSubregFoldable() const override { return true; }
>>> +
>>>    /// foldMemoryOperand - If this target supports it, fold a load or store of
>>>    /// the specified stack slot into the specified machine instruction for the
>>>    /// specified operand(s).  If this is possible, the target should perform the
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/partial-fold32.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold32.ll?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/partial-fold32.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/partial-fold32.ll Wed Nov 23 12:33:49 2016
>>> @@ -3,8 +3,7 @@
>>>  define fastcc i8 @fold32to8(i32 %add, i8 %spill) {
>>>  ; CHECK-LABEL: fold32to8:
>>>  ; CHECK: movl %ecx, (%esp) # 4-byte Spill
>>> -; CHECK: movl (%esp), %eax # 4-byte Reload
>>> -; CHECK: subb %al, %dl
>>> +; CHECK: subb (%esp), %dl # 1-byte Folded Reload
>>>  entry:
>>>    tail call void asm sideeffect "", "~{eax},~{ebx},~{ecx},~{edi},~{esi},~{ebp},~{dirflag},~{fpsr},~{flags}"()
>>>    %trunc = trunc i32 %add to i8
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/partial-fold64.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/partial-fold64.ll?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/partial-fold64.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/partial-fold64.ll Wed Nov 23 12:33:49 2016
>>> @@ -3,8 +3,7 @@
>>>  define i32 @fold64to32(i64 %add, i32 %spill) {
>>>  ; CHECK-LABEL: fold64to32:
>>>  ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
>>> -; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
>>> -; CHECK: subl %eax, %esi
>>> +; CHECK: subl -{{[0-9]+}}(%rsp), %esi # 4-byte Folded Reload
>>>  entry:
>>>    tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
>>>    %trunc = trunc i64 %add to i32
>>> @@ -15,8 +14,7 @@ entry:
>>>  define i8 @fold64to8(i64 %add, i8 %spill) {
>>>  ; CHECK-LABEL: fold64to8:
>>>  ; CHECK: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
>>> -; CHECK: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
>>> -; CHECK: subb %al, %sil
>>> +; CHECK: subb -{{[0-9]+}}(%rsp), %sil # 1-byte Folded Reload
>>>  entry:
>>>    tail call void asm sideeffect "", "~{rax},~{rbx},~{rcx},~{rdx},~{rdi},~{rbp},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"()
>>>    %trunc = trunc i64 %add to i8
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll?rev=287792&r1=287791&r2=287792&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll Wed Nov 23 12:33:49 2016
>>> @@ -4788,9 +4788,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x
>>>  ; AVX1-NEXT:    orl %ebx, %r14d
>>>  ; AVX1-NEXT:    shlq $32, %r14
>>>  ; AVX1-NEXT:    orq %r15, %r14
>>> -; AVX1-NEXT:    vmovupd (%rsp), %ymm0 # 32-byte Reload
>>> -; AVX1-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>>> -; AVX1-NEXT:    vzeroupper
>>> +; AVX1-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
>>> +; AVX1-NEXT:    # xmm0 = mem[1,0]
>>>  ; AVX1-NEXT:    callq __truncdfhf2
>>>  ; AVX1-NEXT:    movw %ax, %bx
>>>  ; AVX1-NEXT:    shll $16, %ebx
>>> @@ -4856,9 +4855,8 @@ define <8 x i16> @cvt_8f64_to_8i16(<8 x
>>>  ; AVX2-NEXT:    orl %ebx, %r14d
>>>  ; AVX2-NEXT:    shlq $32, %r14
>>>  ; AVX2-NEXT:    orq %r15, %r14
>>> -; AVX2-NEXT:    vmovupd (%rsp), %ymm0 # 32-byte Reload
>>> -; AVX2-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>>> -; AVX2-NEXT:    vzeroupper
>>> +; AVX2-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
>>> +; AVX2-NEXT:    # xmm0 = mem[1,0]
>>>  ; AVX2-NEXT:    callq __truncdfhf2
>>>  ; AVX2-NEXT:    movw %ax, %bx
>>>  ; AVX2-NEXT:    shll $16, %ebx
>>> @@ -5585,9 +5583,8 @@ define void @store_cvt_8f64_to_8i16(<8 x
>>>  ; AVX1-NEXT:    vzeroupper
>>>  ; AVX1-NEXT:    callq __truncdfhf2
>>>  ; AVX1-NEXT:    movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
>>> -; AVX1-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>>> -; AVX1-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>>> -; AVX1-NEXT:    vzeroupper
>>> +; AVX1-NEXT:    vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload
>>> +; AVX1-NEXT:    # xmm0 = mem[1,0]
>>>  ; AVX1-NEXT:    callq __truncdfhf2
>>>  ; AVX1-NEXT:    movl %eax, %r12d
>>>  ; AVX1-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>>> @@ -5654,9 +5651,8 @@ define void @store_cvt_8f64_to_8i16(<8 x
>>>  ; AVX2-NEXT:    vzeroupper
>>>  ; AVX2-NEXT:    callq __truncdfhf2
>>>  ; AVX2-NEXT:    movw %ax, {{[0-9]+}}(%rsp) # 2-byte Spill
>>> -; AVX2-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>>> -; AVX2-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
>>> -; AVX2-NEXT:    vzeroupper
>>> +; AVX2-NEXT:    vpermilpd $1, {{[0-9]+}}(%rsp), %xmm0 # 16-byte Folded Reload
>>> +; AVX2-NEXT:    # xmm0 = mem[1,0]
>>>  ; AVX2-NEXT:    callq __truncdfhf2
>>>  ; AVX2-NEXT:    movl %eax, %r12d
>>>  ; AVX2-NEXT:    vmovupd {{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits@lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits