<div dir="ltr"><div>Yeah, it looks like we're choosing a register that is not available on i686 (32-bit mode only has xmm0-xmm7).</div><div><br></div>Marina, I guess r278321 is not selecting a register from the correct register class?</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 16, 2016 at 5:26 PM, Andrew Adams via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Is it possible that this commit caused: <a href="https://llvm.org/bugs/show_bug.cgi?id=29010" target="_blank">https://llvm.org/bugs/show_<wbr>bug.cgi?id=29010</a> ?<br><div><br></div><div>It could also be one of the nearby AVX-512 commits, though that seems less likely.</div><div><br></div><div>- Andrew</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Aug 13, 2016 at 10:22 AM, Craig Topper via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I think there may be a bug in this. See comment below.<br><div class="gmail_extra"><br clear="all"><div><div data-smartmail="gmail_signature">~Craig</div></div>
<br><div class="gmail_quote"><div><div class="m_-7106383665775259293h5">On Thu, Aug 11, 2016 at 12:32 AM, Marina Yatsina via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: myatsina<br>
Date: Thu Aug 11 02:32:08 2016<br>
New Revision: 278321<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=278321&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject?rev=278321&view=rev</a><br>
Log:<br>
Avoid false dependencies of undef machine operands<br>
<br>
This patch helps avoid false dependencies on undef registers by updating the machine instruction's undef operand to use a register on which the instruction truly depends, or a register with clearance higher than Pref.<br>
<br>
Pseudo example:<br>
<br>
loop:<br>
xmm0 = ...<br>
xmm1 = vcvtsi2sdl eax, xmm0<undef><br>
... = inst xmm0<br>
jmp loop<br>
<br>
In this example, selecting xmm0 as the undef register creates a false dependency between loop iterations.<br>
This false dependency cannot be broken by inserting an xor before the vcvtsi2sdl, because xmm0 is alive at the point of the vcvtsi2sdl instruction.<br>
Selecting a different register instead of xmm0, especially a register that is not used in the loop, eliminates this problem.<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D22466" rel="noreferrer" target="_blank">https://reviews.llvm.org/D2246<wbr>6</a><br>
<br>
<br>
Modified:<br>
llvm/trunk/lib/CodeGen/Executi<wbr>onDepsFix.cpp<br>
llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp<br>
llvm/trunk/test/CodeGen/X86/av<wbr>x512-cvt.ll<br>
llvm/trunk/test/CodeGen/X86/br<wbr>eak-false-dep.ll<br>
llvm/trunk/test/CodeGen/X86/co<wbr>py-propagation.ll<br>
llvm/trunk/test/CodeGen/X86/ha<wbr>lf.ll<br>
llvm/trunk/test/CodeGen/X86/ss<wbr>e-fsignum.ll<br>
llvm/trunk/test/CodeGen/X86/ve<wbr>c_int_to_fp.ll<br>
<br>
Modified: llvm/trunk/lib/CodeGen/Executi<wbr>onDepsFix.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/ExecutionDepsFix.cpp?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/CodeGen/E<wbr>xecutionDepsFix.cpp?rev=278321<wbr>&r1=278320&r2=278321&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/CodeGen/Executi<wbr>onDepsFix.cpp (original)<br>
+++ llvm/trunk/lib/CodeGen/Executi<wbr>onDepsFix.cpp Thu Aug 11 02:32:08 2016<br>
@@ -203,6 +203,8 @@ private:<br>
void processDefs(MachineInstr*, bool Kill);<br>
void visitSoftInstr(MachineInstr*, unsigned mask);<br>
void visitHardInstr(MachineInstr*, unsigned domain);<br>
+ void pickBestRegisterForUndef(Machi<wbr>neInstr *MI, unsigned OpIdx,<br>
+ unsigned Pref);<br>
bool shouldBreakDependence(MachineI<wbr>nstr*, unsigned OpIdx, unsigned Pref);<br>
void processUndefReads(MachineBasic<wbr>Block*);<br>
};<br>
@@ -473,6 +475,56 @@ void ExeDepsFix::visitInstr(Machine<wbr>Instr<br>
processDefs(MI, !DomP.first);<br>
}<br>
<br>
+/// \brief Helps avoid false dependencies on undef registers by updating the<br>
+/// machine instructions' undef operand to use a register that the instruction<br>
+/// is truly dependent on, or use a register with clearance higher than Pref.<br>
+void ExeDepsFix::pickBestRegisterFo<wbr>rUndef(MachineInstr *MI, unsigned OpIdx,<br>
+ unsigned Pref) {<br>
+ MachineOperand &MO = MI->getOperand(OpIdx);<br>
+ assert(MO.isUndef() && "Expected undef machine operand");<br>
+<br>
+ unsigned OriginalReg = MO.getReg();<br>
+<br>
+ // Update only undef operands that are mapped to one register.<br>
+ if (AliasMap[OriginalReg].size() != 1)<br>
+ return;<br>
+<br>
+ // Get the undef operand's register class<br>
+ const TargetRegisterClass *OpRC =<br>
+ TII->getRegClass(MI->getDesc()<wbr>, OpIdx, TRI, *MF);<br>
+<br>
+ // If the instruction has a true dependency, we can hide the false dependency<br>
+ // behind it.<br>
+ for (MachineOperand &CurrMO : MI->operands()) {<br>
+ if (!CurrMO.isReg() || CurrMO.isDef() || CurrMO.isUndef() ||<br>
+ !OpRC->contains(CurrMO.getReg(<wbr>)))<br>
+ continue;<br>
+ // We found a true dependency - replace the undef register with the true<br>
+ // dependency.<br>
+ MO.setReg(CurrMO.getReg());<br>
+ return;<br>
+ }<br>
+<br>
+ // Go over all registers in the register class and find the register with<br>
+ // max clearance or clearance higher than Pref.<br>
+ unsigned MaxClearance = 0;<br>
+ unsigned MaxClearanceReg = OriginalReg;<br>
+ for (unsigned rx = 0; rx < OpRC->getNumRegs(); ++rx) {<br>
+ unsigned Clearance = CurInstr - LiveRegs[rx].Def;<br></blockquote><div><br></div></div></div><div>There's no guarantee that the indices in OpRC are equivalent to the indices in the RC that was passed to this pass. So I'm not sure you can use them directly in indexing LiveRegs. I think you need to go through AliasMap.</div><div><div class="m_-7106383665775259293h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+ if (Clearance <= MaxClearance)<br>
+ continue;<br>
+ MaxClearance = Clearance;<br>
+ MaxClearanceReg = OpRC->getRegister(rx);<br>
+<br>
+ if (MaxClearance > Pref)<br>
+ break;<br>
+ }<br>
+<br>
+ // Update the operand if we found a register with better clearance.<br>
+ if (MaxClearanceReg != OriginalReg)<br>
+ MO.setReg(MaxClearanceReg);<br>
+}<br>
+<br>
/// \brief Return true if it makes sense to break dependence on a partial def<br>
/// or undef use.<br>
bool ExeDepsFix::shouldBreakDepende<wbr>nce(MachineInstr *MI, unsigned OpIdx,<br>
@@ -510,6 +562,7 @@ void ExeDepsFix::processDefs(Machin<wbr>eInst<br>
unsigned OpNum;<br>
unsigned Pref = TII->getUndefRegClearance(*MI, OpNum, TRI);<br>
if (Pref) {<br>
+ pickBestRegisterForUndef(MI, OpNum, Pref);<br>
if (shouldBreakDependence(MI, OpNum, Pref))<br>
UndefReads.push_back(std::mak<wbr>e_pair(MI, OpNum));<br>
}<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.cpp?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/X86InstrInfo.cpp?rev=278321&<wbr>r1=278320&r2=278321&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86I<wbr>nstrInfo.cpp Thu Aug 11 02:32:08 2016<br>
@@ -68,7 +68,7 @@ static cl::opt<unsigned><br>
UndefRegClearance("undef-reg-<wbr>clearance",<br>
cl::desc("How many idle instructions we would like before "<br>
"certain undef register reads"),<br>
- cl::init(64), cl::Hidden);<br>
+ cl::init(128), cl::Hidden);<br>
<br>
enum {<br>
// Select which memory operand is being unfolded.<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/av<wbr>x512-cvt.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-cvt.ll?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/avx512-cvt.ll?rev=278321&r<wbr>1=278320&r2=278321&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/av<wbr>x512-cvt.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/av<wbr>x512-cvt.ll Thu Aug 11 02:32:08 2016<br>
@@ -16,28 +16,27 @@ define <8 x double> @sltof864(<8 x i64><br>
; KNL: ## BB#0:<br>
; KNL-NEXT: vextracti32x4 $3, %zmm0, %xmm1<br>
; KNL-NEXT: vpextrq $1, %xmm1, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; KNL-NEXT: vmovq %xmm1, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; KNL-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; KNL-NEXT: vextracti32x4 $2, %zmm0, %xmm2<br>
; KNL-NEXT: vpextrq $1, %xmm2, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm3<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm3<br>
; KNL-NEXT: vmovq %xmm2, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm2<br>
; KNL-NEXT: vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0]<br>
; KNL-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1<br>
; KNL-NEXT: vextracti32x4 $1, %zmm0, %xmm2<br>
; KNL-NEXT: vpextrq $1, %xmm2, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm3<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm3<br>
; KNL-NEXT: vmovq %xmm2, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm2<br>
; KNL-NEXT: vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0]<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm3<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm3<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm0<br>
; KNL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm3[0]<br>
; KNL-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0<br>
; KNL-NEXT: vinsertf64x4 $1, %ymm1, %zmm0, %zmm0<br>
@@ -56,15 +55,14 @@ define <4 x double> @sltof464(<4 x i64><br>
; KNL: ## BB#0:<br>
; KNL-NEXT: vextracti128 $1, %ymm0, %xmm1<br>
; KNL-NEXT: vpextrq $1, %xmm1, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; KNL-NEXT: vmovq %xmm1, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; KNL-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; KNL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; KNL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; KNL-NEXT: retq<br>
@@ -81,12 +79,11 @@ define <2 x float> @sltof2f32(<2 x i64><br>
; KNL-LABEL: sltof2f32:<br>
; KNL: ## BB#0:<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]<br>
; KNL-NEXT: retq<br>
@@ -105,17 +102,16 @@ define <4 x float> @sltof4f32_mem(<4 x i<br>
; KNL: ## BB#0:<br>
; KNL-NEXT: vmovdqu (%rdi), %ymm0<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; KNL-NEXT: vextracti128 $1, %ymm0, %xmm0<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; KNL-NEXT: retq<br>
;<br>
@@ -186,17 +182,16 @@ define <4 x float> @sltof432(<4 x i64> %<br>
; KNL-LABEL: sltof432:<br>
; KNL: ## BB#0:<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; KNL-NEXT: vextracti128 $1, %ymm0, %xmm0<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; KNL-NEXT: retq<br>
;<br>
@@ -884,12 +879,11 @@ define <2 x float> @sitofp_2i1_float(<2<br>
; KNL-NEXT: movl $-1, %eax<br>
; KNL-NEXT: movl $0, %edx<br>
; KNL-NEXT: cmovnel %eax, %edx<br>
-; KNL-NEXT: vcvtsi2ssl %edx, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2ssl %edx, %xmm2, %xmm1<br>
; KNL-NEXT: vmovq %xmm0, %rdx<br>
; KNL-NEXT: testb $1, %dl<br>
; KNL-NEXT: cmovnel %eax, %ecx<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2ssl %ecx, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2ssl %ecx, %xmm2, %xmm0<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
; KNL-NEXT: retq<br>
;<br>
@@ -1091,11 +1085,10 @@ define <2 x float> @uitofp_2i1_float(<2<br>
; KNL-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0<br>
; KNL-NEXT: vpextrq $1, %xmm0, %rax<br>
; KNL-NEXT: andl $1, %eax<br>
-; KNL-NEXT: vcvtsi2ssl %eax, %xmm0, %xmm1<br>
+; KNL-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm1<br>
; KNL-NEXT: vmovq %xmm0, %rax<br>
; KNL-NEXT: andl $1, %eax<br>
-; KNL-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; KNL-NEXT: vcvtsi2ssl %eax, %xmm0, %xmm0<br>
+; KNL-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm0<br>
; KNL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
; KNL-NEXT: retq<br>
;<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/br<wbr>eak-false-dep.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/break-false-dep.ll?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/break-false-dep.ll?rev=278<wbr>321&r1=278320&r2=278321&view=<wbr>diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/br<wbr>eak-false-dep.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/br<wbr>eak-false-dep.ll Thu Aug 11 02:32:08 2016<br>
@@ -126,6 +126,7 @@ loop:<br>
%i = phi i64 [ 1, %entry ], [ %inc, %loop ]<br>
%s1 = phi i64 [ %vx, %entry ], [ %s2, %loop ]<br>
%fi = sitofp i64 %i to double<br>
+ tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm<wbr>3},~{xmm4},~{xmm5},~{xmm6},~{x<wbr>mm7},~{xmm8},~{xmm9},~{xmm10},<wbr>~{xmm11},~{xmm12},~{xmm13},~{x<wbr>mm14},~{xmm15},~{dirflag},~{fp<wbr>sr},~{flags}"()<br>
%vy = load double, double* %y<br>
%fipy = fadd double %fi, %vy<br>
%iipy = fptosi double %fipy to i64<br>
@@ -174,6 +175,7 @@ for.body3:<br>
store double %mul11, double* %arrayidx13, align 8<br>
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
%exitcond = icmp eq i64 %indvars.iv.next, 1024<br>
+ tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm<wbr>3},~{xmm4},~{xmm5},~{xmm6},~{x<wbr>mm7},~{xmm8},~{xmm9},~{xmm10},<wbr>~{xmm11},~{xmm12},~{xmm13},~{x<wbr>mm14},~{xmm15},~{dirflag},~{fp<wbr>sr},~{flags}"()<br>
br i1 %exitcond, label %for.inc14, label %for.body3<br>
<br>
for.inc14: ; preds = %for.body3<br>
@@ -193,7 +195,7 @@ for.end16:<br>
;SSE-NEXT: movsd [[XMM0]],<br>
;AVX-LABEL:@loopdep3<br>
;AVX: vxorps [[XMM0:%xmm[0-9]+]], [[XMM0]]<br>
-;AVX-NEXT: vcvtsi2sdl {{.*}}, [[XMM0]], [[XMM0]]<br>
+;AVX-NEXT: vcvtsi2sdl {{.*}}, [[XMM0]], {{%xmm[0-9]+}}<br>
;AVX-NEXT: vmulsd {{.*}}, [[XMM0]], [[XMM0]]<br>
;AVX-NEXT: vmulsd {{.*}}, [[XMM0]], [[XMM0]]<br>
;AVX-NEXT: vmulsd {{.*}}, [[XMM0]], [[XMM0]]<br>
@@ -202,10 +204,76 @@ for.end16:<br>
<br>
define double @inlineasmdep(i64 %arg) {<br>
top:<br>
- tail call void asm sideeffect "", "~{xmm0},~{dirflag},~{fpsr},~{<wbr>flags}"()<br>
+ tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm<wbr>3},~{dirflag},~{fpsr},~{flags}<wbr>"()<br>
+ tail call void asm sideeffect "", "~{xmm4},~{xmm5},~{xmm6},~{xmm<wbr>7},~{dirflag},~{fpsr},~{flags}<wbr>"()<br>
+ tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xm<wbr>m11},~{dirflag},~{fpsr},~{flag<wbr>s}"()<br>
+ tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{<wbr>xmm15},~{dirflag},~{fpsr},~{fl<wbr>ags}"()<br>
%tmp1 = sitofp i64 %arg to double<br>
ret double %tmp1<br>
;AVX-LABEL:@inlineasmdep<br>
;AVX: vxorps [[XMM0:%xmm[0-9]+]], [[XMM0]], [[XMM0]]<br>
;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM0]], {{%xmm[0-9]+}}<br>
}<br>
+<br>
+; Make sure we are making a smart choice regarding undef registers and<br>
+; hiding the false dependency behind a true dependency<br>
+define double @truedeps(float %arg) {<br>
+top:<br>
+ tail call void asm sideeffect "", "~{xmm6},~{dirflag},~{fpsr},~{<wbr>flags}"()<br>
+ tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm<wbr>3},~{dirflag},~{fpsr},~{flags}<wbr>"()<br>
+ tail call void asm sideeffect "", "~{xmm4},~{xmm5},~{xmm7},~{dir<wbr>flag},~{fpsr},~{flags}"()<br>
+ tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xm<wbr>m11},~{dirflag},~{fpsr},~{flag<wbr>s}"()<br>
+ tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{<wbr>xmm15},~{dirflag},~{fpsr},~{fl<wbr>ags}"()<br>
+ %tmp1 = fpext float %arg to double<br>
+ ret double %tmp1<br>
+;AVX-LABEL:@truedeps<br>
+;AVX-NOT: vxorps<br>
+;AVX: vcvtss2sd [[XMM0:%xmm[0-9]+]], [[XMM0]], {{%xmm[0-9]+}}<br>
+}<br>
+<br>
+; Make sure we are making a smart choice regarding undef registers and<br>
+; choosing the register with the highest clearance<br>
+define double @clearence(i64 %arg) {<br>
+top:<br>
+ tail call void asm sideeffect "", "~{xmm6},~{dirflag},~{fpsr},~{<wbr>flags}"()<br>
+ tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm<wbr>3},~{dirflag},~{fpsr},~{flags}<wbr>"()<br>
+ tail call void asm sideeffect "", "~{xmm4},~{xmm5},~{xmm7},~{dir<wbr>flag},~{fpsr},~{flags}"()<br>
+ tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xm<wbr>m11},~{dirflag},~{fpsr},~{flag<wbr>s}"()<br>
+ tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{<wbr>xmm15},~{dirflag},~{fpsr},~{fl<wbr>ags}"()<br>
+ %tmp1 = sitofp i64 %arg to double<br>
+ ret double %tmp1<br>
+;AVX-LABEL:@clearence<br>
+;AVX: vxorps [[XMM6:%xmm6]], [[XMM6]], [[XMM6]]<br>
+;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM6]], {{%xmm[0-9]+}}<br>
+}<br>
+<br>
+; Make sure we are making a smart choice regarding undef registers in order to<br>
+; avoid a cyclic dependence on a write to the same register in a previous<br>
+; iteration, especially when we cannot zero out the undef register because it<br>
+; is alive.<br>
+define i64 @loopclearence(i64* nocapture %x, double* nocapture %y) nounwind {<br>
+entry:<br>
+ %vx = load i64, i64* %x<br>
+ br label %loop<br>
+loop:<br>
+ %i = phi i64 [ 1, %entry ], [ %inc, %loop ]<br>
+ %s1 = phi i64 [ %vx, %entry ], [ %s2, %loop ]<br>
+ %fi = sitofp i64 %i to double<br>
+ tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm<wbr>3},~{dirflag},~{fpsr},~{flags}<wbr>"()<br>
+ tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xm<wbr>m11},~{dirflag},~{fpsr},~{flag<wbr>s}"()<br>
+ tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{<wbr>xmm15},~{dirflag},~{fpsr},~{fl<wbr>ags}"()<br>
+ %vy = load double, double* %y<br>
+ %fipy = fadd double %fi, %vy<br>
+ %iipy = fptosi double %fipy to i64<br>
+ %s2 = add i64 %s1, %iipy<br>
+ %inc = add nsw i64 %i, 1<br>
+ %exitcond = icmp eq i64 %inc, 156250000<br>
+ br i1 %exitcond, label %ret, label %loop<br>
+ret:<br>
+ ret i64 %s2<br>
+;AVX-LABEL:@loopclearence<br>
+;Registers 4-7 are not used and therefore one of them should be chosen<br>
+;AVX-NOT: {{%xmm[4-7]}}<br>
+;AVX: vcvtsi2sdq {{.*}}, [[XMM4_7:%xmm[4-7]]], {{%xmm[0-9]+}}<br>
+;AVX-NOT: [[XMM4_7]]<br>
+}<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/co<wbr>py-propagation.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/copy-propagation.ll?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/copy-propagation.ll?rev=27<wbr>8321&r1=278320&r2=278321&view=<wbr>diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/co<wbr>py-propagation.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/co<wbr>py-propagation.ll Thu Aug 11 02:32:08 2016<br>
@@ -26,7 +26,7 @@ target triple = "x86_64-pc-win32-elf"<br>
; Copy the result in a temporary.<br>
; Note: Technically the regalloc could have been smarter and this move not required,<br>
; which would have hidden the bug.<br>
-; CHECK-NEXT: vmovapd %xmm0, [[TMP:%xmm[0-9]+]]<br>
+; CHECK: vmovapd %xmm0, [[TMP:%xmm[0-9]+]]<br>
; Crush xmm0.<br>
; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
; CHECK: movl $339772768, %e[[INDIRECT_CALL2:[a-z]+]]<br>
@@ -37,6 +37,7 @@ target triple = "x86_64-pc-win32-elf"<br>
define double @foo(i64 %arg) {<br>
top:<br>
%tmp = call double inttoptr (i64 339752784 to double (double, double)*)(double 1.000000e+00, double 0.000000e+00)<br>
+ tail call void asm sideeffect "", "x,~{xmm1},~{xmm2},~{xmm3},~{x<wbr>mm4},~{xmm5},~{xmm6},~{xmm7},~<wbr>{xmm8},~{xmm9},~{xmm10},~{xmm1<wbr>1},~{xmm12},~{xmm13},~{xmm14},<wbr>~{xmm15},~{dirflag},~{fpsr},~{<wbr>flags}"(double %tmp)<br>
%tmp1 = sitofp i64 %arg to double<br>
call void inttoptr (i64 339772768 to void (double, double)*)(double %tmp, double %tmp1)<br>
%tmp3 = fadd double %tmp1, %tmp<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/ha<wbr>lf.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/half.ll?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/half.ll?rev=278321&r1=2783<wbr>20&r2=278321&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/ha<wbr>lf.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/ha<wbr>lf.ll Thu Aug 11 02:32:08 2016<br>
@@ -299,7 +299,7 @@ define half @test_f80trunc_nodagcombine(<br>
; CHECK-F16C-NEXT: movswl (%rsi), %eax<br>
; CHECK-F16C-NEXT: vmovd %eax, %xmm0<br>
; CHECK-F16C-NEXT: vcvtph2ps %xmm0, %xmm0<br>
-; CHECK-F16C-NEXT: vcvtsi2ssl %edi, %xmm0, %xmm1<br>
+; CHECK-F16C-NEXT: vcvtsi2ssl %edi, %xmm1, %xmm1<br>
; CHECK-F16C-NEXT: vcvtps2ph $4, %xmm1, %xmm1<br>
; CHECK-F16C-NEXT: vcvtph2ps %xmm1, %xmm1<br>
; CHECK-F16C-NEXT: vaddss %xmm1, %xmm0, %xmm0<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/ss<wbr>e-fsignum.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse-fsignum.ll?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/sse-fsignum.ll?rev=278321&<wbr>r1=278320&r2=278321&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/ss<wbr>e-fsignum.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/ss<wbr>e-fsignum.ll Thu Aug 11 02:32:08 2016<br>
@@ -39,16 +39,15 @@ define void @signum64a(<2 x double>*) {<br>
; AVX1-NEXT: vxorpd %xmm1, %xmm1, %xmm1<br>
; AVX1-NEXT: vcmpltpd %xmm1, %xmm0, %xmm2<br>
; AVX1-NEXT: vpextrq $1, %xmm2, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm3<br>
; AVX1-NEXT: vmovq %xmm2, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm2<br>
; AVX1-NEXT: vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0]<br>
; AVX1-NEXT: vcmpltpd %xmm0, %xmm1, %xmm0<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm1<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm0<br>
; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]<br>
; AVX1-NEXT: vsubpd %xmm0, %xmm2, %xmm0<br>
; AVX1-NEXT: vmovapd %xmm0, (%rdi)<br>
@@ -60,16 +59,15 @@ define void @signum64a(<2 x double>*) {<br>
; AVX2-NEXT: vxorpd %xmm1, %xmm1, %xmm1<br>
; AVX2-NEXT: vcmpltpd %xmm1, %xmm0, %xmm2<br>
; AVX2-NEXT: vpextrq $1, %xmm2, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm3<br>
; AVX2-NEXT: vmovq %xmm2, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm2<br>
; AVX2-NEXT: vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0]<br>
; AVX2-NEXT: vcmpltpd %xmm0, %xmm1, %xmm0<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm1<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm4, %xmm0<br>
; AVX2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]<br>
; AVX2-NEXT: vsubpd %xmm0, %xmm2, %xmm0<br>
; AVX2-NEXT: vmovapd %xmm0, (%rdi)<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/ve<wbr>c_int_to_fp.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll?rev=278321&r1=278320&r2=278321&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/vec_int_to_fp.ll?rev=27832<wbr>1&r1=278320&r2=278321&view=<wbr>diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/ve<wbr>c_int_to_fp.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/ve<wbr>c_int_to_fp.ll Thu Aug 11 02:32:08 2016<br>
@@ -28,10 +28,9 @@ define <2 x double> @sitofp_2i64_to_2f64<br>
; AVX-LABEL: sitofp_2i64_to_2f64:<br>
; AVX: # BB#0:<br>
; AVX-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm1<br>
; AVX-NEXT: vmovq %xmm0, %rax<br>
-; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm0<br>
; AVX-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]<br>
; AVX-NEXT: retq<br>
%cvt = sitofp <2 x i64> %a to <2 x double><br>
@@ -209,15 +208,14 @@ define <4 x double> @sitofp_4i64_to_4f64<br>
; AVX1: # BB#0:<br>
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1<br>
; AVX1-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: vmovq %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; AVX1-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; AVX1-NEXT: retq<br>
@@ -226,15 +224,14 @@ define <4 x double> @sitofp_4i64_to_4f64<br>
; AVX2: # BB#0:<br>
; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1<br>
; AVX2-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: vmovq %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; AVX2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; AVX2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; AVX2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; AVX2-NEXT: retq<br>
@@ -243,15 +240,14 @@ define <4 x double> @sitofp_4i64_to_4f64<br>
; AVX512: # BB#0:<br>
; AVX512-NEXT: vextracti32x4 $1, %ymm0, %xmm1<br>
; AVX512-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; AVX512-NEXT: vmovq %xmm1, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; AVX512-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; AVX512-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; AVX512-NEXT: vinsertf32x4 $1, %xmm1, %ymm0, %ymm0<br>
; AVX512-NEXT: retq<br>
@@ -941,12 +937,11 @@ define <4 x float> @sitofp_2i64_to_4f32(<br>
; AVX-LABEL: sitofp_2i64_to_4f32:<br>
; AVX: # BB#0:<br>
; AVX-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX-NEXT: vmovq %xmm0, %rax<br>
-; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0<br>
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
-; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1<br>
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]<br>
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]<br>
; AVX-NEXT: retq<br>
@@ -974,12 +969,11 @@ define <4 x float> @sitofp_4i64_to_4f32_<br>
; AVX-LABEL: sitofp_4i64_to_4f32_undef:<br>
; AVX: # BB#0:<br>
; AVX-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX-NEXT: vmovq %xmm0, %rax<br>
-; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0<br>
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
-; AVX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1<br>
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]<br>
; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]<br>
; AVX-NEXT: retq<br>
@@ -1140,17 +1134,16 @@ define <4 x float> @sitofp_4i64_to_4f32(<br>
; AVX1-LABEL: sitofp_4i64_to_4f32:<br>
; AVX1: # BB#0:<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vzeroupper<br>
; AVX1-NEXT: retq<br>
@@ -1158,17 +1151,16 @@ define <4 x float> @sitofp_4i64_to_4f32(<br>
; AVX2-LABEL: sitofp_4i64_to_4f32:<br>
; AVX2: # BB#0:<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm0<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vzeroupper<br>
; AVX2-NEXT: retq<br>
@@ -1176,17 +1168,16 @@ define <4 x float> @sitofp_4i64_to_4f32(<br>
; AVX512-LABEL: sitofp_4i64_to_4f32:<br>
; AVX512: # BB#0:<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; AVX512-NEXT: vextracti32x4 $1, %ymm0, %xmm0<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX512-NEXT: retq<br>
%cvt = sitofp <4 x i64> %a to <4 x float><br>
@@ -1377,12 +1368,12 @@ define <4 x float> @uitofp_2i64_to_4f32(<br>
; VEX-NEXT: testq %rax, %rax<br>
; VEX-NEXT: js .LBB38_1<br>
; VEX-NEXT: # BB#2:<br>
-; VEX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; VEX-NEXT: jmp .LBB38_3<br>
; VEX-NEXT: .LBB38_1:<br>
; VEX-NEXT: shrq %rax<br>
; VEX-NEXT: orq %rax, %rcx<br>
-; VEX-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; VEX-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; VEX-NEXT: .LBB38_3:<br>
; VEX-NEXT: vmovq %xmm0, %rax<br>
@@ -1391,14 +1382,12 @@ define <4 x float> @uitofp_2i64_to_4f32(<br>
; VEX-NEXT: testq %rax, %rax<br>
; VEX-NEXT: js .LBB38_4<br>
; VEX-NEXT: # BB#5:<br>
-; VEX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; VEX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0<br>
; VEX-NEXT: jmp .LBB38_6<br>
; VEX-NEXT: .LBB38_4:<br>
; VEX-NEXT: shrq %rax<br>
; VEX-NEXT: orq %rax, %rcx<br>
-; VEX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; VEX-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; VEX-NEXT: vcvtsi2ssq %rcx, %xmm2, %xmm0<br>
; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0<br>
; VEX-NEXT: .LBB38_6:<br>
; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
@@ -1406,7 +1395,7 @@ define <4 x float> @uitofp_2i64_to_4f32(<br>
; VEX-NEXT: testq %rax, %rax<br>
; VEX-NEXT: js .LBB38_8<br>
; VEX-NEXT: # BB#7:<br>
-; VEX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1<br>
; VEX-NEXT: .LBB38_8:<br>
; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]<br>
; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]<br>
@@ -1485,12 +1474,12 @@ define <4 x float> @uitofp_4i64_to_4f32_<br>
; VEX-NEXT: testq %rax, %rax<br>
; VEX-NEXT: js .LBB39_1<br>
; VEX-NEXT: # BB#2:<br>
-; VEX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; VEX-NEXT: jmp .LBB39_3<br>
; VEX-NEXT: .LBB39_1:<br>
; VEX-NEXT: shrq %rax<br>
; VEX-NEXT: orq %rax, %rcx<br>
-; VEX-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; VEX-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; VEX-NEXT: .LBB39_3:<br>
; VEX-NEXT: vmovq %xmm0, %rax<br>
@@ -1499,14 +1488,12 @@ define <4 x float> @uitofp_4i64_to_4f32_<br>
; VEX-NEXT: testq %rax, %rax<br>
; VEX-NEXT: js .LBB39_4<br>
; VEX-NEXT: # BB#5:<br>
-; VEX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; VEX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0<br>
; VEX-NEXT: jmp .LBB39_6<br>
; VEX-NEXT: .LBB39_4:<br>
; VEX-NEXT: shrq %rax<br>
; VEX-NEXT: orq %rax, %rcx<br>
-; VEX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; VEX-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; VEX-NEXT: vcvtsi2ssq %rcx, %xmm2, %xmm0<br>
; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0<br>
; VEX-NEXT: .LBB39_6:<br>
; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
@@ -1514,7 +1501,7 @@ define <4 x float> @uitofp_4i64_to_4f32_<br>
; VEX-NEXT: testq %rax, %rax<br>
; VEX-NEXT: js .LBB39_8<br>
; VEX-NEXT: # BB#7:<br>
-; VEX-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1<br>
; VEX-NEXT: .LBB39_8:<br>
; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]<br>
; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]<br>
@@ -1782,12 +1769,12 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB45_1<br>
; AVX1-NEXT: # BB#2:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX1-NEXT: jmp .LBB45_3<br>
; AVX1-NEXT: .LBB45_1:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; AVX1-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; AVX1-NEXT: .LBB45_3:<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
@@ -1796,12 +1783,12 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB45_4<br>
; AVX1-NEXT: # BB#5:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: jmp .LBB45_6<br>
; AVX1-NEXT: .LBB45_4:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm2, %xmm2<br>
; AVX1-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX1-NEXT: .LBB45_6:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
@@ -1812,12 +1799,12 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB45_7<br>
; AVX1-NEXT: # BB#8:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX1-NEXT: jmp .LBB45_9<br>
; AVX1-NEXT: .LBB45_7:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm2<br>
; AVX1-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX1-NEXT: .LBB45_9:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
@@ -1827,16 +1814,14 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB45_10<br>
; AVX1-NEXT: # BB#11:<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vzeroupper<br>
; AVX1-NEXT: retq<br>
; AVX1-NEXT: .LBB45_10:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm0<br>
; AVX1-NEXT: vaddss %xmm0, %xmm0, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vzeroupper<br>
@@ -1850,12 +1835,12 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB45_1<br>
; AVX2-NEXT: # BB#2:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX2-NEXT: jmp .LBB45_3<br>
; AVX2-NEXT: .LBB45_1:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; AVX2-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; AVX2-NEXT: .LBB45_3:<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
@@ -1864,12 +1849,12 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB45_4<br>
; AVX2-NEXT: # BB#5:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: jmp .LBB45_6<br>
; AVX2-NEXT: .LBB45_4:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm2, %xmm2<br>
; AVX2-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX2-NEXT: .LBB45_6:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
@@ -1880,12 +1865,12 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB45_7<br>
; AVX2-NEXT: # BB#8:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX2-NEXT: jmp .LBB45_9<br>
; AVX2-NEXT: .LBB45_7:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm2<br>
; AVX2-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX2-NEXT: .LBB45_9:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
@@ -1895,16 +1880,14 @@ define <4 x float> @uitofp_4i64_to_4f32(<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB45_10<br>
; AVX2-NEXT: # BB#11:<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vzeroupper<br>
; AVX2-NEXT: retq<br>
; AVX2-NEXT: .LBB45_10:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm0<br>
; AVX2-NEXT: vaddss %xmm0, %xmm0, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vzeroupper<br>
@@ -2118,10 +2101,9 @@ define <2 x double> @sitofp_load_2i64_to<br>
; VEX: # BB#0:<br>
; VEX-NEXT: vmovdqa (%rdi), %xmm0<br>
; VEX-NEXT: vpextrq $1, %xmm0, %rax<br>
-; VEX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; VEX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm1<br>
; VEX-NEXT: vmovq %xmm0, %rax<br>
-; VEX-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; VEX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; VEX-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm0<br>
; VEX-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]<br>
; VEX-NEXT: retq<br>
;<br>
@@ -2129,10 +2111,9 @@ define <2 x double> @sitofp_load_2i64_to<br>
; AVX512: # BB#0:<br>
; AVX512-NEXT: vmovdqa64 (%rdi), %xmm0<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm1<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm0<br>
; AVX512-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]<br>
; AVX512-NEXT: retq<br>
%ld = load <2 x i64>, <2 x i64> *%a<br>
@@ -2231,15 +2212,14 @@ define <4 x double> @sitofp_load_4i64_to<br>
; AVX1-NEXT: vmovaps (%rdi), %ymm0<br>
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1<br>
; AVX1-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: vmovq %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; AVX1-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; AVX1-NEXT: retq<br>
@@ -2249,15 +2229,14 @@ define <4 x double> @sitofp_load_4i64_to<br>
; AVX2-NEXT: vmovdqa (%rdi), %ymm0<br>
; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm1<br>
; AVX2-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: vmovq %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; AVX2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; AVX2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; AVX2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; AVX2-NEXT: retq<br>
@@ -2267,15 +2246,14 @@ define <4 x double> @sitofp_load_4i64_to<br>
; AVX512-NEXT: vmovdqa64 (%rdi), %ymm0<br>
; AVX512-NEXT: vextracti32x4 $1, %ymm0, %xmm1<br>
; AVX512-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm2, %xmm2<br>
; AVX512-NEXT: vmovq %xmm1, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm1<br>
; AVX512-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm2<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX512-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm0<br>
+; AVX512-NEXT: vcvtsi2sdq %rax, %xmm3, %xmm0<br>
; AVX512-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]<br>
; AVX512-NEXT: vinsertf32x4 $1, %xmm1, %ymm0, %ymm0<br>
; AVX512-NEXT: retq<br>
@@ -2756,17 +2734,16 @@ define <4 x float> @sitofp_load_4i64_to_<br>
; AVX1: # BB#0:<br>
; AVX1-NEXT: vmovdqa (%rdi), %ymm0<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vzeroupper<br>
; AVX1-NEXT: retq<br>
@@ -2775,17 +2752,16 @@ define <4 x float> @sitofp_load_4i64_to_<br>
; AVX2: # BB#0:<br>
; AVX2-NEXT: vmovdqa (%rdi), %ymm0<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm0<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vzeroupper<br>
; AVX2-NEXT: retq<br>
@@ -2794,17 +2770,16 @@ define <4 x float> @sitofp_load_4i64_to_<br>
; AVX512: # BB#0:<br>
; AVX512-NEXT: vmovdqa64 (%rdi), %ymm0<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
; AVX512-NEXT: vextracti32x4 $1, %ymm0, %xmm0<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX512-NEXT: retq<br>
%ld = load <4 x i64>, <4 x i64> *%a<br>
@@ -2912,29 +2887,28 @@ define <8 x float> @sitofp_load_8i64_to_<br>
; AVX1-NEXT: vmovdqa (%rdi), %ymm0<br>
; AVX1-NEXT: vmovdqa 32(%rdi), %ymm1<br>
; AVX1-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: vmovq %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm3<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]<br>
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1<br>
; AVX1-NEXT: vmovq %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]<br>
; AVX1-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm1<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0,1,2],xmm1[0]<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm2<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]<br>
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm2[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; AVX1-NEXT: retq<br>
@@ -2944,29 +2918,28 @@ define <8 x float> @sitofp_load_8i64_to_<br>
; AVX2-NEXT: vmovdqa (%rdi), %ymm0<br>
; AVX2-NEXT: vmovdqa 32(%rdi), %ymm1<br>
; AVX2-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: vmovq %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm3<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]<br>
; AVX2-NEXT: vextracti128 $1, %ymm1, %xmm1<br>
; AVX2-NEXT: vmovq %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]<br>
; AVX2-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm1<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0,1,2],xmm1[0]<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm2<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]<br>
; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm0<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm2[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0<br>
; AVX2-NEXT: retq<br>
@@ -2976,29 +2949,28 @@ define <8 x float> @sitofp_load_8i64_to_<br>
; AVX512-NEXT: vmovdqa64 (%rdi), %zmm0<br>
; AVX512-NEXT: vextracti32x4 $2, %zmm0, %xmm1<br>
; AVX512-NEXT: vpextrq $1, %xmm1, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX512-NEXT: vmovq %xmm1, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm1<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]<br>
; AVX512-NEXT: vextracti32x4 $3, %zmm0, %xmm2<br>
; AVX512-NEXT: vmovq %xmm2, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm3<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm3[0],xmm1[3]<br>
; AVX512-NEXT: vpextrq $1, %xmm2, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm2<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm2[0]<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm2<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm2 = xmm3[0],xmm2[0],xmm3[2,3]<br>
; AVX512-NEXT: vextracti32x4 $1, %zmm0, %xmm0<br>
; AVX512-NEXT: vmovq %xmm0, %rax<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm3<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],xmm3[0],xmm2[3]<br>
; AVX512-NEXT: vpextrq $1, %xmm0, %rax<br>
-; AVX512-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX512-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX512-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm0<br>
; AVX512-NEXT: vinsertps {{.*#+}} xmm0 = xmm2[0,1,2],xmm0[0]<br>
; AVX512-NEXT: vinsertf32x4 $1, %xmm1, %ymm0, %ymm0<br>
; AVX512-NEXT: retq<br>
@@ -3186,12 +3158,12 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB74_1<br>
; AVX1-NEXT: # BB#2:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX1-NEXT: jmp .LBB74_3<br>
; AVX1-NEXT: .LBB74_1:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; AVX1-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; AVX1-NEXT: .LBB74_3:<br>
; AVX1-NEXT: vmovq %xmm0, %rax<br>
@@ -3200,12 +3172,12 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB74_4<br>
; AVX1-NEXT: # BB#5:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX1-NEXT: jmp .LBB74_6<br>
; AVX1-NEXT: .LBB74_4:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm2, %xmm2<br>
; AVX1-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX1-NEXT: .LBB74_6:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
@@ -3216,12 +3188,12 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB74_7<br>
; AVX1-NEXT: # BB#8:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX1-NEXT: jmp .LBB74_9<br>
; AVX1-NEXT: .LBB74_7:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm2<br>
; AVX1-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX1-NEXT: .LBB74_9:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
@@ -3231,16 +3203,14 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB74_10<br>
; AVX1-NEXT: # BB#11:<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vzeroupper<br>
; AVX1-NEXT: retq<br>
; AVX1-NEXT: .LBB74_10:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm0<br>
; AVX1-NEXT: vaddss %xmm0, %xmm0, %xmm0<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX1-NEXT: vzeroupper<br>
@@ -3255,12 +3225,12 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB74_1<br>
; AVX2-NEXT: # BB#2:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX2-NEXT: jmp .LBB74_3<br>
; AVX2-NEXT: .LBB74_1:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; AVX2-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; AVX2-NEXT: .LBB74_3:<br>
; AVX2-NEXT: vmovq %xmm0, %rax<br>
@@ -3269,12 +3239,12 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB74_4<br>
; AVX2-NEXT: # BB#5:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm2<br>
; AVX2-NEXT: jmp .LBB74_6<br>
; AVX2-NEXT: .LBB74_4:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm2, %xmm2<br>
; AVX2-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX2-NEXT: .LBB74_6:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]<br>
@@ -3285,12 +3255,12 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB74_7<br>
; AVX2-NEXT: # BB#8:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm2<br>
; AVX2-NEXT: jmp .LBB74_9<br>
; AVX2-NEXT: .LBB74_7:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm2<br>
; AVX2-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX2-NEXT: .LBB74_9:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]<br>
@@ -3300,16 +3270,14 @@ define <4 x float> @uitofp_load_4i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB74_10<br>
; AVX2-NEXT: # BB#11:<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vzeroupper<br>
; AVX2-NEXT: retq<br>
; AVX2-NEXT: .LBB74_10:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm0<br>
; AVX2-NEXT: vaddss %xmm0, %xmm0, %xmm0<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]<br>
; AVX2-NEXT: vzeroupper<br>
@@ -3581,12 +3549,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_1<br>
; AVX1-NEXT: # BB#2:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX1-NEXT: jmp .LBB78_3<br>
; AVX1-NEXT: .LBB78_1:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; AVX1-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; AVX1-NEXT: .LBB78_3:<br>
; AVX1-NEXT: vmovq %xmm2, %rax<br>
@@ -3595,12 +3563,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_4<br>
; AVX1-NEXT: # BB#5:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm3<br>
; AVX1-NEXT: jmp .LBB78_6<br>
; AVX1-NEXT: .LBB78_4:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm3<br>
; AVX1-NEXT: vaddss %xmm3, %xmm3, %xmm3<br>
; AVX1-NEXT: .LBB78_6:<br>
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm2<br>
@@ -3610,12 +3578,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_7<br>
; AVX1-NEXT: # BB#8:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm4<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm4<br>
; AVX1-NEXT: jmp .LBB78_9<br>
; AVX1-NEXT: .LBB78_7:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm4<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm4, %xmm4<br>
; AVX1-NEXT: vaddss %xmm4, %xmm4, %xmm4<br>
; AVX1-NEXT: .LBB78_9:<br>
; AVX1-NEXT: vpextrq $1, %xmm2, %rax<br>
@@ -3624,12 +3592,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_10<br>
; AVX1-NEXT: # BB#11:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm5, %xmm2<br>
; AVX1-NEXT: jmp .LBB78_12<br>
; AVX1-NEXT: .LBB78_10:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm5, %xmm2<br>
; AVX1-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX1-NEXT: .LBB78_12:<br>
; AVX1-NEXT: vpextrq $1, %xmm0, %rax<br>
@@ -3638,12 +3606,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_13<br>
; AVX1-NEXT: # BB#14:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm5<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm5, %xmm5<br>
; AVX1-NEXT: jmp .LBB78_15<br>
; AVX1-NEXT: .LBB78_13:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm5<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm5, %xmm5<br>
; AVX1-NEXT: vaddss %xmm5, %xmm5, %xmm5<br>
; AVX1-NEXT: .LBB78_15:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[2,3]<br>
@@ -3653,12 +3621,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_16<br>
; AVX1-NEXT: # BB#17:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm6, %xmm3<br>
; AVX1-NEXT: jmp .LBB78_18<br>
; AVX1-NEXT: .LBB78_16:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm3<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm6, %xmm3<br>
; AVX1-NEXT: vaddss %xmm3, %xmm3, %xmm3<br>
; AVX1-NEXT: .LBB78_18:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm4[0],xmm1[3]<br>
@@ -3670,14 +3638,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_19<br>
; AVX1-NEXT: # BB#20:<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm5<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm6, %xmm5<br>
; AVX1-NEXT: jmp .LBB78_21<br>
; AVX1-NEXT: .LBB78_19:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm6, %xmm0<br>
; AVX1-NEXT: vaddss %xmm0, %xmm0, %xmm5<br>
; AVX1-NEXT: .LBB78_21:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm2[0]<br>
@@ -3688,12 +3654,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX1-NEXT: testq %rax, %rax<br>
; AVX1-NEXT: js .LBB78_22<br>
; AVX1-NEXT: # BB#23:<br>
-; AVX1-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rax, %xmm6, %xmm2<br>
; AVX1-NEXT: jmp .LBB78_24<br>
; AVX1-NEXT: .LBB78_22:<br>
; AVX1-NEXT: shrq %rax<br>
; AVX1-NEXT: orq %rax, %rcx<br>
-; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX1-NEXT: vcvtsi2ssq %rcx, %xmm6, %xmm2<br>
; AVX1-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX1-NEXT: .LBB78_24:<br>
; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm2[0]<br>
@@ -3710,12 +3676,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_1<br>
; AVX2-NEXT: # BB#2:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1<br>
; AVX2-NEXT: jmp .LBB78_3<br>
; AVX2-NEXT: .LBB78_1:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm1<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm1, %xmm1<br>
; AVX2-NEXT: vaddss %xmm1, %xmm1, %xmm1<br>
; AVX2-NEXT: .LBB78_3:<br>
; AVX2-NEXT: vmovq %xmm2, %rax<br>
@@ -3724,12 +3690,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_4<br>
; AVX2-NEXT: # BB#5:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm3, %xmm3<br>
; AVX2-NEXT: jmp .LBB78_6<br>
; AVX2-NEXT: .LBB78_4:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm3, %xmm3<br>
; AVX2-NEXT: vaddss %xmm3, %xmm3, %xmm3<br>
; AVX2-NEXT: .LBB78_6:<br>
; AVX2-NEXT: vextracti128 $1, %ymm2, %xmm2<br>
@@ -3739,12 +3705,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_7<br>
; AVX2-NEXT: # BB#8:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm4<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm4, %xmm4<br>
; AVX2-NEXT: jmp .LBB78_9<br>
; AVX2-NEXT: .LBB78_7:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm4<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm4, %xmm4<br>
; AVX2-NEXT: vaddss %xmm4, %xmm4, %xmm4<br>
; AVX2-NEXT: .LBB78_9:<br>
; AVX2-NEXT: vpextrq $1, %xmm2, %rax<br>
@@ -3753,12 +3719,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_10<br>
; AVX2-NEXT: # BB#11:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm5, %xmm2<br>
; AVX2-NEXT: jmp .LBB78_12<br>
; AVX2-NEXT: .LBB78_10:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm5, %xmm2<br>
; AVX2-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX2-NEXT: .LBB78_12:<br>
; AVX2-NEXT: vpextrq $1, %xmm0, %rax<br>
@@ -3767,12 +3733,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_13<br>
; AVX2-NEXT: # BB#14:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm5<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm5, %xmm5<br>
; AVX2-NEXT: jmp .LBB78_15<br>
; AVX2-NEXT: .LBB78_13:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm5<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm5, %xmm5<br>
; AVX2-NEXT: vaddss %xmm5, %xmm5, %xmm5<br>
; AVX2-NEXT: .LBB78_15:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm3[0],xmm1[0],xmm3[2,3]<br>
@@ -3782,12 +3748,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_16<br>
; AVX2-NEXT: # BB#17:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm6, %xmm3<br>
; AVX2-NEXT: jmp .LBB78_18<br>
; AVX2-NEXT: .LBB78_16:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm3<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm6, %xmm3<br>
; AVX2-NEXT: vaddss %xmm3, %xmm3, %xmm3<br>
; AVX2-NEXT: .LBB78_18:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm4[0],xmm1[3]<br>
@@ -3799,14 +3765,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_19<br>
; AVX2-NEXT: # BB#20:<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm5<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm6, %xmm5<br>
; AVX2-NEXT: jmp .LBB78_21<br>
; AVX2-NEXT: .LBB78_19:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm0<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm6, %xmm0<br>
; AVX2-NEXT: vaddss %xmm0, %xmm0, %xmm5<br>
; AVX2-NEXT: .LBB78_21:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm2[0]<br>
@@ -3817,12 +3781,12 @@ define <8 x float> @uitofp_load_8i64_to_<br>
; AVX2-NEXT: testq %rax, %rax<br>
; AVX2-NEXT: js .LBB78_22<br>
; AVX2-NEXT: # BB#23:<br>
-; AVX2-NEXT: vcvtsi2ssq %rax, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rax, %xmm6, %xmm2<br>
; AVX2-NEXT: jmp .LBB78_24<br>
; AVX2-NEXT: .LBB78_22:<br>
; AVX2-NEXT: shrq %rax<br>
; AVX2-NEXT: orq %rax, %rcx<br>
-; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm0, %xmm2<br>
+; AVX2-NEXT: vcvtsi2ssq %rcx, %xmm6, %xmm2<br>
; AVX2-NEXT: vaddss %xmm2, %xmm2, %xmm2<br>
; AVX2-NEXT: .LBB78_24:<br>
; AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm2[0]<br>
<br>
<br>
______________________________<wbr>_________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
</blockquote></div></div></div><br></div></div>
<br></blockquote></div><br></div>
</div></div>
<br></blockquote></div><br></div>