<div dir="ltr"><div><div><div><div>Hi Simon (and all),<br><br></div>I noticed that this patch changes the definition of intrinsic _mm_cvtsd2_ss in emmintrin.h. Is that intentional? My understanding is that your patch should have only addressed float-to-integer conversions.<br><br></div><div>Was this change to _mm_cvtsd_ss motivated by the fact that (V)CVTSD2SS depends on the rounding mode (control bits in the MXCSR register) for inexact conversions? That would explain why the LLVM part (r275981) also added a codegen test for that double-to-float conversion (I guess that none of the reviewer spotted that extra test).<br><br></div><div>The problem I am seeing is that your change is causing a crash in the backend (at -O1 and above). See the reproducible below:<br><br>/////<br></div><div>#include <x86intrin.h><br><br></div><div>__m128 test(__m128 a, const __m128d &b) {<br></div><div>  return _mm_cvtsd_ss(a, b);<br></div><div>}<br>/////<br><br></div><div>Alternatively, here is the LLVM IR:<br><br>define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly %b) {<br>entry:<br>  %0 = load <2 x double>, <2 x double>* %b, align 16<br>  %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x double> %0)<br>  ret <4 x float> %1<br>}<br><br>; Function Attrs: nounwind readnone<br>declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)<br><br>--------<br><br>With your patch, we now always generate a call to @llvm.x86.sse2.cvtsd2ss when expanding the builtin call from _mm_cvtsd_ss.<br><br>ISel would then select `Int_CVTSD2SSrm` instruction, which however is `CodeGenOnly`. Unfortunately that pseudo is never further expanded/replaced before compilation reaches the object emitter stage. So, before we execute Pass 'X86 Assembly/Object Emitter' we see machine code like this:<br><br>BB#0: derived from LLVM BB %entry<br>    Live Ins: %RDI %XMM0<br>        %XMM0<def> = Int_VCVTSD2SSrm %XMM0<kill>, %RDI<kill>, 1, %noreg, 0, %noreg; mem:LD16[%b]<br>        RETQ %XMM0<kill><br><br><br></div><div>.. which then causes  the following assertion failure:<br><br></div><div>[2016-07-25 13:25:11.830351700] 0x7bad5f87e0     Executing Pass 'X86 Assembly / Object Emitter' on Function 'test'...<br>Cannot encode all operands of: <MCInst 1074 <MCOperand Reg:126> <MCOperand Reg:126> <MCOperand Reg:39> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>><br><br><br></div>Overall, I agree that the compiler shouldn't make assumptions on the rounding mode when coverting from double-to-float. However, the RM variant of Int_VCVTSD2SS doesn't seem to be correctly handled/expanded by the backend.<br><br>Can we revert the change to the double-to-single convert? Alternatively,

 can we fix the codegen issue exposed by this change (and add extra test

 coverage)?<br>My opinion is that the change to the double-to-float conversion intrinsic should have been part of a separate patch.<br><br><br></div><div>Thanks,<br></div><div>Andrea<br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 22, 2016 at 2:18 PM, Hans Wennborg via cfe-commits <span dir="ltr"><<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Jul 21, 2016 at 6:34 PM, Robinson, Paul via cfe-commits<br>

<<a href="mailto:cfe-commits@lists.llvm.org">cfe-commits@lists.llvm.org</a>> wrote:<br>

><br>

><br>

>> -----Original Message-----<br>

>> From: cfe-commits [mailto:<a href="mailto:cfe-commits-bounces@lists.llvm.org">cfe-commits-bounces@lists.llvm.org</a>] On Behalf Of<br>

>> Simon Pilgrim via cfe-commits<br>

>> Sent: Wednesday, July 20, 2016 3:18 AM<br>

>> To: <a href="mailto:cfe-commits@lists.llvm.org">cfe-commits@lists.llvm.org</a><br>

>> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics<br>

>> instead of using generic IR<br>

>><br>

>> Author: rksimon<br>

>> Date: Wed Jul 20 05:18:01 2016<br>

>> New Revision: 276102<br>

>><br>

>> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=276102&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=276102&view=rev</a><br>

>> Log:<br>

>> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using<br>

>> generic IR<br>

>><br>

>> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ<br>

>> truncating conversions with generic IR instead.<br>

>><br>

>> It turns out that the behaviour of these intrinsics is different enough<br>

>> from generic IR that this will cause problems, INF/NAN/out of range values<br>

>> are guaranteed to result in a 0x80000000 value - which plays havoc with<br>

>> constant folding which converts them to either zero or UNDEF. This is also<br>

>> an issue with the scalar implementations (which were already generic IR<br>

>> and what I was trying to match).<br>

><br>

> Are the problems enough that this should be merged to the 3.9 release branch?<br>

> --paulr<br>

<br>

</div></div>IIUC, this is the Clang-side of r275981, and if we merge that this<br>

should probably be merged too.<br>

<br>

Thanks,<br>

Hans<br>

<div class="HOEnZb"><div class="h5">_______________________________________________<br>

cfe-commits mailing list<br>

<a href="mailto:cfe-commits@lists.llvm.org">cfe-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits</a><br>

</div></div></blockquote></div><br></div>