r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

Mon Jul 25 05:50:31 PDT 2016

Hi Simon (and all),

I noticed that this patch changes the definition of intrinsic _mm_cvtsd2_ss
in emmintrin.h. Is that intentional? My understanding is that your patch
should have only addressed float-to-integer conversions.

Was this change to _mm_cvtsd_ss motivated by the fact that (V)CVTSD2SS
depends on the rounding mode (control bits in the MXCSR register) for
inexact conversions? That would explain why the LLVM part (r275981) also
added a codegen test for that double-to-float conversion (I guess that none
of the reviewer spotted that extra test).

The problem I am seeing is that your change is causing a crash in the
backend (at -O1 and above). See the reproducible below:

/////
#include <x86intrin.h>

__m128 test(__m128 a, const __m128d &b) {
  return _mm_cvtsd_ss(a, b);
}
/////

Alternatively, here is the LLVM IR:

define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly
%b) {
entry:
  %0 = load <2 x double>, <2 x double>* %b, align 16
  %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x
double> %0)
  ret <4 x float> %1
}

; Function Attrs: nounwind readnone
declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)

--------

With your patch, we now always generate a call to @llvm.x86.sse2.cvtsd2ss
when expanding the builtin call from _mm_cvtsd_ss.

ISel would then select `Int_CVTSD2SSrm` instruction, which however is
`CodeGenOnly`. Unfortunately that pseudo is never further expanded/replaced
before compilation reaches the object emitter stage. So, before we execute
Pass 'X86 Assembly/Object Emitter' we see machine code like this:

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %XMM0
        %XMM0<def> = Int_VCVTSD2SSrm %XMM0<kill>, %RDI<kill>, 1, %noreg, 0,
%noreg; mem:LD16[%b]
        RETQ %XMM0<kill>

.. which then causes  the following assertion failure:

[2016-07-25 13:25:11.830351700] 0x7bad5f87e0     Executing Pass 'X86
Assembly / Object Emitter' on Function 'test'...
Cannot encode all operands of: <MCInst 1074 <MCOperand Reg:126> <MCOperand
Reg:126> <MCOperand Reg:39> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand
Imm:0> <MCOperand Reg:0>>

Overall, I agree that the compiler shouldn't make assumptions on the
rounding mode when coverting from double-to-float. However, the RM variant
of Int_VCVTSD2SS doesn't seem to be correctly handled/expanded by the
backend.

Can we revert the change to the double-to-single convert? Alternatively,
can we fix the codegen issue exposed by this change (and add extra test
coverage)?
My opinion is that the change to the double-to-float conversion intrinsic
should have been part of a separate patch.

Thanks,
Andrea

On Fri, Jul 22, 2016 at 2:18 PM, Hans Wennborg via cfe-commits <
cfe-commits at lists.llvm.org> wrote:

> On Thu, Jul 21, 2016 at 6:34 PM, Robinson, Paul via cfe-commits
> <cfe-commits at lists.llvm.org> wrote:
> >
> >
> >> -----Original Message-----
> >> From: cfe-commits [mailto:cfe-commits-bounces at lists.llvm.org] On
> Behalf Of
> >> Simon Pilgrim via cfe-commits
> >> Sent: Wednesday, July 20, 2016 3:18 AM
> >> To: cfe-commits at lists.llvm.org
> >> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion
> intrinsics
> >> instead of using generic IR
> >>
> >> Author: rksimon
> >> Date: Wed Jul 20 05:18:01 2016
> >> New Revision: 276102
> >>
> >> URL: http://llvm.org/viewvc/llvm-project?rev=276102&view=rev
> >> Log:
> >> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
> >> generic IR
> >>
> >> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and
> VCVTTPD2DQ
> >> truncating conversions with generic IR instead.
> >>
> >> It turns out that the behaviour of these intrinsics is different enough
> >> from generic IR that this will cause problems, INF/NAN/out of range
> values
> >> are guaranteed to result in a 0x80000000 value - which plays havoc with
> >> constant folding which converts them to either zero or UNDEF. This is
> also
> >> an issue with the scalar implementations (which were already generic IR
> >> and what I was trying to match).
> >
> > Are the problems enough that this should be merged to the 3.9 release
> branch?
> > --paulr
>
> IIUC, this is the Clang-side of r275981, and if we merge that this
> should probably be merged too.
>
> Thanks,
> Hans
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20160725/70f2e732/attachment.html>