[llvm] r275981 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

Thu Jul 28 08:49:22 PDT 2016

Thanks for the update.

I've merged this (r275981) and the tablegen fix (r276740) in r276990,
and the Clang counterpart (r276102) in r276991.

Cheers,
Hans

On Wed, Jul 27, 2016 at 5:39 PM, Andrea Di Biagio
<andrea.dibiagio at gmail.com> wrote:
> Hi Hans,
>
> Simon fixed all the issues at revision 276740.
> http://llvm.org/viewvc/llvm-project?view=revision&revision=276740
>
> Tablegen definitions for instructions Int_VCVTSD2SSrm and Int_CVTSD2SSrm had
> the wrong encoding format.
> Before revision 276740, those instructions were defined with encoding format
> `MRMSrcReg` instead of `MRMSrcMem`. This was causing problems in the code
> emitter since the wrong logic was used to encode the memory operand of those
> instructions.
>
> Basically, if we decide to integrate 275981, then we also have to integrate
> revision 276740.
>
> -Andrea
>
>
>
> On Thu, Jul 28, 2016 at 12:53 AM, Hans Wennborg <hans at chromium.org> wrote:
>>
>> I'm holding off merging since there seem to be issues here. Has there
>> been any progress?
>>
>> Thanks,
>> Hans
>>
>> On Mon, Jul 25, 2016 at 6:06 AM, Andrea Di Biagio
>> <andrea.dibiagio at gmail.com> wrote:
>> > For the record: the clang counterpart (revision 275981) introduced a
>> > regression.
>> >
>> > After revision 275981 the compiler fails to build this test:
>> >
>> > ///////
>> > target triple = "x86_64-unknown-unknown"
>> >
>> > define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly %b) {
>> > entry:
>> >   %0 = load <2 x double>, <2 x double>* %b, align 16
>> >   %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x double> %0)
>> >   ret <4 x float> %1
>> > }
>> >
>> > declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)
>> > ////////
>> >
>> >> llc test.ll -o - -mattr=+avx
>> >
>> > in X86MCCodeEmitter::encodeInstruction
>> > Cannot encode all operands of: <MCInst 1073 <MCOperand Reg:126> <MCOperand Reg:126> <MCOperand Reg:39> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>
>> >
>> > I have already commented on the clang thread for revision 275981.
>> >
>> > -Andrea
>> >
>> > On Sat, Jul 23, 2016 at 5:15 PM, Nadav Rotem via llvm-commits
>> > <llvm-commits at lists.llvm.org> wrote:
>> >>
>> >> LGTM.
>> >>
>> >>
>> >> > On Jul 22, 2016, at 6:14 AM, Hans Wennborg <hans at chromium.org> wrote:
>> >> >
>> >> > Nadav: you're the X86 owner. What do you think?
>> >> >
>> >> >> On Thu, Jul 21, 2016 at 5:41 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> >> >> Nominating for backport to 3.9, so the intrinsics in question remain
>> >> >> available.
>> >> >>
>> >> >> -Eli
>> >> >>
>> >> >>
>> >> >> On Tue, Jul 19, 2016 at 8:07 AM, Simon Pilgrim via llvm-commits
>> >> >> <llvm-commits at lists.llvm.org> wrote:
>> >> >>>
>> >> >>> Author: rksimon
>> >> >>> Date: Tue Jul 19 10:07:43 2016
>> >> >>> New Revision: 275981
>> >> >>>
>> >> >>> URL: http://llvm.org/viewvc/llvm-project?rev=275981&view=rev
>> >> >>> Log:
>> >> >>> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
>> >> >>>
>> >> >>> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ
>> >> >>> truncating conversions with generic IR instead.
>> >> >>>
>> >> >>> It turns out that the behaviour of these intrinsics is different enough
>> >> >>> from generic IR that this will cause problems: INF/NAN/out of range values
>> >> >>> are guaranteed to result in a 0x80000000 value - which plays havoc with
>> >> >>> constant folding which converts them to either zero or UNDEF. This is also
>> >> >>> an issue with the scalar implementations (which were already generic IR and
>> >> >>> what I was trying to match).
>> >> >>>
>> >> >>> This patch changes both scalar and packed versions back to using
>> >> >>> x86-specific builtins.
>> >> >>>
>> >> >>> It also deals with the other scalar conversion cases that are runtime
>> >> >>> rounding mode dependent and can have similar issues with constant folding.
>> >> >>>
>> >> >>> A companion clang patch is at D22105
>> >> >>>
>> >> >>> Differential Revision: https://reviews.llvm.org/D22106
>> >> >>>
>> >> >>> Modified:
>> >> >>>    llvm/trunk/include/llvm/IR/IntrinsicsX86.td
>> >> >>>    llvm/trunk/lib/Analysis/ConstantFolding.cpp
>> >> >>>    llvm/trunk/lib/IR/AutoUpgrade.cpp
>> >> >>>    llvm/trunk/lib/Target/X86/X86InstrSSE.td
>> >> >>>    llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll
>> >> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll
>> >> >>>    llvm/trunk/test/Transforms/ConstProp/calls.ll
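
The commit log above says that the truncating conversions are guaranteed to
produce 0x80000000 for INF/NAN/out of range inputs, whereas constant folding
of generic IR turns those cases into zero or UNDEF. As a rough illustration
only, the sketch below models that behaviour for a single 32-bit lane in
Python. The function name and constants are hypothetical and are not part of
LLVM or of the patch; the model also takes a Python float (a double) rather
than a true single-precision value, so it ignores input rounding.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1
# Bit pattern the commit log says is guaranteed for INF/NAN/out of range inputs.
INTEGER_INDEFINITE = 0x80000000


def cvtt_to_i32_bits(x: float) -> int:
    """Model a truncating fp-to-int32 conversion for one lane.

    Returns the 32-bit result as an unsigned bit pattern. This is an
    illustrative model, not LLVM's constant folder.
    """
    # NaN and infinities have no integer value: return the fixed pattern.
    if math.isnan(x) or math.isinf(x):
        return INTEGER_INDEFINITE
    # Truncate toward zero, as the "T" (truncating) conversions do.
    truncated = math.trunc(x)
    # Values that do not fit in a signed 32-bit integer also yield the pattern.
    if truncated < INT32_MIN or truncated > INT32_MAX:
        return INTEGER_INDEFINITE
    # Encode the in-range signed result as its two's complement bit pattern.
    return truncated & 0xFFFFFFFF
```

A folder that replaced `cvtt_to_i32_bits(float("nan"))` with zero or an
undefined value would disagree with the 0x80000000 result modelled here, which
is the mismatch the log describes.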