[llvm] r334460 - [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
Wei Mi via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 19 11:32:11 PDT 2018
Thanks for the quick fix!
Wei.
On Tue, Jun 19, 2018 at 10:59 AM, Topper, Craig <craig.topper at intel.com> wrote:
> Should be fixed after r335062
>
> -----Original Message-----
> From: Wei Mi [mailto:wmi at google.com]
> Sent: Tuesday, June 19, 2018 10:21 AM
> To: Topper, Craig <craig.topper at intel.com>
> Cc: llvm-commits <llvm-commits at lists.llvm.org>
> Subject: Re: [llvm] r334460 - [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
>
> Hi Craig,
>
> We ran into a SEGV in a test after this patch. It happens when an
> unaligned memory access is folded into roundpd; roundpd faults if the
> memory operand is not 128-bit aligned. Here is a reduced example:
>
> $ cat test.ll
> declare <2 x double> @llvm.floor.v2f64(<2 x double>)
>
> define <2 x double> @test(<2 x double>* %xptr) nounwind optsize {
> %x = load <2 x double>, <2 x double>* %xptr, align 8
> %call = tail call <2 x double> @llvm.floor.v2f64(<2 x double> %x)
> ret <2 x double> %call
> }
>
> $ ../../llvm-r334145/rbuild/bin/llc -mtriple=x86_64 -mattr=sse4.1 < test.ll
> .text
> .file "<stdin>"
> .globl test # -- Begin function test
> .type test,@function
> test: # @test
> # %bb.0:
> movupd (%rdi), %xmm0
> roundpd $9, %xmm0, %xmm0
> retq
> .Lfunc_end0:
> .size test, .Lfunc_end0-test
>
> $ ../../llvm-r334460/bin/llc -mtriple=x86_64 -mattr=sse4.1 < test.ll
> .text
> .file "<stdin>"
> .globl test # -- Begin function test
> .type test,@function
> test: # @test
> # %bb.0:
> roundpd $9, (%rdi), %xmm0
> retq
> .Lfunc_end0:
> .size test, .Lfunc_end0-test
> # -- End function
>
> .section ".note.GNU-stack","",@progbits
>
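[Editorial note: the legality question above can be sketched outside of TableGen. This is a minimal Python model with hypothetical names — the real check lives in the isel load patterns and folding tables — of when a load may be folded into a packed-memory-operand instruction:]

```python
def can_fold_load(align: int, vec_bytes: int, has_avx: bool) -> bool:
    """Model of the folding constraint: legacy SSE packed memory
    operands must be naturally aligned (16 bytes for an xmm operand),
    while VEX-encoded (AVX) forms tolerate unaligned operands."""
    if has_avx:
        return True             # vroundpd etc. allow unaligned memory
    return align >= vec_bytes   # roundpd requires 16-byte alignment

# The miscompile above: an align-8 v2f64 load folded into SSE roundpd.
print(can_fold_load(align=8,  vec_bytes=16, has_avx=False))  # False: must not fold
print(can_fold_load(align=16, vec_bytes=16, has_avx=False))  # True
print(can_fold_load(align=8,  vec_bytes=16, has_avx=True))   # True
```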
> Thanks,
> Wei.
>
> On Mon, Jun 11, 2018 at 5:48 PM, Craig Topper via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>> Author: ctopper
>> Date: Mon Jun 11 17:48:57 2018
>> New Revision: 334460
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=334460&view=rev
>> Log:
>> [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
>>
>> We were missing packed isel folding patterns for all of sse41, avx, and avx512.
>>
>> For some reason avx512 had scalar load folding patterns under optsize (due to the partial/undef register update), but we didn't have the equivalent sse41 and avx patterns.
>>
>> Sometimes we would get load folding from the peephole pass anyway, but the avx512 instructions are also missing from the load folding table. I'll try to fix that in another patch.
>>
>> Some of this was spotted in the review for D47993.
>>
>> This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
>>
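[Editorial note: the immediates in these patterns follow the SSE4.1 ROUND* control-byte encoding: bits 1:0 pick an explicit rounding mode, bit 2 selects the current MXCSR mode instead, and bit 3 suppresses the precision (inexact) exception. A small sketch, assuming the default round-to-nearest-even MXCSR mode:]

```python
import math

def round_with_imm(x: float, imm: int) -> float:
    """Emulate the (V)ROUND*/VRNDSCALE* immediate, assuming MXCSR is
    in the default round-to-nearest-even mode.  Bit 3 (0x8) only
    suppresses the precision exception and never changes the result,
    which is why ffloor uses 0x9 rather than plain 0x1."""
    if imm & 0x4:                          # bit 2: follow MXCSR (frint=0x4, fnearbyint=0xC)
        return float(round(x))             # Python round() is ties-to-even
    mode = imm & 0x3                       # bits 1:0: explicit rounding mode
    if mode == 0x1:
        return float(math.floor(x))        # round down -> ffloor (0x9)
    if mode == 0x2:
        return float(math.ceil(x))         # round up   -> fceil  (0xA)
    if mode == 0x3:
        return float(math.trunc(x))        # truncate   -> ftrunc (0xB)
    return float(round(x))                 # nearest, ties to even (0x0/0x8)

print(round_with_imm(1.5, 0x9))   # 1.0
print(round_with_imm(1.5, 0xA))   # 2.0
print(round_with_imm(-1.5, 0xB))  # -1.0
print(round_with_imm(2.5, 0xC))   # 2.0 (ties to even)
```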
>> Modified:
>> llvm/trunk/lib/Target/X86/X86InstrAVX512.td
>> llvm/trunk/lib/Target/X86/X86InstrSSE.td
>> llvm/trunk/test/CodeGen/X86/avx-cvt.ll
>> llvm/trunk/test/CodeGen/X86/avx-cvttp2si.ll
>> llvm/trunk/test/CodeGen/X86/rounding-ops.ll
>> llvm/trunk/test/CodeGen/X86/sse-cvttp2si.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86InstrAVX512.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrAVX512.td?rev=334460&r1=334459&r2=334460&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86InstrAVX512.td (original)
>> +++ llvm/trunk/lib/Target/X86/X86InstrAVX512.td Mon Jun 11 17:48:57 2018
>> @@ -9777,6 +9777,17 @@ def : Pat<(v16f32 (frint VR512:$src)),
>> def : Pat<(v16f32 (ftrunc VR512:$src)),
>> (VRNDSCALEPSZrri VR512:$src, (i32 0xB))>;
>>
>> +def : Pat<(v16f32 (ffloor (loadv16f32 addr:$src))),
>> + (VRNDSCALEPSZrmi addr:$src, (i32 0x9))>;
>> +def : Pat<(v16f32 (fnearbyint (loadv16f32 addr:$src))),
>> + (VRNDSCALEPSZrmi addr:$src, (i32 0xC))>;
>> +def : Pat<(v16f32 (fceil (loadv16f32 addr:$src))),
>> + (VRNDSCALEPSZrmi addr:$src, (i32 0xA))>;
>> +def : Pat<(v16f32 (frint (loadv16f32 addr:$src))),
>> + (VRNDSCALEPSZrmi addr:$src, (i32 0x4))>;
>> +def : Pat<(v16f32 (ftrunc (loadv16f32 addr:$src))),
>> + (VRNDSCALEPSZrmi addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v8f64 (ffloor VR512:$src)),
>> (VRNDSCALEPDZrri VR512:$src, (i32 0x9))>;
>> def : Pat<(v8f64 (fnearbyint VR512:$src)),
>> @@ -9787,6 +9798,17 @@ def : Pat<(v8f64 (frint VR512:$src)),
>> (VRNDSCALEPDZrri VR512:$src, (i32 0x4))>;
>> def : Pat<(v8f64 (ftrunc VR512:$src)),
>> (VRNDSCALEPDZrri VR512:$src, (i32 0xB))>;
>> +
>> +def : Pat<(v8f64 (ffloor (loadv8f64 addr:$src))),
>> + (VRNDSCALEPDZrmi addr:$src, (i32 0x9))>;
>> +def : Pat<(v8f64 (fnearbyint (loadv8f64 addr:$src))),
>> + (VRNDSCALEPDZrmi addr:$src, (i32 0xC))>;
>> +def : Pat<(v8f64 (fceil (loadv8f64 addr:$src))),
>> + (VRNDSCALEPDZrmi addr:$src, (i32 0xA))>;
>> +def : Pat<(v8f64 (frint (loadv8f64 addr:$src))),
>> + (VRNDSCALEPDZrmi addr:$src, (i32 0x4))>;
>> +def : Pat<(v8f64 (ftrunc (loadv8f64 addr:$src))),
>> + (VRNDSCALEPDZrmi addr:$src, (i32 0xB))>;
>> }
>>
>> let Predicates = [HasVLX] in {
>> @@ -9801,6 +9823,17 @@ def : Pat<(v4f32 (frint VR128X:$src)),
>> def : Pat<(v4f32 (ftrunc VR128X:$src)),
>> (VRNDSCALEPSZ128rri VR128X:$src, (i32 0xB))>;
>>
>> +def : Pat<(v4f32 (ffloor (loadv4f32 addr:$src))),
>> + (VRNDSCALEPSZ128rmi addr:$src, (i32 0x9))>;
>> +def : Pat<(v4f32 (fnearbyint (loadv4f32 addr:$src))),
>> + (VRNDSCALEPSZ128rmi addr:$src, (i32 0xC))>;
>> +def : Pat<(v4f32 (fceil (loadv4f32 addr:$src))),
>> + (VRNDSCALEPSZ128rmi addr:$src, (i32 0xA))>;
>> +def : Pat<(v4f32 (frint (loadv4f32 addr:$src))),
>> + (VRNDSCALEPSZ128rmi addr:$src, (i32 0x4))>;
>> +def : Pat<(v4f32 (ftrunc (loadv4f32 addr:$src))),
>> + (VRNDSCALEPSZ128rmi addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v2f64 (ffloor VR128X:$src)),
>> (VRNDSCALEPDZ128rri VR128X:$src, (i32 0x9))>;
>> def : Pat<(v2f64 (fnearbyint VR128X:$src)),
>> @@ -9812,6 +9845,17 @@ def : Pat<(v2f64 (frint VR128X:$src)),
>> def : Pat<(v2f64 (ftrunc VR128X:$src)),
>> (VRNDSCALEPDZ128rri VR128X:$src, (i32 0xB))>;
>>
>> +def : Pat<(v2f64 (ffloor (loadv2f64 addr:$src))),
>> + (VRNDSCALEPDZ128rmi addr:$src, (i32 0x9))>;
>> +def : Pat<(v2f64 (fnearbyint (loadv2f64 addr:$src))),
>> + (VRNDSCALEPDZ128rmi addr:$src, (i32 0xC))>;
>> +def : Pat<(v2f64 (fceil (loadv2f64 addr:$src))),
>> + (VRNDSCALEPDZ128rmi addr:$src, (i32 0xA))>;
>> +def : Pat<(v2f64 (frint (loadv2f64 addr:$src))),
>> + (VRNDSCALEPDZ128rmi addr:$src, (i32 0x4))>;
>> +def : Pat<(v2f64 (ftrunc (loadv2f64 addr:$src))),
>> + (VRNDSCALEPDZ128rmi addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v8f32 (ffloor VR256X:$src)),
>> (VRNDSCALEPSZ256rri VR256X:$src, (i32 0x9))>;
>> def : Pat<(v8f32 (fnearbyint VR256X:$src)),
>> @@ -9823,6 +9867,17 @@ def : Pat<(v8f32 (frint VR256X:$src)),
>> def : Pat<(v8f32 (ftrunc VR256X:$src)),
>> (VRNDSCALEPSZ256rri VR256X:$src, (i32 0xB))>;
>>
>> +def : Pat<(v8f32 (ffloor (loadv8f32 addr:$src))),
>> + (VRNDSCALEPSZ256rmi addr:$src, (i32 0x9))>;
>> +def : Pat<(v8f32 (fnearbyint (loadv8f32 addr:$src))),
>> + (VRNDSCALEPSZ256rmi addr:$src, (i32 0xC))>;
>> +def : Pat<(v8f32 (fceil (loadv8f32 addr:$src))),
>> + (VRNDSCALEPSZ256rmi addr:$src, (i32 0xA))>;
>> +def : Pat<(v8f32 (frint (loadv8f32 addr:$src))),
>> + (VRNDSCALEPSZ256rmi addr:$src, (i32 0x4))>;
>> +def : Pat<(v8f32 (ftrunc (loadv8f32 addr:$src))),
>> + (VRNDSCALEPSZ256rmi addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v4f64 (ffloor VR256X:$src)),
>> (VRNDSCALEPDZ256rri VR256X:$src, (i32 0x9))>;
>> def : Pat<(v4f64 (fnearbyint VR256X:$src)),
>> @@ -9833,6 +9888,17 @@ def : Pat<(v4f64 (frint VR256X:$src)),
>> (VRNDSCALEPDZ256rri VR256X:$src, (i32 0x4))>;
>> def : Pat<(v4f64 (ftrunc VR256X:$src)),
>> (VRNDSCALEPDZ256rri VR256X:$src, (i32 0xB))>;
>> +
>> +def : Pat<(v4f64 (ffloor (loadv4f64 addr:$src))),
>> + (VRNDSCALEPDZ256rmi addr:$src, (i32 0x9))>;
>> +def : Pat<(v4f64 (fnearbyint (loadv4f64 addr:$src))),
>> + (VRNDSCALEPDZ256rmi addr:$src, (i32 0xC))>;
>> +def : Pat<(v4f64 (fceil (loadv4f64 addr:$src))),
>> + (VRNDSCALEPDZ256rmi addr:$src, (i32 0xA))>;
>> +def : Pat<(v4f64 (frint (loadv4f64 addr:$src))),
>> + (VRNDSCALEPDZ256rmi addr:$src, (i32 0x4))>;
>> +def : Pat<(v4f64 (ftrunc (loadv4f64 addr:$src))),
>> + (VRNDSCALEPDZ256rmi addr:$src, (i32 0xB))>;
>> }
>>
>> multiclass avx512_shuff_packed_128_common<bits<8> opc, string OpcodeStr,
>>
>> Modified: llvm/trunk/lib/Target/X86/X86InstrSSE.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrSSE.td?rev=334460&r1=334459&r2=334460&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86InstrSSE.td (original)
>> +++ llvm/trunk/lib/Target/X86/X86InstrSSE.td Mon Jun 11 17:48:57 2018
>> @@ -5606,26 +5606,51 @@ let Predicates = [HasAVX, NoAVX512] in {
>> let Predicates = [UseAVX] in {
>> def : Pat<(ffloor FR32:$src),
>> (VROUNDSSr (f32 (IMPLICIT_DEF)), FR32:$src, (i32 0x9))>;
>> - def : Pat<(f64 (ffloor FR64:$src)),
>> - (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0x9))>;
>> def : Pat<(f32 (fnearbyint FR32:$src)),
>> (VROUNDSSr (f32 (IMPLICIT_DEF)), FR32:$src, (i32 0xC))>;
>> - def : Pat<(f64 (fnearbyint FR64:$src)),
>> - (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0xC))>;
>> def : Pat<(f32 (fceil FR32:$src)),
>> (VROUNDSSr (f32 (IMPLICIT_DEF)), FR32:$src, (i32 0xA))>;
>> - def : Pat<(f64 (fceil FR64:$src)),
>> - (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0xA))>;
>> def : Pat<(f32 (frint FR32:$src)),
>> (VROUNDSSr (f32 (IMPLICIT_DEF)), FR32:$src, (i32 0x4))>;
>> - def : Pat<(f64 (frint FR64:$src)),
>> - (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0x4))>;
>> def : Pat<(f32 (ftrunc FR32:$src)),
>> (VROUNDSSr (f32 (IMPLICIT_DEF)), FR32:$src, (i32 0xB))>;
>> +
>> + def : Pat<(f64 (ffloor FR64:$src)),
>> + (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0x9))>;
>> + def : Pat<(f64 (fnearbyint FR64:$src)),
>> + (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0xC))>;
>> + def : Pat<(f64 (fceil FR64:$src)),
>> + (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0xA))>;
>> + def : Pat<(f64 (frint FR64:$src)),
>> + (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0x4))>;
>> def : Pat<(f64 (ftrunc FR64:$src)),
>> (VROUNDSDr (f64 (IMPLICIT_DEF)), FR64:$src, (i32 0xB))>;
>> }
>>
>> +let Predicates = [UseAVX, OptForSize] in {
>> + def : Pat<(ffloor (loadf32 addr:$src)),
>> + (VROUNDSSm (f32 (IMPLICIT_DEF)), addr:$src, (i32 0x9))>;
>> + def : Pat<(f32 (fnearbyint (loadf32 addr:$src))),
>> + (VROUNDSSm (f32 (IMPLICIT_DEF)), addr:$src, (i32 0xC))>;
>> + def : Pat<(f32 (fceil (loadf32 addr:$src))),
>> + (VROUNDSSm (f32 (IMPLICIT_DEF)), addr:$src, (i32 0xA))>;
>> + def : Pat<(f32 (frint (loadf32 addr:$src))),
>> + (VROUNDSSm (f32 (IMPLICIT_DEF)), addr:$src, (i32 0x4))>;
>> + def : Pat<(f32 (ftrunc (loadf32 addr:$src))),
>> + (VROUNDSSm (f32 (IMPLICIT_DEF)), addr:$src, (i32 0xB))>;
>> +
>> + def : Pat<(f64 (ffloor (loadf64 addr:$src))),
>> + (VROUNDSDm (f64 (IMPLICIT_DEF)), addr:$src, (i32 0x9))>;
>> + def : Pat<(f64 (fnearbyint (loadf64 addr:$src))),
>> + (VROUNDSDm (f64 (IMPLICIT_DEF)), addr:$src, (i32 0xC))>;
>> + def : Pat<(f64 (fceil (loadf64 addr:$src))),
>> + (VROUNDSDm (f64 (IMPLICIT_DEF)), addr:$src, (i32 0xA))>;
>> + def : Pat<(f64 (frint (loadf64 addr:$src))),
>> + (VROUNDSDm (f64 (IMPLICIT_DEF)), addr:$src, (i32 0x4))>;
>> + def : Pat<(f64 (ftrunc (loadf64 addr:$src))),
>> + (VROUNDSDm (f64 (IMPLICIT_DEF)), addr:$src, (i32 0xB))>;
>> +}
>> +
>> let Predicates = [HasAVX, NoVLX] in {
>> def : Pat<(v4f32 (ffloor VR128:$src)),
>> (VROUNDPSr VR128:$src, (i32 0x9))>;
>> @@ -5638,6 +5663,17 @@ let Predicates = [HasAVX, NoVLX] in {
>> def : Pat<(v4f32 (ftrunc VR128:$src)),
>> (VROUNDPSr VR128:$src, (i32 0xB))>;
>>
>> + def : Pat<(v4f32 (ffloor (loadv4f32 addr:$src))),
>> + (VROUNDPSm addr:$src, (i32 0x9))>;
>> + def : Pat<(v4f32 (fnearbyint (loadv4f32 addr:$src))),
>> + (VROUNDPSm addr:$src, (i32 0xC))>;
>> + def : Pat<(v4f32 (fceil (loadv4f32 addr:$src))),
>> + (VROUNDPSm addr:$src, (i32 0xA))>;
>> + def : Pat<(v4f32 (frint (loadv4f32 addr:$src))),
>> + (VROUNDPSm addr:$src, (i32 0x4))>;
>> + def : Pat<(v4f32 (ftrunc (loadv4f32 addr:$src))),
>> + (VROUNDPSm addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v2f64 (ffloor VR128:$src)),
>> (VROUNDPDr VR128:$src, (i32 0x9))>;
>> def : Pat<(v2f64 (fnearbyint VR128:$src)),
>> @@ -5649,6 +5685,17 @@ let Predicates = [HasAVX, NoVLX] in {
>> def : Pat<(v2f64 (ftrunc VR128:$src)),
>> (VROUNDPDr VR128:$src, (i32 0xB))>;
>>
>> + def : Pat<(v2f64 (ffloor (loadv2f64 addr:$src))),
>> + (VROUNDPDm addr:$src, (i32 0x9))>;
>> + def : Pat<(v2f64 (fnearbyint (loadv2f64 addr:$src))),
>> + (VROUNDPDm addr:$src, (i32 0xC))>;
>> + def : Pat<(v2f64 (fceil (loadv2f64 addr:$src))),
>> + (VROUNDPDm addr:$src, (i32 0xA))>;
>> + def : Pat<(v2f64 (frint (loadv2f64 addr:$src))),
>> + (VROUNDPDm addr:$src, (i32 0x4))>;
>> + def : Pat<(v2f64 (ftrunc (loadv2f64 addr:$src))),
>> + (VROUNDPDm addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v8f32 (ffloor VR256:$src)),
>> (VROUNDPSYr VR256:$src, (i32 0x9))>;
>> def : Pat<(v8f32 (fnearbyint VR256:$src)),
>> @@ -5660,6 +5707,17 @@ let Predicates = [HasAVX, NoVLX] in {
>> def : Pat<(v8f32 (ftrunc VR256:$src)),
>> (VROUNDPSYr VR256:$src, (i32 0xB))>;
>>
>> + def : Pat<(v8f32 (ffloor (loadv8f32 addr:$src))),
>> + (VROUNDPSYm addr:$src, (i32 0x9))>;
>> + def : Pat<(v8f32 (fnearbyint (loadv8f32 addr:$src))),
>> + (VROUNDPSYm addr:$src, (i32 0xC))>;
>> + def : Pat<(v8f32 (fceil (loadv8f32 addr:$src))),
>> + (VROUNDPSYm addr:$src, (i32 0xA))>;
>> + def : Pat<(v8f32 (frint (loadv8f32 addr:$src))),
>> + (VROUNDPSYm addr:$src, (i32 0x4))>;
>> + def : Pat<(v8f32 (ftrunc (loadv8f32 addr:$src))),
>> + (VROUNDPSYm addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v4f64 (ffloor VR256:$src)),
>> (VROUNDPDYr VR256:$src, (i32 0x9))>;
>> def : Pat<(v4f64 (fnearbyint VR256:$src)),
>> @@ -5670,6 +5728,17 @@ let Predicates = [HasAVX, NoVLX] in {
>> (VROUNDPDYr VR256:$src, (i32 0x4))>;
>> def : Pat<(v4f64 (ftrunc VR256:$src)),
>> (VROUNDPDYr VR256:$src, (i32 0xB))>;
>> +
>> + def : Pat<(v4f64 (ffloor (loadv4f64 addr:$src))),
>> + (VROUNDPDYm addr:$src, (i32 0x9))>;
>> + def : Pat<(v4f64 (fnearbyint (loadv4f64 addr:$src))),
>> + (VROUNDPDYm addr:$src, (i32 0xC))>;
>> + def : Pat<(v4f64 (fceil (loadv4f64 addr:$src))),
>> + (VROUNDPDYm addr:$src, (i32 0xA))>;
>> + def : Pat<(v4f64 (frint (loadv4f64 addr:$src))),
>> + (VROUNDPDYm addr:$src, (i32 0x4))>;
>> + def : Pat<(v4f64 (ftrunc (loadv4f64 addr:$src))),
>> + (VROUNDPDYm addr:$src, (i32 0xB))>;
>> }
>>
>> let ExeDomain = SSEPackedSingle in
>> @@ -5688,25 +5757,52 @@ defm ROUND : sse41_fp_binop_s<0x0A, 0x0
>> let Predicates = [UseSSE41] in {
>> def : Pat<(ffloor FR32:$src),
>> (ROUNDSSr FR32:$src, (i32 0x9))>;
>> - def : Pat<(f64 (ffloor FR64:$src)),
>> - (ROUNDSDr FR64:$src, (i32 0x9))>;
>> def : Pat<(f32 (fnearbyint FR32:$src)),
>> (ROUNDSSr FR32:$src, (i32 0xC))>;
>> - def : Pat<(f64 (fnearbyint FR64:$src)),
>> - (ROUNDSDr FR64:$src, (i32 0xC))>;
>> def : Pat<(f32 (fceil FR32:$src)),
>> (ROUNDSSr FR32:$src, (i32 0xA))>;
>> - def : Pat<(f64 (fceil FR64:$src)),
>> - (ROUNDSDr FR64:$src, (i32 0xA))>;
>> def : Pat<(f32 (frint FR32:$src)),
>> (ROUNDSSr FR32:$src, (i32 0x4))>;
>> - def : Pat<(f64 (frint FR64:$src)),
>> - (ROUNDSDr FR64:$src, (i32 0x4))>;
>> def : Pat<(f32 (ftrunc FR32:$src)),
>> (ROUNDSSr FR32:$src, (i32 0xB))>;
>> +
>> + def : Pat<(f64 (ffloor FR64:$src)),
>> + (ROUNDSDr FR64:$src, (i32 0x9))>;
>> + def : Pat<(f64 (fnearbyint FR64:$src)),
>> + (ROUNDSDr FR64:$src, (i32 0xC))>;
>> + def : Pat<(f64 (fceil FR64:$src)),
>> + (ROUNDSDr FR64:$src, (i32 0xA))>;
>> + def : Pat<(f64 (frint FR64:$src)),
>> + (ROUNDSDr FR64:$src, (i32 0x4))>;
>> def : Pat<(f64 (ftrunc FR64:$src)),
>> (ROUNDSDr FR64:$src, (i32 0xB))>;
>> +}
>>
>> +let Predicates = [UseSSE41, OptForSize] in {
>> + def : Pat<(ffloor (loadf32 addr:$src)),
>> + (ROUNDSSm addr:$src, (i32 0x9))>;
>> + def : Pat<(f32 (fnearbyint (loadf32 addr:$src))),
>> + (ROUNDSSm addr:$src, (i32 0xC))>;
>> + def : Pat<(f32 (fceil (loadf32 addr:$src))),
>> + (ROUNDSSm addr:$src, (i32 0xA))>;
>> + def : Pat<(f32 (frint (loadf32 addr:$src))),
>> + (ROUNDSSm addr:$src, (i32 0x4))>;
>> + def : Pat<(f32 (ftrunc (loadf32 addr:$src))),
>> + (ROUNDSSm addr:$src, (i32 0xB))>;
>> +
>> + def : Pat<(f64 (ffloor (loadf64 addr:$src))),
>> + (ROUNDSDm addr:$src, (i32 0x9))>;
>> + def : Pat<(f64 (fnearbyint (loadf64 addr:$src))),
>> + (ROUNDSDm addr:$src, (i32 0xC))>;
>> + def : Pat<(f64 (fceil (loadf64 addr:$src))),
>> + (ROUNDSDm addr:$src, (i32 0xA))>;
>> + def : Pat<(f64 (frint (loadf64 addr:$src))),
>> + (ROUNDSDm addr:$src, (i32 0x4))>;
>> + def : Pat<(f64 (ftrunc (loadf64 addr:$src))),
>> + (ROUNDSDm addr:$src, (i32 0xB))>;
>> +}
>> +
>> +let Predicates = [UseSSE41] in {
>> def : Pat<(v4f32 (ffloor VR128:$src)),
>> (ROUNDPSr VR128:$src, (i32 0x9))>;
>> def : Pat<(v4f32 (fnearbyint VR128:$src)),
>> @@ -5718,6 +5814,17 @@ let Predicates = [UseSSE41] in {
>> def : Pat<(v4f32 (ftrunc VR128:$src)),
>> (ROUNDPSr VR128:$src, (i32 0xB))>;
>>
>> + def : Pat<(v4f32 (ffloor (loadv4f32 addr:$src))),
>> + (ROUNDPSm addr:$src, (i32 0x9))>;
>> + def : Pat<(v4f32 (fnearbyint (loadv4f32 addr:$src))),
>> + (ROUNDPSm addr:$src, (i32 0xC))>;
>> + def : Pat<(v4f32 (fceil (loadv4f32 addr:$src))),
>> + (ROUNDPSm addr:$src, (i32 0xA))>;
>> + def : Pat<(v4f32 (frint (loadv4f32 addr:$src))),
>> + (ROUNDPSm addr:$src, (i32 0x4))>;
>> + def : Pat<(v4f32 (ftrunc (loadv4f32 addr:$src))),
>> + (ROUNDPSm addr:$src, (i32 0xB))>;
>> +
>> def : Pat<(v2f64 (ffloor VR128:$src)),
>> (ROUNDPDr VR128:$src, (i32 0x9))>;
>> def : Pat<(v2f64 (fnearbyint VR128:$src)),
>> @@ -5728,6 +5835,17 @@ let Predicates = [UseSSE41] in {
>> (ROUNDPDr VR128:$src, (i32 0x4))>;
>> def : Pat<(v2f64 (ftrunc VR128:$src)),
>> (ROUNDPDr VR128:$src, (i32 0xB))>;
>> +
>> + def : Pat<(v2f64 (ffloor (loadv2f64 addr:$src))),
>> + (ROUNDPDm addr:$src, (i32 0x9))>;
>> + def : Pat<(v2f64 (fnearbyint (loadv2f64 addr:$src))),
>> + (ROUNDPDm addr:$src, (i32 0xC))>;
>> + def : Pat<(v2f64 (fceil (loadv2f64 addr:$src))),
>> + (ROUNDPDm addr:$src, (i32 0xA))>;
>> + def : Pat<(v2f64 (frint (loadv2f64 addr:$src))),
>> + (ROUNDPDm addr:$src, (i32 0x4))>;
>> + def : Pat<(v2f64 (ftrunc (loadv2f64 addr:$src))),
>> + (ROUNDPDm addr:$src, (i32 0xB))>;
>> }
>>
>> //===----------------------------------------------------------------------===//
>>
>> Modified: llvm/trunk/test/CodeGen/X86/avx-cvt.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-cvt.ll?rev=334460&r1=334459&r2=334460&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/avx-cvt.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx-cvt.ll Mon Jun 11 17:48:57 2018
>> @@ -1,5 +1,6 @@
>> ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
>> -; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s
>> +; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=CHECK --check-prefix=AVX
>> +; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512
>>
>> define <8 x float> @sitofp00(<8 x i32> %a) nounwind {
>> ; CHECK-LABEL: sitofp00:
>> @@ -29,14 +30,20 @@ define <4 x double> @sitofp01(<4 x i32>
>> }
>>
>> define <8 x float> @sitofp02(<8 x i16> %a) {
>> -; CHECK-LABEL: sitofp02:
>> -; CHECK: # %bb.0:
>> -; CHECK-NEXT: vpmovsxwd %xmm0, %xmm1
>> -; CHECK-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
>> -; CHECK-NEXT: vpmovsxwd %xmm0, %xmm0
>> -; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
>> -; CHECK-NEXT: vcvtdq2ps %ymm0, %ymm0
>> -; CHECK-NEXT: retq
>> +; AVX-LABEL: sitofp02:
>> +; AVX: # %bb.0:
>> +; AVX-NEXT: vpmovsxwd %xmm0, %xmm1
>> +; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
>> +; AVX-NEXT: vpmovsxwd %xmm0, %xmm0
>> +; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
>> +; AVX-NEXT: vcvtdq2ps %ymm0, %ymm0
>> +; AVX-NEXT: retq
>> +;
>> +; AVX512-LABEL: sitofp02:
>> +; AVX512: # %bb.0:
>> +; AVX512-NEXT: vpmovsxwd %xmm0, %ymm0
>> +; AVX512-NEXT: vcvtdq2ps %ymm0, %ymm0
>> +; AVX512-NEXT: retq
>> %b = sitofp <8 x i16> %a to <8 x float>
>> ret <8 x float> %b
>> }
>> @@ -52,12 +59,17 @@ define <4 x i32> @fptosi01(<4 x double>
>> }
>>
>> define <8 x float> @fptrunc00(<8 x double> %b) nounwind {
>> -; CHECK-LABEL: fptrunc00:
>> -; CHECK: # %bb.0:
>> -; CHECK-NEXT: vcvtpd2ps %ymm0, %xmm0
>> -; CHECK-NEXT: vcvtpd2ps %ymm1, %xmm1
>> -; CHECK-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
>> -; CHECK-NEXT: retq
>> +; AVX-LABEL: fptrunc00:
>> +; AVX: # %bb.0:
>> +; AVX-NEXT: vcvtpd2ps %ymm0, %xmm0
>> +; AVX-NEXT: vcvtpd2ps %ymm1, %xmm1
>> +; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
>> +; AVX-NEXT: retq
>> +;
>> +; AVX512-LABEL: fptrunc00:
>> +; AVX512: # %bb.0:
>> +; AVX512-NEXT: vcvtpd2ps %zmm0, %ymm0
>> +; AVX512-NEXT: retq
>> %a = fptrunc <8 x double> %b to <8 x float>
>> ret <8 x float> %a
>> }
>> @@ -168,4 +180,23 @@ define float @floor_f32(float %a) {
>> }
>> declare float @llvm.floor.f32(float %p)
>>
>> +define float @floor_f32_load(float* %aptr) optsize {
>> +; CHECK-LABEL: floor_f32_load:
>> +; CHECK: # %bb.0:
>> +; CHECK-NEXT: vroundss $9, (%rdi), %xmm0, %xmm0
>> +; CHECK-NEXT: retq
>> + %a = load float, float* %aptr
>> + %res = call float @llvm.floor.f32(float %a)
>> + ret float %res
>> +}
>> +
>> +define double @nearbyint_f64_load(double* %aptr) optsize {
>> +; CHECK-LABEL: nearbyint_f64_load:
>> +; CHECK: # %bb.0:
>> +; CHECK-NEXT: vroundsd $12, (%rdi), %xmm0, %xmm0
>> +; CHECK-NEXT: retq
>> + %a = load double, double* %aptr
>> + %res = call double @llvm.nearbyint.f64(double %a)
>> + ret double %res
>> +}
>>
>>
>> Modified: llvm/trunk/test/CodeGen/X86/avx-cvttp2si.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-cvttp2si.ll?rev=334460&r1=334459&r2=334460&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/avx-cvttp2si.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx-cvttp2si.ll Mon Jun 11 17:48:57 2018
>> @@ -9,16 +9,10 @@ declare <8 x i32> @llvm.x86.avx.cvtt.ps2
>> declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>)
>>
>> define <8 x float> @float_to_int_to_float_mem_v8f32(<8 x float>* %p) {
>> -; AVX1-LABEL: float_to_int_to_float_mem_v8f32:
>> -; AVX1: # %bb.0:
>> -; AVX1-NEXT: vroundps $11, (%rdi), %ymm0
>> -; AVX1-NEXT: retq
>> -;
>> -; AVX512-LABEL: float_to_int_to_float_mem_v8f32:
>> -; AVX512: # %bb.0:
>> -; AVX512-NEXT: vmovups (%rdi), %ymm0
>> -; AVX512-NEXT: vroundps $11, %ymm0, %ymm0
>> -; AVX512-NEXT: retq
>> +; AVX-LABEL: float_to_int_to_float_mem_v8f32:
>> +; AVX: # %bb.0:
>> +; AVX-NEXT: vroundps $11, (%rdi), %ymm0
>> +; AVX-NEXT: retq
>> %x = load <8 x float>, <8 x float>* %p, align 16
>> %fptosi = tail call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %x)
>> %sitofp = sitofp <8 x i32> %fptosi to <8 x float>
>> @@ -36,16 +30,10 @@ define <8 x float> @float_to_int_to_floa
>> }
>>
>> define <4 x double> @float_to_int_to_float_mem_v4f64(<4 x double>* %p) {
>> -; AVX1-LABEL: float_to_int_to_float_mem_v4f64:
>> -; AVX1: # %bb.0:
>> -; AVX1-NEXT: vroundpd $11, (%rdi), %ymm0
>> -; AVX1-NEXT: retq
>> -;
>> -; AVX512-LABEL: float_to_int_to_float_mem_v4f64:
>> -; AVX512: # %bb.0:
>> -; AVX512-NEXT: vmovupd (%rdi), %ymm0
>> -; AVX512-NEXT: vroundpd $11, %ymm0, %ymm0
>> -; AVX512-NEXT: retq
>> +; AVX-LABEL: float_to_int_to_float_mem_v4f64:
>> +; AVX: # %bb.0:
>> +; AVX-NEXT: vroundpd $11, (%rdi), %ymm0
>> +; AVX-NEXT: retq
>> %x = load <4 x double>, <4 x double>* %p, align 16
>> %fptosi = tail call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %x)
>> %sitofp = sitofp <4 x i32> %fptosi to <4 x double>
>>
>> Modified: llvm/trunk/test/CodeGen/X86/rounding-ops.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/rounding-ops.ll?rev=334460&r1=334459&r2=334460&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/rounding-ops.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/rounding-ops.ll Mon Jun 11 17:48:57 2018
>> @@ -1,7 +1,7 @@
>> ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
>> -; RUN: llc < %s -mtriple=x86_64-apple-macosx -mattr=+sse4.1 | FileCheck -check-prefix=CHECK-SSE %s
>> -; RUN: llc < %s -mtriple=x86_64-apple-macosx -mattr=+avx | FileCheck -check-prefix=CHECK-AVX %s
>> -; RUN: llc < %s -mtriple=x86_64-apple-macosx -mattr=+avx512f | FileCheck -check-prefix=CHECK-AVX512 %s
>> +; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-macosx -mattr=+sse4.1 | FileCheck -check-prefix=CHECK-SSE %s
>> +; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-macosx -mattr=+avx | FileCheck -check-prefix=CHECK-AVX %s
>> +; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-macosx -mattr=+avx512f | FileCheck -check-prefix=CHECK-AVX512 %s
>>
>> define float @test1(float %x) nounwind {
>> ; CHECK-SSE-LABEL: test1:
>> @@ -212,3 +212,43 @@ define double @test10(double %x) nounwin
>> }
>>
>> declare double @trunc(double) nounwind readnone
>> +
>> +define float @test11(float* %xptr) nounwind optsize {
>> +; CHECK-SSE-LABEL: test11:
>> +; CHECK-SSE: ## %bb.0:
>> +; CHECK-SSE-NEXT: roundss $11, (%rdi), %xmm0
>> +; CHECK-SSE-NEXT: retq
>> +;
>> +; CHECK-AVX-LABEL: test11:
>> +; CHECK-AVX: ## %bb.0:
>> +; CHECK-AVX-NEXT: vroundss $11, (%rdi), %xmm0, %xmm0
>> +; CHECK-AVX-NEXT: retq
>> +;
>> +; CHECK-AVX512-LABEL: test11:
>> +; CHECK-AVX512: ## %bb.0:
>> +; CHECK-AVX512-NEXT: vroundss $11, (%rdi), %xmm0, %xmm0
>> +; CHECK-AVX512-NEXT: retq
>> + %x = load float, float* %xptr
>> + %call = tail call float @truncf(float %x) nounwind readnone
>> + ret float %call
>> +}
>> +
>> +define double @test12(double* %xptr) nounwind optsize {
>> +; CHECK-SSE-LABEL: test12:
>> +; CHECK-SSE: ## %bb.0:
>> +; CHECK-SSE-NEXT: roundsd $11, (%rdi), %xmm0
>> +; CHECK-SSE-NEXT: retq
>> +;
>> +; CHECK-AVX-LABEL: test12:
>> +; CHECK-AVX: ## %bb.0:
>> +; CHECK-AVX-NEXT: vroundsd $11, (%rdi), %xmm0, %xmm0
>> +; CHECK-AVX-NEXT: retq
>> +;
>> +; CHECK-AVX512-LABEL: test12:
>> +; CHECK-AVX512: ## %bb.0:
>> +; CHECK-AVX512-NEXT: vroundsd $11, (%rdi), %xmm0, %xmm0
>> +; CHECK-AVX512-NEXT: retq
>> + %x = load double, double* %xptr
>> + %call = tail call double @trunc(double %x) nounwind readnone
>> + ret double %call
>> +}
>>
>> Modified: llvm/trunk/test/CodeGen/X86/sse-cvttp2si.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse-cvttp2si.ll?rev=334460&r1=334459&r2=334460&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/sse-cvttp2si.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/sse-cvttp2si.ll Mon Jun 11 17:48:57 2018
>> @@ -163,16 +163,10 @@ define <4 x float> @float_to_int_to_floa
>> ; SSE-NEXT: roundps $11, (%rdi), %xmm0
>> ; SSE-NEXT: retq
>> ;
>> -; AVX1-LABEL: float_to_int_to_float_mem_v4f32:
>> -; AVX1: # %bb.0:
>> -; AVX1-NEXT: vroundps $11, (%rdi), %xmm0
>> -; AVX1-NEXT: retq
>> -;
>> -; AVX512-LABEL: float_to_int_to_float_mem_v4f32:
>> -; AVX512: # %bb.0:
>> -; AVX512-NEXT: vmovaps (%rdi), %xmm0
>> -; AVX512-NEXT: vroundps $11, %xmm0, %xmm0
>> -; AVX512-NEXT: retq
>> +; AVX-LABEL: float_to_int_to_float_mem_v4f32:
>> +; AVX: # %bb.0:
>> +; AVX-NEXT: vroundps $11, (%rdi), %xmm0
>> +; AVX-NEXT: retq
>> %x = load <4 x float>, <4 x float>* %p, align 16
>> %fptosi = tail call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %x)
>> %sitofp = sitofp <4 x i32> %fptosi to <4 x float>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits