[llvm] r355741 - [x86] scalarize extract element 0 of FP cmp

Eric Christopher via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 15 15:36:26 PDT 2019


-march=haswell -mprefer-vector-width=128 (and a few other things) typically.

-eric

On Fri, Mar 15, 2019 at 3:32 PM Sanjay Patel via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
>
> Need to confirm something: when you say "targeting Haswell", does that mean you're compiling with -mavx / -mavx2? Or you're compiling for default SSE2 while running on Haswell?
> I don't see any asm diffs when building with AVX/AVX2 and running on Haswell, but I do see the expected suspect 'branch-became-a-select' when building with SSE2.
>
> On Fri, Mar 15, 2019 at 7:35 AM Sanjay Patel <spatel at rotateright.com> wrote:
>>
>> From the test diffs in the patch, I'm guessing we converted a predictable compare+branch into a blendv, so that was a loser.
>> If you have a reduced test already, please let me know. Otherwise, I'll start investigating using the test-suite sources.
>>
>> This patch was intended to be an intermediate step towards the problems seen in PR39665 and related bugs (use 'movmsk' more liberally), so it's possible that we could reverse the order of those and avoid any regressions. Let me know if I should revert while looking at this.
>>
>> On Thu, Mar 14, 2019 at 6:46 PM David Jones <dlj at google.com> wrote:
>>>
>>> Hey Sanjay,
>>>
>>> It looks like this revision caused a pretty substantial performance regression on lcalsALambda in the test-suite. Specifically, all the BM_PRESSURE_CALC_LAMBDA variants (171, 5001, and 44217) showed 5-7% degradation when targeting Haswell. Chandler said he might have some ideas of the issue.
>>>
>>> --dlj
>>>
>>> On Fri, Mar 8, 2019 at 1:53 PM Sanjay Patel via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>
>>>> Author: spatel
>>>> Date: Fri Mar  8 13:54:41 2019
>>>> New Revision: 355741
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=355741&view=rev
>>>> Log:
>>>> [x86] scalarize extract element 0 of FP cmp
>>>>
>>>> An extension of D58282 noted in PR39665:
>>>> https://bugs.llvm.org/show_bug.cgi?id=39665
>>>>
>>>> This doesn't answer the request to use movmsk, but that's an
>>>> independent problem. We need this and probably still need
>>>> scalarization of FP selects because we can't do that as a
>>>> target-independent transform (although it seems likely that
>>>> targets besides x86 should have this transform).
>>>>
>>>> Modified:
>>>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>>     llvm/trunk/test/CodeGen/X86/extractelement-fp.ll
>>>>     llvm/trunk/test/CodeGen/X86/vec_floor.ll
>>>>
>>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=355741&r1=355740&r2=355741&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Mar  8 13:54:41 2019
>>>> @@ -34298,6 +34298,22 @@ static SDValue scalarizeExtEltFP(SDNode
>>>>    if (!Vec.hasOneUse() || !isNullConstant(Index) || VecVT.getScalarType() != VT)
>>>>      return SDValue();
>>>>
>>>> +  // Vector FP compares don't fit the pattern of FP math ops (propagate, not
>>>> +  // extract, the condition code), so deal with those as a special-case.
>>>> +  if (Vec.getOpcode() == ISD::SETCC) {
>>>> +    EVT OpVT = Vec.getOperand(0).getValueType().getScalarType();
>>>> +    if (OpVT != MVT::f32 && OpVT != MVT::f64)
>>>> +      return SDValue();
>>>> +
>>>> +    // extract (setcc X, Y, CC), 0 --> setcc (extract X, 0), (extract Y, 0), CC
>>>> +    SDLoc DL(ExtElt);
>>>> +    SDValue Ext0 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, OpVT,
>>>> +                               Vec.getOperand(0), Index);
>>>> +    SDValue Ext1 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, OpVT,
>>>> +                               Vec.getOperand(1), Index);
>>>> +    return DAG.getNode(Vec.getOpcode(), DL, VT, Ext0, Ext1, Vec.getOperand(2));
>>>> +  }
>>>> +
>>>>    if (VT != MVT::f32 && VT != MVT::f64)
>>>>      return SDValue();
>>>>
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/extractelement-fp.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/extractelement-fp.ll?rev=355741&r1=355740&r2=355741&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/extractelement-fp.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/extractelement-fp.ll Fri Mar  8 13:54:41 2019
>>>> @@ -132,9 +132,8 @@ define double @frem_v4f64(<4 x double> %
>>>>  define i1 @fcmp_v4f32(<4 x float> %x, <4 x float> %y) nounwind {
>>>>  ; CHECK-LABEL: fcmp_v4f32:
>>>>  ; CHECK:       # %bb.0:
>>>> -; CHECK-NEXT:    vcmpltps %xmm0, %xmm1, %xmm0
>>>> -; CHECK-NEXT:    vpextrb $0, %xmm0, %eax
>>>> -; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>>>> +; CHECK-NEXT:    vucomiss %xmm1, %xmm0
>>>> +; CHECK-NEXT:    seta %al
>>>>  ; CHECK-NEXT:    retq
>>>>    %v = fcmp ogt <4 x float> %x, %y
>>>>    %r = extractelement <4 x i1> %v, i32 0
>>>> @@ -144,9 +143,8 @@ define i1 @fcmp_v4f32(<4 x float> %x, <4
>>>>  define i1 @fcmp_v4f64(<4 x double> %x, <4 x double> %y) nounwind {
>>>>  ; CHECK-LABEL: fcmp_v4f64:
>>>>  ; CHECK:       # %bb.0:
>>>> -; CHECK-NEXT:    vcmpnlepd %ymm1, %ymm0, %ymm0
>>>> -; CHECK-NEXT:    vpextrb $0, %xmm0, %eax
>>>> -; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>>>> +; CHECK-NEXT:    vucomisd %xmm0, %xmm1
>>>> +; CHECK-NEXT:    setb %al
>>>>  ; CHECK-NEXT:    vzeroupper
>>>>  ; CHECK-NEXT:    retq
>>>>    %v = fcmp ugt <4 x double> %x, %y
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/vec_floor.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vec_floor.ll?rev=355741&r1=355740&r2=355741&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/vec_floor.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/vec_floor.ll Fri Mar  8 13:54:41 2019
>>>> @@ -1665,47 +1665,28 @@ define <2 x double> @floor_maskz_sd_trun
>>>>  define <4 x float> @floor_mask_ss_mask8(<4 x float> %x, <4 x float> %y, <4 x float> %w) nounwind {
>>>>  ; SSE41-LABEL: floor_mask_ss_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movaps %xmm0, %xmm3
>>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm3
>>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    je LBB60_2
>>>> -; SSE41-NEXT:  ## %bb.1:
>>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>>>> -; SSE41-NEXT:    roundss $9, %xmm0, %xmm2
>>>> -; SSE41-NEXT:  LBB60_2:
>>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm2[0],xmm1[1,2,3]
>>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundss $9, %xmm0, %xmm3
>>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andps %xmm0, %xmm3
>>>> +; SSE41-NEXT:    andnps %xmm2, %xmm0
>>>> +; SSE41-NEXT:    orps %xmm3, %xmm0
>>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: floor_mask_ss_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm3
>>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    je LBB60_2
>>>> -; AVX-NEXT:  ## %bb.1:
>>>> -; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm2
>>>> -; AVX-NEXT:  LBB60_2:
>>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm2[0],xmm1[1,2,3]
>>>> +; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm3
>>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vblendvps %xmm0, %xmm3, %xmm2, %xmm0
>>>> +; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: floor_mask_ss_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512F-NEXT:    vmovaps %xmm2, %xmm0
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: floor_mask_ss_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512VL-NEXT:    vmovaps %xmm2, %xmm0
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: floor_mask_ss_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm2 {%k1}
>>>> +; AVX512-NEXT:    vmovaps %xmm2, %xmm0
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>>>>    %s = extractelement <4 x float> %x, i64 0
>>>> @@ -1719,50 +1700,25 @@ define <4 x float> @floor_mask_ss_mask8(
>>>>  define <4 x float> @floor_maskz_ss_mask8(<4 x float> %x, <4 x float> %y) nounwind {
>>>>  ; SSE41-LABEL: floor_maskz_ss_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movaps %xmm0, %xmm2
>>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm2
>>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    jne LBB61_1
>>>> -; SSE41-NEXT:  ## %bb.2:
>>>> -; SSE41-NEXT:    xorps %xmm0, %xmm0
>>>> -; SSE41-NEXT:    jmp LBB61_3
>>>> -; SSE41-NEXT:  LBB61_1:
>>>> -; SSE41-NEXT:    roundss $9, %xmm0, %xmm0
>>>> -; SSE41-NEXT:  LBB61_3:
>>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
>>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundss $9, %xmm0, %xmm2
>>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andps %xmm2, %xmm0
>>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: floor_maskz_ss_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm2
>>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    jne LBB61_1
>>>> -; AVX-NEXT:  ## %bb.2:
>>>> -; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
>>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>> -; AVX-NEXT:    retq
>>>> -; AVX-NEXT:  LBB61_1:
>>>> -; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm2
>>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vandps %xmm2, %xmm0, %xmm0
>>>>  ; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: floor_maskz_ss_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: floor_maskz_ss_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: floor_maskz_ss_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>>>>    %s = extractelement <4 x float> %x, i64 0
>>>> @@ -1775,47 +1731,28 @@ define <4 x float> @floor_maskz_ss_mask8
>>>>  define <2 x double> @floor_mask_sd_mask8(<2 x double> %x, <2 x double> %y, <2 x double> %w) nounwind {
>>>>  ; SSE41-LABEL: floor_mask_sd_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movapd %xmm0, %xmm3
>>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm3
>>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    je LBB62_2
>>>> -; SSE41-NEXT:  ## %bb.1:
>>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>>>> -; SSE41-NEXT:    roundsd $9, %xmm0, %xmm2
>>>> -; SSE41-NEXT:  LBB62_2:
>>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
>>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundsd $9, %xmm0, %xmm3
>>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andpd %xmm0, %xmm3
>>>> +; SSE41-NEXT:    andnpd %xmm2, %xmm0
>>>> +; SSE41-NEXT:    orpd %xmm3, %xmm0
>>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: floor_mask_sd_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm3
>>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    je LBB62_2
>>>> -; AVX-NEXT:  ## %bb.1:
>>>> -; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm2
>>>> -; AVX-NEXT:  LBB62_2:
>>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm2[0],xmm1[1]
>>>> +; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm3
>>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vblendvpd %xmm0, %xmm3, %xmm2, %xmm0
>>>> +; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: floor_mask_sd_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512F-NEXT:    vmovapd %xmm2, %xmm0
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: floor_mask_sd_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512VL-NEXT:    vmovapd %xmm2, %xmm0
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: floor_mask_sd_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm2 {%k1}
>>>> +; AVX512-NEXT:    vmovapd %xmm2, %xmm0
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>>>>    %s = extractelement <2 x double> %x, i64 0
>>>> @@ -1829,50 +1766,25 @@ define <2 x double> @floor_mask_sd_mask8
>>>>  define <2 x double> @floor_maskz_sd_mask8(<2 x double> %x, <2 x double> %y) nounwind {
>>>>  ; SSE41-LABEL: floor_maskz_sd_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movapd %xmm0, %xmm2
>>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm2
>>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    jne LBB63_1
>>>> -; SSE41-NEXT:  ## %bb.2:
>>>> -; SSE41-NEXT:    xorpd %xmm0, %xmm0
>>>> -; SSE41-NEXT:    jmp LBB63_3
>>>> -; SSE41-NEXT:  LBB63_1:
>>>> -; SSE41-NEXT:    roundsd $9, %xmm0, %xmm0
>>>> -; SSE41-NEXT:  LBB63_3:
>>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
>>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundsd $9, %xmm0, %xmm2
>>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andpd %xmm2, %xmm0
>>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: floor_maskz_sd_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm2
>>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    jne LBB63_1
>>>> -; AVX-NEXT:  ## %bb.2:
>>>> -; AVX-NEXT:    vxorpd %xmm0, %xmm0, %xmm0
>>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>> -; AVX-NEXT:    retq
>>>> -; AVX-NEXT:  LBB63_1:
>>>> -; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm2
>>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vandpd %xmm2, %xmm0, %xmm0
>>>>  ; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: floor_maskz_sd_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: floor_maskz_sd_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: floor_maskz_sd_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>>>>    %s = extractelement <2 x double> %x, i64 0
>>>> @@ -2729,47 +2641,28 @@ define <2 x double> @ceil_maskz_sd_trunc
>>>>  define <4 x float> @ceil_mask_ss_mask8(<4 x float> %x, <4 x float> %y, <4 x float> %w) nounwind {
>>>>  ; SSE41-LABEL: ceil_mask_ss_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movaps %xmm0, %xmm3
>>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm3
>>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    je LBB86_2
>>>> -; SSE41-NEXT:  ## %bb.1:
>>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>>>> -; SSE41-NEXT:    roundss $10, %xmm0, %xmm2
>>>> -; SSE41-NEXT:  LBB86_2:
>>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm2[0],xmm1[1,2,3]
>>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundss $10, %xmm0, %xmm3
>>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andps %xmm0, %xmm3
>>>> +; SSE41-NEXT:    andnps %xmm2, %xmm0
>>>> +; SSE41-NEXT:    orps %xmm3, %xmm0
>>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: ceil_mask_ss_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm3
>>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    je LBB86_2
>>>> -; AVX-NEXT:  ## %bb.1:
>>>> -; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm2
>>>> -; AVX-NEXT:  LBB86_2:
>>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm2[0],xmm1[1,2,3]
>>>> +; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm3
>>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vblendvps %xmm0, %xmm3, %xmm2, %xmm0
>>>> +; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: ceil_mask_ss_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512F-NEXT:    vmovaps %xmm2, %xmm0
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: ceil_mask_ss_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512VL-NEXT:    vmovaps %xmm2, %xmm0
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: ceil_mask_ss_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm2 {%k1}
>>>> +; AVX512-NEXT:    vmovaps %xmm2, %xmm0
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>>>>    %s = extractelement <4 x float> %x, i64 0
>>>> @@ -2783,50 +2676,25 @@ define <4 x float> @ceil_mask_ss_mask8(<
>>>>  define <4 x float> @ceil_maskz_ss_mask8(<4 x float> %x, <4 x float> %y) nounwind {
>>>>  ; SSE41-LABEL: ceil_maskz_ss_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movaps %xmm0, %xmm2
>>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm2
>>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    jne LBB87_1
>>>> -; SSE41-NEXT:  ## %bb.2:
>>>> -; SSE41-NEXT:    xorps %xmm0, %xmm0
>>>> -; SSE41-NEXT:    jmp LBB87_3
>>>> -; SSE41-NEXT:  LBB87_1:
>>>> -; SSE41-NEXT:    roundss $10, %xmm0, %xmm0
>>>> -; SSE41-NEXT:  LBB87_3:
>>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
>>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundss $10, %xmm0, %xmm2
>>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andps %xmm2, %xmm0
>>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: ceil_maskz_ss_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm2
>>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    jne LBB87_1
>>>> -; AVX-NEXT:  ## %bb.2:
>>>> -; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
>>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>> -; AVX-NEXT:    retq
>>>> -; AVX-NEXT:  LBB87_1:
>>>> -; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm2
>>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vandps %xmm2, %xmm0, %xmm0
>>>>  ; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: ceil_maskz_ss_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: ceil_maskz_ss_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: ceil_maskz_ss_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>>>>    %s = extractelement <4 x float> %x, i64 0
>>>> @@ -2839,47 +2707,28 @@ define <4 x float> @ceil_maskz_ss_mask8(
>>>>  define <2 x double> @ceil_mask_sd_mask8(<2 x double> %x, <2 x double> %y, <2 x double> %w) nounwind {
>>>>  ; SSE41-LABEL: ceil_mask_sd_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movapd %xmm0, %xmm3
>>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm3
>>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    je LBB88_2
>>>> -; SSE41-NEXT:  ## %bb.1:
>>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>>>> -; SSE41-NEXT:    roundsd $10, %xmm0, %xmm2
>>>> -; SSE41-NEXT:  LBB88_2:
>>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
>>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundsd $10, %xmm0, %xmm3
>>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andpd %xmm0, %xmm3
>>>> +; SSE41-NEXT:    andnpd %xmm2, %xmm0
>>>> +; SSE41-NEXT:    orpd %xmm3, %xmm0
>>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: ceil_mask_sd_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm3
>>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    je LBB88_2
>>>> -; AVX-NEXT:  ## %bb.1:
>>>> -; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm2
>>>> -; AVX-NEXT:  LBB88_2:
>>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm2[0],xmm1[1]
>>>> +; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm3
>>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vblendvpd %xmm0, %xmm3, %xmm2, %xmm0
>>>> +; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: ceil_mask_sd_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512F-NEXT:    vmovapd %xmm2, %xmm0
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: ceil_mask_sd_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm2 {%k1}
>>>> -; AVX512VL-NEXT:    vmovapd %xmm2, %xmm0
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: ceil_mask_sd_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm2 {%k1}
>>>> +; AVX512-NEXT:    vmovapd %xmm2, %xmm0
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>>>>    %s = extractelement <2 x double> %x, i64 0
>>>> @@ -2893,50 +2742,25 @@ define <2 x double> @ceil_mask_sd_mask8(
>>>>  define <2 x double> @ceil_maskz_sd_mask8(<2 x double> %x, <2 x double> %y) nounwind {
>>>>  ; SSE41-LABEL: ceil_maskz_sd_mask8:
>>>>  ; SSE41:       ## %bb.0:
>>>> -; SSE41-NEXT:    movapd %xmm0, %xmm2
>>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm2
>>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>>>> -; SSE41-NEXT:    testb $1, %al
>>>> -; SSE41-NEXT:    jne LBB89_1
>>>> -; SSE41-NEXT:  ## %bb.2:
>>>> -; SSE41-NEXT:    xorpd %xmm0, %xmm0
>>>> -; SSE41-NEXT:    jmp LBB89_3
>>>> -; SSE41-NEXT:  LBB89_1:
>>>> -; SSE41-NEXT:    roundsd $10, %xmm0, %xmm0
>>>> -; SSE41-NEXT:  LBB89_3:
>>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
>>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    roundsd $10, %xmm0, %xmm2
>>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>>>> +; SSE41-NEXT:    andpd %xmm2, %xmm0
>>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; SSE41-NEXT:    retq
>>>>  ;
>>>>  ; AVX-LABEL: ceil_maskz_sd_mask8:
>>>>  ; AVX:       ## %bb.0:
>>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm2
>>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>>>> -; AVX-NEXT:    testb $1, %al
>>>> -; AVX-NEXT:    jne LBB89_1
>>>> -; AVX-NEXT:  ## %bb.2:
>>>> -; AVX-NEXT:    vxorpd %xmm0, %xmm0, %xmm0
>>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>> -; AVX-NEXT:    retq
>>>> -; AVX-NEXT:  LBB89_1:
>>>> -; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm2
>>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>>>> +; AVX-NEXT:    vandpd %xmm2, %xmm0, %xmm0
>>>>  ; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>>>>  ; AVX-NEXT:    retq
>>>>  ;
>>>> -; AVX512F-LABEL: ceil_maskz_sd_mask8:
>>>> -; AVX512F:       ## %bb.0:
>>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>>>> -; AVX512F-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512F-NEXT:    vzeroupper
>>>> -; AVX512F-NEXT:    retq
>>>> -;
>>>> -; AVX512VL-LABEL: ceil_maskz_sd_mask8:
>>>> -; AVX512VL:       ## %bb.0:
>>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>>>> -; AVX512VL-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> -; AVX512VL-NEXT:    retq
>>>> +; AVX512-LABEL: ceil_maskz_sd_mask8:
>>>> +; AVX512:       ## %bb.0:
>>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>>>> +; AVX512-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>>>> +; AVX512-NEXT:    retq
>>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>>>>    %s = extractelement <2 x double> %x, i64 0
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


More information about the llvm-commits mailing list