[PATCH] D45733: [DAGCombiner] Unfold scalar masked merge if profitable

Tue Apr 24 13:44:39 PDT 2018

lebedev.ri added a comment.

In https://reviews.llvm.org/D45733#1077183, @lebedev.ri wrote:

> It seems this has uncovered something.
>  It does not look like a miscompilation to me (FIXME: or is it?), but the generated code is certainly worse:
>
>    ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
>    ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+bmi | FileCheck %s
>   
>    define float @test_andnotps_scalar(float %a0, float %a1, float* %a2) {
>    ; CHECK-LABEL: test_andnotps_scalar:
>    ; CHECK:       # %bb.0:
>   -; CHECK-NEXT:    movd %xmm0, %eax
>   -; CHECK-NEXT:    movd %xmm1, %ecx
>   -; CHECK-NEXT:    andnl %ecx, %eax, %eax
>   -; CHECK-NEXT:    movd {{.*#+}} xmm1 = mem[0],zero,zero,zero
>   -; CHECK-NEXT:    notl %eax
>   -; CHECK-NEXT:    movd %eax, %xmm0
>   +; CHECK-NEXT:    movd %xmm1, %eax
>   +; CHECK-NEXT:    movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
>    ; CHECK-NEXT:    pand %xmm1, %xmm0
>   +; CHECK-NEXT:    movd %xmm0, %ecx
>   +; CHECK-NEXT:    notl %eax
>   +; CHECK-NEXT:    orl %ecx, %eax
>   +; CHECK-NEXT:    movd %eax, %xmm0
>   +; CHECK-NEXT:    pand %xmm2, %xmm0
>    ; CHECK-NEXT:    retq
>      %tmp = bitcast float %a0 to i32
>      %tmp1 = bitcast float %a1 to i32
>      %tmp2 = xor i32 %tmp, -1
>      %tmp3 = and i32 %tmp2, %tmp1
>      %tmp4 = load float, float* %a2, align 16
>      %tmp5 = bitcast float %tmp4 to i32
>      %tmp6 = xor i32 %tmp3, -1
>      %tmp7 = and i32 %tmp5, %tmp6
>      %tmp8 = bitcast i32 %tmp7 to float
>      ret float %tmp8
>    }
>
>
> We **lost** `andnl`.
>  Discovered accidentally because the same happened to `@test_andnotps`/`@test_andnotpd` in `test/CodeGen/X86/*-schedule.ll` (they are no longer lowered to `andnps`/`andnpd`).

This happens because both `xor`s have the same constant operand, `-1`, so the `xor`/`and`/`xor` chain (`%tmp2`, `%tmp3`, `%tmp6`) looks like a masked merge.
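
To spell out the match, here is a small sketch that models the i32 arithmetic in Python. It assumes the combine recognizes the folded masked-merge form `((x ^ y) & m) ^ y` and rewrites it to `(x & m) | (y & ~m)`; the function names and the `MASK32` helper are made up for this illustration and are not taken from the patch. With `x = %tmp`, `y = -1`, `m = %tmp1`, the unfolded form reduces to `(a & b) | ~b`, which appears to be what the new `pand`/`notl`/`orl` sequence computes.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: models i32 wraparound; also the i32 value -1


def ir_as_written(a, b):
    """%tmp6 from the test: xor (and (xor %tmp, -1), %tmp1), -1."""
    tmp2 = (a ^ MASK32) & MASK32   # %tmp2 = xor i32 %tmp, -1
    tmp3 = tmp2 & b                # %tmp3 = and i32 %tmp2, %tmp1
    return (tmp3 ^ MASK32) & MASK32  # %tmp6 = xor i32 %tmp3, -1


def masked_merge_folded(x, y, m):
    """Folded masked merge: ((x ^ y) & m) ^ y."""
    return (((x ^ y) & m) ^ y) & MASK32


def masked_merge_unfolded(x, y, m):
    """Unfolded masked merge: (x & m) | (y & ~m)."""
    return ((x & m) | (y & ~m)) & MASK32


def unfolded_for_this_test(a, b):
    """Unfolded form with y = -1 substituted: (a & b) | ~b."""
    return ((a & b) | (~b & MASK32)) & MASK32


if __name__ == "__main__":
    samples = [(0, 0), (0x12345678, 0x0F0F0F0F), (0xDEADBEEF, 0xCAFEBABE)]
    for a, b in samples:
        # The IR is an instance of the folded pattern with y = -1 ...
        assert ir_as_written(a, b) == masked_merge_folded(a, MASK32, b)
        # ... and unfolding it preserves the value, so this is not a miscompile
        # of %tmp6 itself, only a different (longer) instruction sequence.
        assert ir_as_written(a, b) == unfolded_for_this_test(a, b)
    print("all forms agree")
```

Both forms simplify to `a | ~b`, so the values agree; the difference is that the original shape `~(~a & b)` maps directly onto `andn` plus `not`, while the unfolded shape needs separate `and`, `not` and `or`.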

Repository:
  rL LLVM

https://reviews.llvm.org/D45733