[llvm] [X86] Adding lowerings for vector ISD::LRINT and ISD::LLRINT (PR #90065)

Tue Oct 1 10:20:22 PDT 2024

topperc wrote:

> It looks like this change is causing different results in some cases when using `@llvm.lrint.v4i64.v4f32`.
> 
> For
> 
> ```
> define i32 @foo(ptr %dst, <4 x float> %in) {
> entry:
>   %0 = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> %in)
>   %1 = trunc <4 x i64> %0 to <4 x i32>
>   store <4 x i32> %1, ptr %dst, align 4
>   ret i32 0
> }
> ```
> 
> we now generate the assembly below, which does the conversion to signed double word.
> 
> ```
> _foo:                                   ## @foo
>         cvtps2dq        xmm0, xmm0
>         movupd  xmmword ptr [rdi], xmm0
>         xor     eax, eax
>         ret
> ```
> 
> Before this change, we generated the assembly below, which converts to signed quad word and then truncates the result:
> 
> ```
> _foo:                                   ## @foo
>         movaps  xmm1, xmm0
>         shufps  xmm1, xmm0, 85                  ## xmm1 = xmm1[1,1],xmm0[1,1]
>         cvtss2si        rax, xmm1
>         cvtss2si        rcx, xmm0
>         movaps  xmm1, xmm0
>         unpckhpd        xmm1, xmm0                      ## xmm1 = xmm1[1],xmm0[1]
>         cvtss2si        rdx, xmm1
>         shufps  xmm0, xmm0, 255                 ## xmm0 = xmm0[3,3,3,3]
>         cvtss2si        rsi, xmm0
>         movd    xmm0, esi
>         movd    xmm1, edx
>         punpckldq       xmm1, xmm0              ## xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
>         movd    xmm0, ecx
>         movd    xmm2, eax
>         punpckldq       xmm0, xmm2              ## xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
>         punpcklqdq      xmm0, xmm1              ## xmm0 = xmm0[0],xmm1[0]
>         movdqu  xmmword ptr [rdi], xmm0
>         xor     eax, eax
>         ret
> ```
> 
> I might be missing something, but I think for inputs like 2^33 the new codegen will overflow, while the old one won't.
> 
> See https://llvm.godbolt.org/z/E7W89q59M for a comparison of `main` vs `18.x`.

I think it's this combine that isn't valid.

```
  // Try to combine (trunc (vNi64 (lrint x))) to (vNi32 (lrint x)).
  if (Src.getOpcode() == ISD::LRINT && VT.getScalarType() == MVT::i32 &&
      Src.hasOneUse())
    return DAG.getNode(ISD::LRINT, DL, VT, Src.getOperand(0));
```
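
To make the mismatch concrete, here is a minimal scalar sketch in C++ of the two behaviors for a single lane. The helper names are hypothetical and this is not LLVM code. The 0x80000000 result for out-of-range inputs is modeled by hand, on the assumption that `cvtps2dq` returns the integer indefinite value for a lane that does not fit in 32 bits, and the default round-to-nearest-even mode is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical model of the old codegen for one lane:
// round to a 64-bit integer (cvtss2si r64), then keep the low 32 bits (trunc).
int32_t lrint64_then_trunc(float x) {
  long long wide = std::llrint(x);
  return static_cast<int32_t>(static_cast<uint32_t>(wide));
}

// Hypothetical model of the new codegen for one lane (cvtps2dq):
// assumed to yield 0x80000000 when the rounded value does not fit in i32.
int32_t lrint32_like_cvtps2dq(float x) {
  if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(std::lrint(x));
}
```

Under these assumptions the two helpers agree for in-range inputs such as `2.5f`, but for `8589934592.0f` (2^33) the first returns 0 and the second returns `INT32_MIN`, which is the kind of difference reported above.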

https://github.com/llvm/llvm-project/pull/90065