[libclc] Optimize isfpclass-like CLC builtins (PR #124145)

Thu Jan 23 22:56:30 PST 2025

frasercrmck wrote:

> How does using isfpclass avoid scalarization here? I think it's somewhat preferable to use the named operations here; they are subtly different, since they canonicalize the input, unlike is.fpclass.

The builtins we were using before, like `__builtin_isnan`, don't take vector types so we were forced to scalarize.
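The sketch below contrasts the two strategies in plain C. It is an illustration, not libclc code. It assumes the GCC/Clang `vector_size` extension as a stand-in for OpenCL's `float2`/`int2`, and the typedefs and both function names are hypothetical. The scalarized version mirrors the old lowering: test each lane, then rebuild a vector using the OpenCL vector relational convention, where true is -1. For portability, the elementwise version uses a self-comparison `v != v` rather than `__builtin_isfpclass`.

```c
#include <math.h>

/* Hypothetical stand-ins for OpenCL's float2/int2, built with the GCC/Clang
   vector_size extension rather than OpenCL C's built-in vector types. */
typedef float float2 __attribute__((vector_size(2 * sizeof(float))));
typedef int int2 __attribute__((vector_size(2 * sizeof(int))));

/* Scalarized: isnan() accepts only scalars, so each lane is extracted and
   tested separately, then the result vector is rebuilt lane by lane using
   the OpenCL vector convention (true = -1, false = 0). */
static int2 isnan_scalarized(float2 v) {
    int2 r = { isnan(v[0]) ? -1 : 0, isnan(v[1]) ? -1 : 0 };
    return r;
}

/* Elementwise: a vector self-comparison is true only in NaN lanes and
   already yields -1 (true) or 0 (false) per lane, so no per-lane work is
   needed. Not reliable under -ffast-math, which lets the compiler assume
   NaNs never occur. */
static int2 isnan_elementwise(float2 v) {
    return (int2)(v != v);
}
```

Both functions return the same lane values. The difference lies in the code the compiler is able to emit, which is what the IR diff in this comment shows for the real builtins.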

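I actually started looking into adding `__builtin_elementwise_isnan` etc. to clang before realizing that `__builtin_isfpclass(x, 0x3)` (mask `0x3` selects signaling NaN and quiet NaN) accepts vector types and generates the same code as `__builtin_isnan(x)` does for scalar types (and essentially the same for vectors). I don't see any input canonicalization going on before this change. In the IR diff for `isnan(float2)` below, `>` lines are the code after this change and `<` lines are the code before it.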

```diff
in function _Z5isnanDv2_f:
  in block %entry:
    >   %0 = fcmp uno <2 x float> %a, zeroinitializer
    >   %sext.i = sext <2 x i1> %0 to <2 x i32>
    >   ret <2 x i32> %sext.i
    <   %0 = extractelement <2 x float> %a, i64 0
    <   %1 = fcmp uno float %0, 0.000000e+00
    <   %2 = zext i1 %1 to i32
    <   %vecinit.i = insertelement <2 x i32> poison, i32 %2, i64 0
    <   %3 = extractelement <2 x float> %a, i64 1
    <   %4 = fcmp uno float %3, 0.000000e+00
    <   %5 = zext i1 %4 to i32
    <   %vecinit2.i = insertelement <2 x i32> %vecinit.i, i32 %5, i64 1
    <   %cmp.i = icmp ne <2 x i32> %vecinit2.i, zeroinitializer
    <   %sext.i = sext <2 x i1> %cmp.i to <2 x i32>
    <   ret <2 x i32> %sext.i
```

https://github.com/llvm/llvm-project/pull/124145