[llvm] [RISCV] Support vectorizing FMINIMUMNUM and FMAXIMUMNUM (PR #135727)

Sat Apr 26 23:08:41 PDT 2025

wangpc-pp wrote:

> > The CodeGen is OK, but I think we need more tests (I think you just need to reorganize the structure):
> 
> If I understand your request correctly, I think they already exist. See `llvm/test/CodeGen/RISCV/rvv/maximumnum-minimumnum.ll`
> 
> > 1. `@llvm.maximumnum`/`@llvm.minimumnum` with scalable vector types.
> 
> ```
> define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef readonly captures(none) %input2, ptr noundef writeonly captures(none) %output) {
> entry:
>   %input23 = ptrtoint ptr %input2 to i64
>   %input12 = ptrtoint ptr %input1 to i64
>   %output1 = ptrtoint ptr %output to i64
>   br label %vector.ph
> 
> vector.ph:
>   %9 = call i64 @llvm.vscale.i64()
>   %10 = mul i64 %9, 4
>   %n.mod.vf = urem i64 4096, %10
>   %n.vec = sub i64 4096, %n.mod.vf
>   %11 = call i64 @llvm.vscale.i64()
>   %12 = mul i64 %11, 4
>   br label %vector.body
> 
> vector.body:                                      ; preds = %vector.body, %vector.ph
>   %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
>   %13 = getelementptr inbounds nuw [4096 x float], ptr %input1, i64 0, i64 %index
>   %14 = getelementptr inbounds nuw float, ptr %13, i32 0
>   %wide.load = load <vscale x 4 x float>, ptr %14, align 4
>   %15 = getelementptr inbounds nuw [4096 x float], ptr %input2, i64 0, i64 %index
>   %16 = getelementptr inbounds nuw float, ptr %15, i32 0
>   %wide.load5 = load <vscale x 4 x float>, ptr %16, align 4
>   %17 = call <vscale x 4 x float> @llvm.minimumnum.nxv4f32(<vscale x 4 x float> %wide.load, <vscale x 4 x float> %wide.load5)
>   %18 = getelementptr inbounds nuw [4096 x float], ptr %output, i64 0, i64 %index
>   %19 = getelementptr inbounds nuw float, ptr %18, i32 0
>   store <vscale x 4 x float> %17, ptr %19, align 4
>   %index.next = add nuw i64 %index, %12
>   %20 = icmp eq i64 %index.next, %n.vec
>   br i1 %20, label %exit, label %vector.body
> 
> exit:                                             ; preds = %vector.body
>   ret void
> }
> ```
> 
> > 2. `@llvm.maximumnum`/`@llvm.minimumnum` with fixed vector types.
> 
> ```
> define <2 x double> @max_v2f64(<2 x double> %a, <2 x double> %b) {
> entry:
>   %c = call <2 x double> @llvm.maximumnum.v2f64(<2 x double> %a, <2 x double> %b)
>   ret <2 x double> %c
> }
> ```
> 
> > And also the VP intrinsics version (if exists). Each is in a standalone file.
> 
> We haven't defined the VP flavor yet (for all architectures and common code). If we add them, of course we will support RISC-V.

I see them; just put them in standalone files (see how we test other operations). That means you need to add four tests:

1. fmaximumnum-sdnode.ll (scalable vectors)
2. fminimumnum-sdnode.ll (scalable vectors)
3. fixed-vectors-fmaximumnum.ll (fixed vectors)
4. fixed-vectors-fminimumnum.ll (fixed vectors)

Also, for scalable vectors, you don't need to test the whole vectorized loop; just test the intrinsic itself.
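
As a rough illustration of "just test the intrinsic itself", a scalable-vector test in `fminimumnum-sdnode.ll` might look something like the sketch below. The function name, the RUN line flags, and the element type are assumptions for illustration, not taken from the PR; the CHECK lines are left out here and would normally be generated with `llvm/utils/update_llc_test_checks.py`.

```
; Hypothetical sketch: the RUN flags and function name are assumptions.
; CHECK lines are omitted; regenerate them with update_llc_test_checks.py.
; RUN: llc -mtriple=riscv64 -mattr=+v < %s | FileCheck %s

declare <vscale x 4 x float> @llvm.minimumnum.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)

; Calls the intrinsic directly on its scalable-vector arguments:
; no loop, no loads or stores.
define <vscale x 4 x float> @vfmin_nxv4f32_vv(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
  %v = call <vscale x 4 x float> @llvm.minimumnum.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b)
  ret <vscale x 4 x float> %v
}
```

The same shape would presumably be repeated per element type and LMUL, and mirrored with `@llvm.maximumnum` in `fmaximumnum-sdnode.ll`.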

https://github.com/llvm/llvm-project/pull/135727