[llvm] [RISCV] Improve llvm.reduce.fmaximum/minimum lowering (PR #75484)

Thu Dec 14 10:09:22 PST 2023

simeonkr wrote:

> Are you sure the FMINIMUM/FMAXIMUM came from the type legalizer? If the vector type is legal, the type legalizer shouldn't touch it. I would expect LegalizeVectorOps or LegalizeDAG to be where it gets broken down. So I think we could use custom lowering instead of a DAG combine.

It happens in `DAGTypeLegalizer::SplitVecOp_VECREDUCE()`, at least for illegal vector types. I assume we care about those, given that we cover them when testing other reduction operations? Without the patch, here are the DAGs after DAG combine and after type legalization, respectively:

```
Optimized lowered selection DAG: %bb.0 'vreduce_fmaximum_v128f32:'
SelectionDAG has 9 nodes:
  t0: ch,glue = EntryToken
        t2: i32,ch = CopyFromReg t0, Register:i32 %0
      t5: v128f32,ch = load<(load (s4096) from %ir.x)> t0, t2, undef:i32
    t6: f32 = vecreduce_fmaximum t5
  t8: ch,glue = CopyToReg t0, Register:f32 $f10_f, t6
  t9: ch = RISCVISD::RET_GLUE t8, Register:f32 $f10_f, t8:1

Type-legalized selection DAG: %bb.0 'vreduce_fmaximum_v128f32:'
SelectionDAG has 20 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t22: v32f32,ch = load<(load (s1024) from %ir.x, align 512)> t0, t2, undef:i32
          t17: v32f32,ch = load<(load (s1024) from %ir.x + 256, align 256, basealign 512)> t0, t12, undef:i32
        t26: v32f32 = fmaximum t22, t17
            t23: i32 = add nuw t2, Constant:i32<128>
          t24: v32f32,ch = load<(load (s1024) from %ir.x + 128, basealign 512)> t0, t23, undef:i32
            t19: i32 = add nuw t12, Constant:i32<128>
          t20: v32f32,ch = load<(load (s1024) from %ir.x + 384, basealign 512)> t0, t19, undef:i32
        t27: v32f32 = fmaximum t24, t20
      t28: v32f32 = fmaximum t26, t27
    t29: f32 = vecreduce_fmaximum t28
  t8: ch,glue = CopyToReg t0, Register:f32 $f10_f, t29
  t12: i32 = add nuw t2, Constant:i32<256>
  t9: ch = RISCVISD::RET_GLUE t8, Register:f32 $f10_f, t8:1
```

Hence, if I handle the reduction in `RISCVTargetLowering::LowerOperation()`, as I originally tried to do, it will be too late to avoid the FMAXIMUM/FMINIMUM nodes that the type legalizer introduces. Doing it there still gives some improvement, but not a significant amount.
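To illustrate the splitting visible in the type-legalized DAG above, here is a rough standalone C++ model. It is not LLVM code: `modelFMaximum`, `SplitStats`, and `reduceWithSplitting` are hypothetical names invented for this sketch, and it assumes a non-empty, power-of-two element count. It repeatedly halves an over-wide vector, combines the halves with an element-wise NaN-propagating maximum, and only reduces once the vector fits the assumed legal width.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

namespace sketch {

// Model of FMAXIMUM semantics: propagates NaN and orders -0.0 below +0.0.
inline float modelFMaximum(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == b) // equal values; only signed zeros need a tie-break
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}

// Hypothetical bookkeeping for what the split produces.
struct SplitStats {
  std::size_t legalWidthOps = 0;    // element-wise fmaximum ops at legal width
  std::size_t finalReduceWidth = 0; // width of the remaining vector reduction
};

// Halve the vector until it fits legalWidth, combining the low and high
// halves element-wise, then do a single reduction over what is left.
// Assumption: v is non-empty and v.size() is a power of two.
inline float reduceWithSplitting(std::vector<float> v, std::size_t legalWidth,
                                 SplitStats &stats) {
  assert(!v.empty() && legalWidth > 0);
  while (v.size() > legalWidth) {
    const std::size_t half = v.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      v[i] = modelFMaximum(v[i], v[i + half]);
    v.resize(half);
    // An element-wise op wider than legalWidth is itself split further.
    stats.legalWidthOps += (half + legalWidth - 1) / legalWidth;
  }
  stats.finalReduceWidth = v.size();
  float acc = v[0];
  for (std::size_t i = 1; i < v.size(); ++i)
    acc = modelFMaximum(acc, v[i]);
  return acc;
}

} // namespace sketch
```

With 128 elements and a legal width of 32, this model counts three legal-width element-wise operations followed by one 32-wide reduction, which matches the three `fmaximum` nodes and the single `vecreduce_fmaximum` in the dump.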

https://github.com/llvm/llvm-project/pull/75484