[llvm] [RISCV] Fold extract_vector_elt of a load into the scalar load (PR #76151)

Mon Jan 1 19:47:11 PST 2024

lukel97 wrote:

> > If there is similar optimization in DAGCombiner, is there any reason why we don't use it?
> 
> > Thanks, I didn't explain myself clearly. They will be run after legalization, not for riscv. E.g.:
> 
> ```
> Type-legalized selection DAG: %bb.0 'src:'
> SelectionDAG has 10 nodes:
>   t0: ch,glue = EntryToken
>         t2: i64,ch = CopyFromReg t0, Register:i64 %0
>       t5: v4i64,ch = load<(load (s256) from %ir.a, align 8)> t0, t2, undef:i64
>     t7: i64 = extract_vector_elt t5, Constant:i64<1>
>   t9: ch,glue = CopyToReg t0, Register:i64 $x10, t7
>   t10: ch = RISCVISD::RET_GLUE t9, Register:i64 $x10, t9:1
> 
> Legalized selection DAG: %bb.0 'src:'
> SelectionDAG has 20 nodes:
>   t0: ch,glue = EntryToken
>                 t2: i64,ch = CopyFromReg t0, Register:i64 %0
>               t22: nxv2i64,ch = llvm.riscv.vle<(load (s256) from %ir.a, align 8)> t0, TargetConstant:i64<10108>, undef:nxv2i64, t2, Constant:i64<4>
>             t23: v4i64 = extract_subvector t22, Constant:i64<0>
>           t13: nxv2i64 = insert_subvector undef:nxv2i64, t23, Constant:i64<0>
>         t14: nxv1i64 = extract_subvector t13, Constant:i64<0>
>         t15: nxv1i1 = RISCVISD::VMSET_VL Constant:i64<1>
>       t18: nxv1i64 = RISCVISD::VSLIDEDOWN_VL undef:nxv1i64, t14, Constant:i64<1>, t15, Constant:i64<1>, TargetConstant:i64<3>
>     t19: i64 = RISCVISD::VMV_X_S t18
>   t9: ch,glue = CopyToReg t0, Register:i64 $x10, t19
>   t10: ch = RISCVISD::RET_GLUE t9, Register:i64 $x10, t9:1
> ```

I took a brief look at `scalarizeExtractedVectorLoad`. It does run before legalization, but only for non-constant indices, and only when `ISD::LOAD` is legal or custom for the element type. So for the test case from this PR:

```
define i32 @variable_index(ptr %v, i32 %i) {
  %a = load <8 x i32>, ptr %v
  %b = extractelement <8 x i32> %a, i32 %i
  ret i32 %b
}
```

We get a scalar load on RV32, but a `vle32` + `vslidedown` on RV64.

But I think adding a target combine here makes sense to handle the constant-index case.
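
For reference, the constant-index case in the quoted DAG dump appears to correspond to IR roughly like the following. This is a reconstruction from the dump (function `src`, a `v4i64` load from `%a` with `align 8`, element 1 extracted), so the value names `%v` and `%e` are assumptions and it is not copied from the PR's tests:

```
; Sketch reconstructed from the DAG dump above; %v and %e are assumed names.
define i64 @src(ptr %a) {
  %v = load <4 x i64>, ptr %a, align 8
  ; Constant index, so scalarizeExtractedVectorLoad does not fire before legalization.
  %e = extractelement <4 x i64> %v, i64 1
  ret i64 %e
}
```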

https://github.com/llvm/llvm-project/pull/76151