[all-commits] [llvm/llvm-project] ba8147: Recommit "[RISCV][ISel] Combine scalable vector ad...

Wed Jan 17 18:30:39 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: ba81477e9cdb5384a10c700ee247562f0f6c03e6
      https://github.com/llvm/llvm-project/commit/ba81477e9cdb5384a10c700ee247562f0f6c03e6
  Author: Chia <sun1011jacobi at gmail.com>
  Date:   2024-01-17 (Wed, 17 Jan 2024)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    M llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
    M llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll

  Log Message:
  -----------
  Recommit "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension." (#76785)

This patch was originally introduced in PR #72340, but was reverted due
to a bug on invalid extension combine.

Specifically, we resolve the case in the
https://github.com/llvm/llvm-project/pull/72340#issuecomment-1874810998

```
define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) {     
  %a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32>                           
  %b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32>                           
  %c = add <vscale x 1 x i32> %a, %b                                             
  ret <vscale x 1 x i32> %c                                                      
}
```
The previous patch didn't check if the semantic of `ISD::ZERO_EXTEND`
and `ISD::ZERO_EXTEND` is equivalent to the `vsext.vf2` or `vzext.vf2`
(not ensuring the SEW condition on widening Vector Arithmetic
Instructions).

Thanks for @topperc pointing out this bug.

## The original description 
This PR mainly aims at resolving the below missed-optimization case,
while it could also be considered as an extension of the previous patch
https://reviews.llvm.org/D133739?id=

### Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh

### Source Code: 
```
define <vscale x 2 x i16> @multiple_users(ptr  %x, ptr  %y, ptr %z) {
  %a = load <vscale x 2 x i8>, ptr %x
  %b = load <vscale x 2 x i8>, ptr %y
  %b2 = load <vscale x 2 x i8>, ptr %z
  %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
  %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
  %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
  %e = mul <vscale x 2 x i16> %c, %d
  %f = add <vscale x 2 x i16> %c, %d2
  %g = sub <vscale x 2 x i16> %c, %d2
  %h = or <vscale x 2 x i16> %e, %f
  %i = or <vscale x 2 x i16> %h, %g
  ret <vscale x 2 x i16> %i
}
```
### Before This Patch
```
# %bb.0:
        vsetvli a3, zero, e16, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vle8.v  v10, (a2)
        svf2       v11, v8
        vsext.vf2       v8, v9
        vsext.vf2       v9, v10
        vmul.vv v8, v11, v8
        vadd.vv v10, v11, v9
        vsub.vv v9, v11, v9
        vor.vv  v8, v8, v10
        vor.vv  v8, v8, v9
        ret
```
###  After This Patch 
```
# %bb.0:
	vsetvli	a3, zero, e8, mf4, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	vle8.v	v10, (a2)
	vwmul.vv	v11, v8, v9
	vwadd.vv	v9, v8, v10
	vwsub.vv	v12, v8, v10
	vsetvli	zero, zero, e16, mf2, ta, ma
	vor.vv	v8, v11, v9
	vor.vv	v8, v8, v12
	ret
```
We can see Add/Sub/Mul are combined with the Sign Extension.

### Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/
`SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did
not consider the case of non-fixed length vector case, thus this PR
could also be considered as an extension for the D133739.