[llvm] [VectorCombine] Add type shrinking and zext propagation for fixed-width vector types (PR #104606)

Mon Aug 19 09:11:16 PDT 2024

igogo-x86 wrote:

There was a failing test, `PhaseOrdering/X86/pr50555.ll`, and it could be explained with this reduced test case:

```
target triple = "x86_64-unknown-unknown"

define <8 x i16> @trunc_through_one_add(<8 x i8> %a) {
entry:
  %1 = zext <8 x i8> %a to <8 x i16>
  %2 = lshr <8 x i16> %1, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %3 = add nuw nsw <8 x i16> %1, %2
  ret <8 x i16> %3
}
```

The ZExt has two users (the `lshr` and the `add`), and my patch allowed us to calculate the cost of the IR when propagating the ZExt through only one of them. For X86 the costs are these:

- **Cost of `lshr <8 x i16>` is 15**
- **Cost of `lshr <8 x i8>` is 14**
- **Cost of `zext` is 1**
- **Cost of leaving everything as it is: 17**
- **Cost of shrinking: 16**

By these numbers it is profitable to shrink the types, but I am not sure this is a good optimisation, because the number of assembly instructions increases after running `llc` on the transformed IR:

```
define <8 x i16> @trunc_through_one_add(<8 x i8> %a) {
entry:
  %0 = zext <8 x i8> %a to <8 x i16>
  %1 = lshr <8 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  %2 = zext <8 x i8> %1 to <8 x i16>
  %3 = add nuw nsw <8 x i16> %0, %2
  ret <8 x i16> %3
}
```

So I turned off ZExt propagation when the ZExt cannot be propagated through all of its users, but I wish the cost model were more accurate.
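
For illustration only, the "all users" restriction can be sketched as a toy model in standalone C++. This is not the code from the patch and uses no LLVM APIs: `ToyInst`, `canNarrow`, and `shouldPropagateZExt` are hypothetical names, and the set of opcodes treated as narrowable is an assumption made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical, not an LLVM class).
struct ToyInst {
    std::string opcode;
    std::vector<const ToyInst*> users;
};

// Assumption for this sketch: only these opcodes can be computed on the
// narrow type and have the zext moved after them.
bool canNarrow(const ToyInst& user) {
    return user.opcode == "lshr" || user.opcode == "and" ||
           user.opcode == "or" || user.opcode == "xor";
}

// Propagate only when every user of the zext can be narrowed; otherwise
// the original zext stays alive and extra zexts are added next to it.
bool shouldPropagateZExt(const ToyInst& zext) {
    return !zext.users.empty() &&
           std::all_of(zext.users.begin(), zext.users.end(),
                       [](const ToyInst* u) { return canNarrow(*u); });
}
```

In the reduced test case the zext feeds both an `lshr` and an `add`, so under these assumptions `shouldPropagateZExt` returns false and the IR is left unchanged.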

https://github.com/llvm/llvm-project/pull/104606