[llvm] [LoadStoreVectorizer] Postprocess and merge equivalence classes (PR #114501)

Fri Nov 8 14:17:14 PST 2024

v-klochkov wrote:

> > Using any fixed lookup depth can result into creation of multiple equivalence classes that only differ by 1-level indirection bases.
> 
> The test IR doesn't look canonical / optimal. Other passes are expected to have rewritten the addressing expressions to be a simpler form. If I run your examples through -O3, it mostly vectorizes with some quirks:
> 
> 1. have to hackily avoid a memset that forms
> 2. Ends up with a scalar store + a 7x vector store

@arsenm : Matt - thank you for the review.
I added an incremental commit https://github.com/llvm/llvm-project/pull/114501/commits/427983a9378f8612c054ad219023564ecf803643 to address all of your comments.

As for the case itself: the original workload was far more complex. The LIT test is a heavy simplification that only shows the idea.
For the 1st pattern I saw `7x` + `1x` vectors (for both loads and stores), which ruined performance: each `7x` vector had to be lowered/legalized to `4x`+`2x`+`1x`, so together with the separate `1x` access that makes 4 mem-ops plus extra reg-to-reg moves and swizzles instead of a single `8x` mem-operation.
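The mem-op count above can be reproduced with a small sketch. This is illustrative arithmetic only, not code from the vectorizer or the backend: the helper names `legalize_width` and `mem_ops_for` are hypothetical, and it assumes the target only supports power-of-two vector widths up to `8x`, which matches the `4x`+`2x`+`1x` split described above.

```python
def legalize_width(width, max_legal=8):
    """Split an access of `width` elements into power-of-two pieces, largest first.

    Assumption: `max_legal` is a power of two and every smaller power of two
    is also a legal width on the (hypothetical) target.
    """
    pieces = []
    chunk = max_legal
    remaining = width
    while remaining > 0:
        # Shrink the chunk until it fits into what is left.
        while chunk > remaining:
            chunk //= 2
        pieces.append(chunk)
        remaining -= chunk
    return pieces


def mem_ops_for(access_widths, max_legal=8):
    """Count memory operations after legalizing each access independently."""
    return sum(len(legalize_width(w, max_legal)) for w in access_widths)


if __name__ == "__main__":
    print(legalize_width(7))    # pieces for the 7x vector
    print(mem_ops_for([7, 1]))  # 7x vector plus the leftover scalar
    print(mem_ops_for([8]))     # the single 8x access we would like to get
```

Under these assumptions the `7x` + `1x` split costs four memory operations where one `8x` access would do, before counting the extra moves and swizzles.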

The 2nd pattern in the LIT test shows a potentially worse situation: it produces 8 equivalence classes with a single scalar mem-operation in each, so nothing gets vectorized.

https://github.com/llvm/llvm-project/pull/114501