[PATCH] D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic

Thu Nov 5 09:23:21 PST 2020

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2848
+        // Vectorizing non-consecutive loads with `llvm.masked.gather`.
+        TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
+                                     ReuseShuffleIndicies);
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > anton-afanasyev wrote:
> > > ABataev wrote:
> > > > anton-afanasyev wrote:
> > > > > ABataev wrote:
> > > > > > anton-afanasyev wrote:
> > > > > > > ABataev wrote:
> > > > > > > > anton-afanasyev wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > Why it is `vectorized` if this is actually kind of gather?
> > > > > > > > > Actually I think `vectorized` is more appropriate here than `gathered` in sense it is used throughout SLPVectorizer module. Since we actually modify several instruction to _one_. I can change it to `scatter-vectorized`.
> > > > > > > > Could you give a bit more explanation why it should be treated as `vectorized`?
> > > > > > > Ok.
> > > > > > > For now, `enum EntryState {Vectorize, NeedToGather}` has two states: the first is for the bundle that is to be vectorized, the second is to be gathered, but here "gathered" means this bundle will stay untouched after tree vectorization, we need no replace several scalar instructions with one vector instruction. Also we need no handle users of gathered instruction.
> > > > > > > For the "scattered" entry we have opposite case: several scalar instructios to be replaced with `@llvm.masked.gather.*` and `@llvm.masked.scatter.*` intrinsics and we need to handle its users (at least, for `store` = `@llvm.masked.scatter.*` case (the next patch, also using `IsScattered` field)).
> > > > > > > Am I clear?
> > > > > > Shall we handle the users for loads?
> > > > > No, only for stores, of course. Vectorized `load`s are leaves of vectorized tree, whereas stores are seed points. Scattered stores can also be "vectorized" in the sense of being replaced with the one intrinsic.
> > > > Then for loads it is more just a kind a gather, rather than vectorize. Is it required to mark the scalars as vectorized or just enough to mark as gathered? Do we need all this stuff that is required for the vectorized instructions, like inorder, etc.?
> > > Yes, we need this stuff: at least, separate cost estimation, scheduling. We treat this as generalization of vectorization -- like `load <4 x i32>`, `@llvm.masked.gather` loads to `<4 x i32>`, for instance.
> > Why it cannot be model via gather? We gather the scalar loads here but into a different form. Instead of direct gather we just gather the addresses and then do a load. But this still a kind of gather. Do you need to continue scheduling here? No, you don't, just need to extend `Gather` function to generate gather of addresses + `llvm.masked.gather` call.
> It can be model via gather, ok, but only for load case, not for store one. But we already treat ordinary `load <4 x i32>` as vectorized entry, what is the difference with `@llvm.masked.gather <4 x i32>`? I see here consistent symmetry: 
> ```
> ordinary load of vector <-> ordinary store of vector
>              |                 |
> gather of vector <-> scatter of vector
> ```
Because the model is not based on the final outcome. Plus, you missed the real gather of addresses.
Did you think about adding a new gather treeentry for addresses? Looks like you missed the cost for addresses gather.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90445/new/

https://reviews.llvm.org/D90445