[PATCH] D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 5 10:04:31 PST 2020


ABataev added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2848
+        // Vectorizing non-consecutive loads with `llvm.masked.gather`.
+        TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
+                                     ReuseShuffleIndicies);
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > anton-afanasyev wrote:
> > > ABataev wrote:
> > > > anton-afanasyev wrote:
> > > > > ABataev wrote:
> > > > > > anton-afanasyev wrote:
> > > > > > > ABataev wrote:
> > > > > > > > anton-afanasyev wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > anton-afanasyev wrote:
> > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > Why it is `vectorized` if this is actually kind of gather?
> > > > > > > > > > > Actually I think `vectorized` is more appropriate here than `gathered` in sense it is used throughout SLPVectorizer module. Since we actually modify several instruction to _one_. I can change it to `scatter-vectorized`.
> > > > > > > > > > Could you give a bit more explanation why it should be treated as `vectorized`?
> > > > > > > > > Ok.
> > > > > > > > > For now, `enum EntryState {Vectorize, NeedToGather}` has two states: the first is for the bundle that is to be vectorized, the second is to be gathered, but here "gathered" means this bundle will stay untouched after tree vectorization, we need no replace several scalar instructions with one vector instruction. Also we need no handle users of gathered instruction.
> > > > > > > > > For the "scattered" entry we have opposite case: several scalar instructios to be replaced with `@llvm.masked.gather.*` and `@llvm.masked.scatter.*` intrinsics and we need to handle its users (at least, for `store` = `@llvm.masked.scatter.*` case (the next patch, also using `IsScattered` field)).
> > > > > > > > > Am I clear?
> > > > > > > > Shall we handle the users for loads?
> > > > > > > No, only for stores, of course. Vectorized `load`s are leaves of vectorized tree, whereas stores are seed points. Scattered stores can also be "vectorized" in the sense of being replaced with the one intrinsic.
> > > > > > Then for loads it is more just a kind a gather, rather than vectorize. Is it required to mark the scalars as vectorized or just enough to mark as gathered? Do we need all this stuff that is required for the vectorized instructions, like inorder, etc.?
> > > > > Yes, we need this stuff: at least, separate cost estimation, scheduling. We treat this as generalization of vectorization -- like `load <4 x i32>`, `@llvm.masked.gather` loads to `<4 x i32>`, for instance.
> > > > Why it cannot be model via gather? We gather the scalar loads here but into a different form. Instead of direct gather we just gather the addresses and then do a load. But this still a kind of gather. Do you need to continue scheduling here? No, you don't, just need to extend `Gather` function to generate gather of addresses + `llvm.masked.gather` call.
> > > It can be model via gather, ok, but only for load case, not for store one. But we already treat ordinary `load <4 x i32>` as vectorized entry, what is the difference with `@llvm.masked.gather <4 x i32>`? I see here consistent symmetry: 
> > > ```
> > > ordinary load of vector <-> ordinary store of vector
> > >              |                 |
> > > gather of vector <-> scatter of vector
> > > ```
> > Because the model is not based on the final outcome. Plus, you missed the real gather of addresses.
> > Did you think about adding a new gather treeentry for addresses? Looks like you missed the cost for addresses gather.
> Why is model not based on the final outcome? The cost is separate issue -- I can fix this at `getEntryCost()`.
Because later this code can be transformed into something else, anyway, for example gather can be transformed into a single instruction. The model itself is based on the scalars, how they are transformed into a vector form.
I would suggest just to add 2 tree entries in this case. One for new scatter vectorization form and one for gathering of addresses. It would simplify handling of this in future and won't break the model. Plus, you would not need to add a new cost calculation and rely on the exisiting cost model for the gather of addresses.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90445/new/

https://reviews.llvm.org/D90445



More information about the llvm-commits mailing list