[PATCH] D100107: [AArch64][SVE] Combine add and index_vector
JunMa via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 13 20:23:47 PDT 2021
junparser added a comment.
In D100107#2685779 <https://reviews.llvm.org/D100107#2685779>, @sdesmalen wrote:
> In D100107#2682168 <https://reviews.llvm.org/D100107#2682168>, @junparser wrote:
>
>> What I'd still like to know is where the boundary lies between a tablegen pattern match and a dag combine; I've never figured this out. Using this case as an example, we could implement the feature in either place, but I don't know how to handle commutativity in tablegen (maybe SDNPCommutative? I'm not sure), or what the effect would be on the next step: combining load/store with index_vector.
>
> There isn't a firm rule to follow regarding tablegen patterns vs. manual dag combines, but I think the steer is that when the combine is needed after legalization and it's possible to express it as a pattern, a pattern is preferred. In this case a pattern would probably be a clean way to implement it. I haven't used SDNPCommutative before, but it looks like it should do the trick.
Thanks for explaining this.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13437
+// add(index_vector(zero, step), dup(X)) -> index_vector(X, step)
+static SDValue performAddIndexVectorCombine(SDNode *N, SelectionDAG &DAG) {
+ if (N->getOpcode() != ISD::ADD)
----------------
sdesmalen wrote:
> junparser wrote:
> > junparser wrote:
> > > sdesmalen wrote:
> > > > Hi @junparser, there are a few things to take into account here:
> > > > - Folding the add into the index_vector may be more expensive in practice: if `index_vector(zero, step)` is created once, then subsequent `add`s may be cheaper than having multiple `index_vector` instructions which each need to calculate the expansion of the (very similar) series. It's probably best to check that the node (add) has only a single use.
> > > > - The form where it uses an `add` may come in useful when selecting an addressing mode for the gather/scatter operations, in which case: `add(nxv4i32 dup(X), nxv4i32 index_vector(zero, step))` can be implemented with a scalar+vector addressing mode such as `[x0, z0.s]`. You may want to add a check that the result is not used as the address in a load/store operation.
> > > >
> > > > This can probably be implemented quite easily in TableGen with some patterns that try to do the fold, but where the PatFrag itself has a condition that it only has a single use and that this use is not the address in a MemSDNode. See e.g. AArch64mul_p_oneuse, where it also checks for a single use before selecting an MLA/MLS.
> > > >
> > > > Because of how we have currently implemented the legalization of gathers/scatters, I don't think you can currently test the condition that the node is not used in a MemSDNode, but I'm not sure if that matters much.
> > > Hi @sdesmalen, it looks reasonable to me to check that the add has only one use. As for the addressing mode for gather/scatter, have we already implemented it? I haven't seen it in trunk. My next plan is to combine load/store with index_vector.
> The gather/scatter nodes are not implemented as patterns at the moment, only as custom dagcombine code, so adding a check that it isn't used in an addressing mode would currently have no effect. It probably would have an effect if we changed our gather/scatter implementation to use common nodes instead of the intrinsics, which is something we want to do in the not-too-distant future. You can alternatively also leave a FIXME.
>
> > Since my next plan is to combine load/store with index_vector
> Can you elaborate what you mean by that?
We already have nodes like GLD1_MERGE_ZERO and combine functions like performGLD1Combine, so rather than implementing this as a tablegen pattern, I prefer to do it here and then do some more work in performGLD1Combine with index_vector; that's the plan.
Also, what is the difference between the scalar+vector and the vector+index addressing modes? We could also use the vector+index addressing mode after this combination.
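To make that concrete, here is a rough sketch of the shape performAddIndexVectorCombine could take after addressing the comments above. This is untested and only my assumption about the final form: in particular, matching the splat via ISD::SPLAT_VECTOR, the (base, step) operand order of AArch64ISD::INDEX_VECTOR, and the placement of the one-use check are guesses along the lines suggested above.

  // Sketch only: fold add(index_vector(0, step), splat(X)) -> index_vector(X, step),
  // but only when the index_vector has no other users, per the review comments.
  static SDValue performAddIndexVectorCombine(SDNode *N, SelectionDAG &DAG) {
    if (N->getOpcode() != ISD::ADD)
      return SDValue();
    SDLoc DL(N);
    EVT VT = N->getValueType(0);
    // Try both operand orders, since add is commutative.
    auto TryFold = [&](SDValue Idx, SDValue Splat) -> SDValue {
      // Assumed INDEX_VECTOR operand order: (base, step).
      if (Idx.getOpcode() != AArch64ISD::INDEX_VECTOR || !Idx.hasOneUse())
        return SDValue();
      if (Splat.getOpcode() != ISD::SPLAT_VECTOR)
        return SDValue();
      if (!isNullConstant(Idx.getOperand(0)))
        return SDValue();
      return DAG.getNode(AArch64ISD::INDEX_VECTOR, DL, VT, Splat.getOperand(0),
                         Idx.getOperand(1));
    };
    if (SDValue Res = TryFold(N->getOperand(0), N->getOperand(1)))
      return Res;
    return TryFold(N->getOperand(1), N->getOperand(0));
  }

If the gather/scatter legalization is later reworked to use common nodes, we could additionally bail out (or leave a FIXME for now) when the add feeds the address operand of a MemSDNode, as suggested.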
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100107/new/
https://reviews.llvm.org/D100107