[PATCH] D100107: [AArch64][SVE] Combine add and index_vector

Paul Walker via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 14 02:58:09 PDT 2021


paulwalker-arm added a comment.





================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13437
+// add(index_vector(zero, step), dup(X)) -> index_vector(X, step)
+static SDValue performAddIndexVectorCombine(SDNode *N, SelectionDAG &DAG) {
+  if (N->getOpcode() != ISD::ADD)
----------------
junparser wrote:
> sdesmalen wrote:
> > junparser wrote:
> > > junparser wrote:
> > > > sdesmalen wrote:
> > > > > Hi @junparser, there are a few things to take into account here:
> > > > > - Folding the add into the index_vector may be more expensive in practice: if `index_vector(zero, step)` is created once, then subsequent `add`s may be cheaper than having multiple `index_vector` instructions which each need to calculate the expansion of the (very similar) series. It's probably best to check that the node (add) has only a single result.
> > > > > - The form where it uses an `add` may come in useful when selecting an addressing mode for the gather/scatter operations, in which case: `add(nxv4i32 dup(X), nxv4i32 index_vector(zero, step))` can be implemented with a scalar+vector addressing mode such as `[x0, z0.s]`. You may want to add a check that the result is not used as the address in a load/store operation.
> > > > > 
> > > > > This can probably be implemented quite easily in TableGen with some patterns that try to do the fold, but where the PatFrag itself has a condition that it only has a single use that this use is not the address in a MemSDNode. See e.g. AArch64mul_p_oneuse where it also checks for a single use before selecting an MLA/MLS.
> > > > > 
> > > > > Because of how we have currently implemented the legalization of gathers/scatters, I don't think you can currently test the condition that the node is not used in a MemSDNode, but I'm not sure if that matters much.
> > > > Hi @sdesmalen, it looks reasonable to me to check that the add has only one use. As for the addressing mode for gather/scatter, have we already implemented it? I haven't seen it in trunk. My next plan is to combine load/store with index_vector.
> > > 
> > > 
> > The gather/scatter nodes are not implemented as patterns at the moment, only as custom dagcombine code. So adding a check that it isn't used in an addressing mode would currently have no effect, but it probably would if we change our gather/scatter implementation to not use the intrinsics, but use common nodes instead. This is something we want to do in the not-too-distant future. You can alternatively also leave a FIXME.
> > 
> > > Since my next plan is to combine load/store with index_vector
> > Can you elaborate what you mean by that?
> We already have nodes like GLD1_MERGE_ZERO and combine functions like performGLD1Combine, so rather than implementing it as a TableGen pattern, I prefer to do it here, and then do some more work in performGLD1Combine with index_vector; that's the plan.
> 
> Also, what is the difference between the scalar+vector and vector+index addressing modes? We can also use the vector+index addressing mode after this combine.
Hi @junparser, please can you provide an example of what you're ultimately trying to achieve?

As Sander points out, the decision as to where things should live is not clear cut, and I've nothing against doing things within DAGCombine if it helps reduce isel pattern explosion. That said, I am very keen to cut down the need for duplicated combines due to the lack of a canonicalised DAG. With regard to addressing modes, we should already have good support by looking for ADD-based arithmetic used to compute addresses. Unless there's a good reason, I'd rather not have to duplicate this logic to handle INDEX_VECTOR.

Personally, I see INDEX_VECTOR as a last-stage optimisation: everything else has had a chance to remove the ADD (or MUL, if we're talking about a stride), and the last option is to merge stray ADDs into the INDEX_VECTOR.

For what it's worth, the target-specific INDEX_VECTOR existed before the common STEP_VECTOR nodes, so I think there's a good chance we'll put effort into ensuring any early uses of INDEX_VECTOR are first converted to STEP_VECTOR to maximise the effectiveness of future common STEP_VECTOR combines.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100107/new/

https://reviews.llvm.org/D100107



More information about the llvm-commits mailing list