[PATCH] D67392: [ARM][ParallelDSP] Change smlad insertion order

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 24 09:55:40 PDT 2019


efriedma added a comment.

The current implementation of comparing loads is quadratic, yes, but you could use a different algorithm, like splitting loads into a base pointer plus an offset, and constructing a map from base pointers to load offsets.  Maybe not worthwhile, though; a threshold might be good enough.



================
Comment at: lib/Target/ARM/ARMParallelDSP.cpp:654
+    else
+      V = DT->dominates(cast<Instruction>(A), cast<Instruction>(B)) ? B : A;
+
----------------
samparker wrote:
> efriedma wrote:
> > samparker wrote:
> > > efriedma wrote:
> > > > You decided not to modify this dominates() call?
> > > I assumed it wouldn't be worth it... My understanding is that I'd have to create a new ordered block each time, because the block will change after each call.
> > My general concern here would be that the pass is O(N^2) in the number of transformations in a given BB.  (If you unroll a loop containing a transformable operation N times, for example.)  This contributes to that, constructing the OrderedBB later in InsertParallelMACs contributes to that.  But there are a bunch of other places with similar issues... for example, RecordMemoryOps has a loop that's O(N^2) in the number of loads.
> > 
> > Actually, thinking about it a bit more, I have another concern; you might not be picking a legal insertion point.  The instruction that produces the accumulator could be anything: for example, a phi, or an invoke, or an exception-handling instruction.
> What about if I delayed the RecordMemoryOps logic until after we've discovered the reduction? Then, at least, we won't be paying the cost for the vast majority of cases.
> 
> The accumulator could be a phi, but I don't think that's an issue here as it would dominate all other instructions. Only the original accumulator input value can be a phi or a non-instruction and, as we don't compare same values, I don't see how a phi could be in the insertion point.
> 
> And please pardon my ignorance, but could you elaborate why an invoke would be an issue? And why exception handling needs to be considered?
Oh, if the accumulator isn't an instruction in the same block as the multiply, we always choose the multiply as the insertion point?  That makes sense... but it probably deserves a comment noting that invariant.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67392/new/

https://reviews.llvm.org/D67392





More information about the llvm-commits mailing list