[PATCH] D82953: [ARM][MVE] Only tail-fold integer add reductions

Sam Parker via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 2 02:38:25 PDT 2020


samparker added inline comments.


================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1337
+    }
+    if (I->getOpcode() != Instruction::Add) {
+      LLVM_DEBUG(dbgs() << "Only add reductions supported\n");
----------------
SjoerdMeijer wrote:
> SjoerdMeijer wrote:
> > samparker wrote:
> > > dmgreen wrote:
> > > > I took a look at the reduction code in ARMLowOverheadLoops. The vectorizer can create add reductions in a number of places and in a number of ways (it can have multiple reductions, multiple reduction steps, look through phi's and selects and even change the type to a smaller bitwidth). I don't think the backend pass can handle all of them yet.
> > > > 
> > > > Considering how expensive reverting can be in comparison, what do you think of just disabling all reductions for the time being and re-enabling them once we know that things are working well enough? Maybe add a FIXME for it? I would expect D75069 to handle integer reduction in most cases eventually from the vectorizer directly and we can probably do something similar for other reduction types.
> > > In my testing, I couldn't get the vectorizer to produce a tail-folded loops with multiple reductions, so I don't think we have to worry about that at the moment. My basic example also didn't even vectorize a FP reduction loop. I think disabling all reductions would be too conservative now, we should have done it from the beginning but, that now we actually can handle something, I think we should try to have this cost function align with LowOverheadLoops implementation.
> > If we really don't want tail-folding, I think that should be controlled in other way than restricting this hook to do nothing. We have option `-prefer-predicate-over-epilog`, and that should do that for now. If source-code modifications and more fine grained control is possible, we even have a pragma.
> > 
> > Like Sam said, I would like to start with integer add reduction, so that we are also testing that part in the back-end.
> I accidentally hit Submit too soon, but wanted to add a bit more nuance to my previous comment: "this hook to do nothing" -> "this hook to do almost nothing".
> 
> I also wanted to add that in general matching up the behaviour here with the backend is probably best, and also that I could look into disallowing multiple reductions, but looks like we don't really have worry about that. 
> 
> And if there are some inefficiencies around integer add reductions, I guess that's fine at this stage (tail-folding can be disabled, see previous comment), and I think it's good to have these inefficiencies explicit and visible.
If we go for matching the limitations of LowOverheadLoops (of which there are many!) I think we want to check that:
- there's only one liveout.
- it's an add.
- and that add isn't using an add (because this breaks the pitifully weak pattern matching, see one_loop_add_add_v16i8 in the reductions.ll test.

Half related to reductions, I'd also prevent if there's a select in the loop - I'm pretty sure min/max kernels will fail to be tail predicated and we can't handle any VPSEL in the loop either.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82953/new/

https://reviews.llvm.org/D82953





More information about the llvm-commits mailing list