[PATCH] D131118: [LV] Add generic scalarization support for unpredicated scalable vectors

Mon Aug 15 11:54:01 PDT 2022

reames added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6870
+          // the generic replicate path which isn't yet implemented.
+          if (!foldTailByMasking())
+            return true;
----------------
david-arm wrote:
> reames wrote:
> > reames wrote:
> > > david-arm wrote:
> > > > I think you can avoid the scalarisation for SVE here simply by asking for the scalarisation cost of the instruction, similarly to how it's done elsewhere. For SVE this should return Invalid. Alternatively you could add a TTI hook to ask if target should scalarise or not, i.e.
> > > > 
> > > >   if (!foldTailByMasking())
> > > >     return isLegalToScalarize();
> > > > 
> > > > We have always considered it 'illegal' to scalarise for SVE.
> > > You seem to be missing the point of the code.
> > > 
> > > The costing here is done at two levels: TTI, and LV.  TTI returns invalid costs for a bunch of cases which LV then turns around and decides to scalarize by reasoning about the scalarization cost and scalar op cost directly.
> > > 
> > > Doing as you suggest here completely prevents the use of the added lowering.  
> > > 
> > > Adding a TTI hook is feasible, but I don't have a good name/semantic for it.  Any suggestions?  
> > Another possibility here - would you be okay with the scalarization being behind a flag for now, and the cost model discussion (i.e. profitability) being done in a separate review?  I'm wanting to avoid too much coupling here, and when I tried various different approaches to costing, a ton of unrelated issues started falling out.  
> That might be an option too. I'll think about this more in the morning!
A further point for your consideration.

The scatter based lowering here has the following cost:
Found an estimated cost of 81 for VF vscale x 4 For instruction:   store i16 %ld, i16* %dst, align 2

Assuming this is a correct cost for the scatter, this is blatantly unprofitable.  Essentially, we've switched from one unprofitable vectorization to another.  As such, I don't think this test is particularly interesting.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131118/new/

https://reviews.llvm.org/D131118