[llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.

Thu Feb 7 14:52:20 PST 2013

On Thu, 2013-02-07 at 15:23 -0600, Hal Finkel wrote:
> ----- Original Message -----
> > From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> > To: llvm-commits at cs.uiuc.edu
> > Sent: Thursday, February 7, 2013 2:33:57 PM
> > Subject: [llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.
> > 
> > Author: wschmidt
> > Date: Thu Feb  7 14:33:57 2013
> > New Revision: 174660
> > 
> > URL: http://llvm.org/viewvc/llvm-project?rev=174660&view=rev
> > Log:
> > Constrain PowerPC autovectorization to fix bug 15041.
> > 
> > Certain vector operations don't vectorize well with the current
> > PowerPC implementation.  Element insert/extract performs poorly
> > without VSX support because Altivec requires going through memory.
> > SREM, UREM, and VSELECT all produce bad scalar code.
> > 
> > There's a lot of work to do for the cost model before
> > autovectorization will be tuned well, and this is not an attempt to
> > address the larger problem.
> > 
> > Modified:
> >     llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp?rev=174660&r1=174659&r2=174660&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp Thu Feb
> >  7 14:33:57 2013
> > @@ -194,6 +194,25 @@ unsigned PPCTTI::getVectorInstrCost(unsi
> >                                      unsigned Index) const {
> >    assert(Val->isVectorTy() && "This must be a vector type");
> >  
> > +  const unsigned Awful = 1000;
> 
> Does it need to be this high? I would think that:
> 
> For extract: Cost == 1 (vector store) + 1 (scalar load)
> For insert: Cost == 1 (vector store) + 1 (scalar store) + 1 (vector load)
> For srem/urem: Cost == 1 (vector store) + N (scalar loads) + N*O (operation costs) + N (scalar stores) + 1 (vector load)
> For vselect: Cost == 1 (vector store) + N (scalar loads) + N*O (selects) + N (scalar stores) + 1 (vector load)
> 
> would be pretty accurate. Is that not enough? Do we need additional costs on the loads to account for splitting the operations among different dispatch groups?

You don't understand the cost of the "load-hit-store" scenario on most
Power processors.  A store followed relatively quickly by a load from
the same doubleword suffers a large penalty because of the way the store
queues are designed.  We really don't want to vectorize when an
operation is going to go through memory in that fashion.  At a minimum,
the kinds of costs you describe above need to be inflated by a value
between 5 and 10 because of processor stall cycles.

The word "Awful" was chosen for a reason -- the code being generated by
vectorization in paq8p was absolutely horrific. ;)  I ran into the same
thing for urem in another test when I was playing around with getting
vector-select working (which was less straightforward than I'd hoped).

In all honesty, if a real cost model isn't going to be built, it would
seem better to just shut down auto-vectorization for PowerPC.  I'm sure
it does well for very simple cases, but the things I've seen were
amazingly bad (a 15x increase in code size and very poor performance on
the simple code in SingleSource/.../factor.c).

I'm open to trying to do a little better for these particular cases, but
I personally don't have the time to spend to put together a real cost
model for all instructions.  I'm just trying to patch up the spurting
artery I'm seeing for these few ugly scenarios.

I have commitments this evening, but will look at Nadem's suggestions
tomorrow.

Thanks,
Bill

> 
> Thanks again,
> Hal
> 
> > +
> > +  // Vector element insert/extract with Altivec is very expensive.
> > +  // Until VSX is available, avoid vectorizing loops that require
> > +  // these operations.
> > +  if (Opcode == ISD::EXTRACT_VECTOR_ELT ||
> > +      Opcode == ISD::INSERT_VECTOR_ELT)
> > +    return Awful;
> > +
> > +  // We don't vectorize SREM/UREM so well.  Constrain the vectorizer
> > +  // for those as well.
> > +  if (Opcode == ISD::SREM || Opcode == ISD::UREM)
> > +    return Awful;
> > +
> > +  // VSELECT is not yet implemented, leading to use of
> > insert/extract
> > +  // and ISEL, hence not a good idea.
> > +  if (Opcode == ISD::VSELECT)
> > +    return Awful;
> > +
> >    return TargetTransformInfo::getVectorInstrCost(Opcode, Val,
> >    Index);
> >  }
> >  
> > 
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > 
>