[llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.

Thu Feb 7 17:38:00 PST 2013

----- Original Message -----
> From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at cs.uiuc.edu
> Sent: Thursday, February 7, 2013 4:52:20 PM
> Subject: Re: [llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.
> 
> On Thu, 2013-02-07 at 15:23 -0600, Hal Finkel wrote:
> > ----- Original Message -----
> > > From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> > > To: llvm-commits at cs.uiuc.edu
> > > Sent: Thursday, February 7, 2013 2:33:57 PM
> > > Subject: [llvm] r174660 - Constrain PowerPC autovectorization to
> > > fix bug 15041.
> > > 
> > > Author: wschmidt
> > > Date: Thu Feb  7 14:33:57 2013
> > > New Revision: 174660
> > > 
> > > URL: http://llvm.org/viewvc/llvm-project?rev=174660&view=rev
> > > Log:
> > > Constrain PowerPC autovectorization to fix bug 15041.
> > > 
> > > Certain vector operations don't vectorize well with the current
> > > PowerPC implementation.  Element insert/extract performs poorly
> > > without VSX support because Altivec requires going through
> > > memory.
> > > SREM, UREM, and VSELECT all produce bad scalar code.
> > > 
> > > There's a lot of work to do for the cost model before
> > > autovectorization will be tuned well, and this is not an attempt
> > > to
> > > address the larger problem.
> > > 
> > > Modified:
> > >     llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > > 
> > > Modified:
> > > llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > > URL:
> > > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp?rev=174660&r1=174659&r2=174660&view=diff
> > > ==============================================================================
> > > --- llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > > (original)
> > > +++ llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp Thu
> > > Feb
> > >  7 14:33:57 2013
> > > @@ -194,6 +194,25 @@ unsigned PPCTTI::getVectorInstrCost(unsi
> > >                                      unsigned Index) const {
> > >    assert(Val->isVectorTy() && "This must be a vector type");
> > >  
> > > +  const unsigned Awful = 1000;
> > 
> > Does it need to be this high? I would think that:
> > 
> > For extract: Cost == 1 (vector store) + 1 (scalar load)
> > For insert: Cost == 1 (vector store) + 1 (scalar store) + 1 (vector
> > load)
> > For srem/urem: Cost == 1 (vector store) + N (scalar loads) + N*O
> > (operation costs) + N (scalar stores) + 1 (vector load)
> > For vselect: Cost == 1 (vector store) + N (scalar loads) + N*O
> > (selects) + N (scalar stores) + 1 (vector load)
> > 
> > would be pretty accurate. Is that not enough? Do we need additional
> > costs on the loads to account for splitting the operations among
> > different dispatch groups?
> 
> You don't understand the cost of the "load-hit-store" scenario on
> most
> Power processors.  A store followed relatively quickly by a load from
> the same doubleword suffers a large penalty because of the way the
> store
> queues are designed.  We really don't want to vectorize when an
> operation is going to go through memory in that fashion.  At a
> minimum,
> the kinds of costs you describe above need to be inflated by a value
> between 5 and 10 because of processor stall cycles.

Fair enough. We should probably make a comment to this effect in the cost model definition.

> 
> The word "Awful" was chosen for a reason -- the code being generated
> by
> vectorization in paq8p was absolutely horrific. ;)  I ran into the
> same
> thing for urem in another test when I was playing around with getting
> vector-select working (which was less straightforward than I'd
> hoped).
> 
> In all honesty, if a real cost model isn't going to be built, it
> would
> seem better to just shut down auto-vectorization for PowerPC.  I'm
> sure
> it does well for very simple cases, but the things I've seen were
> amazingly bad (a 15x increase in code size and very poor performance
> on
> the simple code in SingleSource/.../factor.c).

Is the default cost model really that bad except for the scalarization (and insert/extract) costs?

> 
> I'm open to trying to do a little better for these particular cases,
> but
> I personally don't have the time to spend to put together a real cost
> model for all instructions.  I'm just trying to patch up the spurting
> artery I'm seeing for these few ugly scenarios.

Okay.

> 
> I have commitments this evening, but will look at Nadem's suggestions
> tomorrow.

I what Nadav was talking about is this:

unsigned BasicTTI::getScalarizationOverhead(Type *Ty, bool Insert,
                                            bool Extract) const {
  assert (Ty->isVectorTy() && "Can only scalarize vectors");
  unsigned Cost = 0;

  for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
    if (Insert)
      Cost += TopTTI->getVectorInstrCost(Instruction::InsertElement, Ty, i);
    if (Extract)
      Cost += TopTTI->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
  }

  return Cost;
}

We might need to make this overridable by the target as well? I'm not sure that adding the costs in this way will, by itself, capture the load-after-store hazards that you described.

Thanks again,
Hal

> 
> Thanks,
> Bill
> 
> > 
> > Thanks again,
> > Hal
> > 
> > > +
> > > +  // Vector element insert/extract with Altivec is very
> > > expensive.
> > > +  // Until VSX is available, avoid vectorizing loops that
> > > require
> > > +  // these operations.
> > > +  if (Opcode == ISD::EXTRACT_VECTOR_ELT ||
> > > +      Opcode == ISD::INSERT_VECTOR_ELT)
> > > +    return Awful;
> > > +
> > > +  // We don't vectorize SREM/UREM so well.  Constrain the
> > > vectorizer
> > > +  // for those as well.
> > > +  if (Opcode == ISD::SREM || Opcode == ISD::UREM)
> > > +    return Awful;
> > > +
> > > +  // VSELECT is not yet implemented, leading to use of
> > > insert/extract
> > > +  // and ISEL, hence not a good idea.
> > > +  if (Opcode == ISD::VSELECT)
> > > +    return Awful;
> > > +
> > >    return TargetTransformInfo::getVectorInstrCost(Opcode, Val,
> > >    Index);
> > >  }
> > >  
> > > 
> > > 
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > 
> > 
> 
>