[llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.
Hal Finkel
hfinkel at anl.gov
Thu Feb 7 17:38:00 PST 2013
----- Original Message -----
> From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at cs.uiuc.edu
> Sent: Thursday, February 7, 2013 4:52:20 PM
> Subject: Re: [llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.
>
> On Thu, 2013-02-07 at 15:23 -0600, Hal Finkel wrote:
> > ----- Original Message -----
> > > From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> > > To: llvm-commits at cs.uiuc.edu
> > > Sent: Thursday, February 7, 2013 2:33:57 PM
> > > Subject: [llvm] r174660 - Constrain PowerPC autovectorization to
> > > fix bug 15041.
> > >
> > > Author: wschmidt
> > > Date: Thu Feb 7 14:33:57 2013
> > > New Revision: 174660
> > >
> > > URL: http://llvm.org/viewvc/llvm-project?rev=174660&view=rev
> > > Log:
> > > Constrain PowerPC autovectorization to fix bug 15041.
> > >
> > > Certain vector operations don't vectorize well with the current
> > > PowerPC implementation. Element insert/extract performs poorly
> > > without VSX support because Altivec requires going through memory.
> > > SREM, UREM, and VSELECT all produce bad scalar code.
> > >
> > > There's a lot of work to do for the cost model before
> > > autovectorization will be tuned well, and this is not an attempt
> > > to address the larger problem.
> > >
> > > Modified:
> > > llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > >
> > > Modified:
> > > llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
> > > URL:
> > > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp?rev=174660&r1=174659&r2=174660&view=diff
> > > ==============================================================================
> > > --- llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (original)
> > > +++ llvm/trunk/lib/Target/PowerPC/PPCTargetTransformInfo.cpp Thu Feb  7 14:33:57 2013
> > > @@ -194,6 +194,25 @@ unsigned PPCTTI::getVectorInstrCost(unsi
> > >                                      unsigned Index) const {
> > >    assert(Val->isVectorTy() && "This must be a vector type");
> > >
> > > +  const unsigned Awful = 1000;
> >
> > Does it need to be this high? I would think that:
> >
> >   For extract:   Cost == 1 (vector store) + 1 (scalar load)
> >   For insert:    Cost == 1 (vector store) + 1 (scalar store) + 1 (vector load)
> >   For srem/urem: Cost == 1 (vector store) + N (scalar loads) + N*O
> >                  (operation costs) + N (scalar stores) + 1 (vector load)
> >   For vselect:   Cost == 1 (vector store) + N (scalar loads) + N*O
> >                  (selects) + N (scalar stores) + 1 (vector load)
> >
> > would be pretty accurate. Is that not enough? Do we need additional
> > costs on the loads to account for splitting the operations among
> > different dispatch groups?
>
> You don't understand the cost of the "load-hit-store" scenario on most
> Power processors. A store followed relatively quickly by a load from
> the same doubleword suffers a large penalty because of the way the
> store queues are designed. We really don't want to vectorize when an
> operation is going to go through memory in that fashion. At a minimum,
> the kinds of costs you describe above need to be inflated by a value
> between 5 and 10 because of processor stall cycles.
Fair enough. We should probably add a comment to this effect in the cost model definition.
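
Just to make that concrete, here's a rough sketch of what a finite-but-inflated cost might look like for the insert/extract case, following the shape of the committed hook. The LHSPenalty value is only an illustrative guess at the 5-10x stall factor you mention, and I've written it against the Instruction::* opcodes that the TTI interface passes in (as in the BasicTTI snippet below), so treat it as a sketch rather than a drop-in patch:

unsigned PPCTTI::getVectorInstrCost(unsigned Opcode, Type *Val,
                                    unsigned Index) const {
  assert(Val->isVectorTy() && "This must be a vector type");

  // Illustrative stall factor for the load-hit-store hazard; the real
  // number would need to be measured per subtarget.
  const unsigned LHSPenalty = 8;

  // Extract: one vector store plus one scalar reload that hits the store
  // queue, so charge the reload at the inflated rate.
  if (Opcode == Instruction::ExtractElement)
    return 1 + LHSPenalty;

  // Insert: vector store + scalar store + vector reload, with the reload
  // again paying the stall.
  if (Opcode == Instruction::InsertElement)
    return 2 + LHSPenalty;

  return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index);
}

Scalarized srem/urem/vselect would then pick these costs up automatically through the per-element scalarization overhead, multiplied by the number of elements.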
>
> The word "Awful" was chosen for a reason -- the code being generated by
> vectorization in paq8p was absolutely horrific. ;) I ran into the same
> thing for urem in another test when I was playing around with getting
> vector-select working (which was less straightforward than I'd hoped).
>
> In all honesty, if a real cost model isn't going to be built, it would
> seem better to just shut down auto-vectorization for PowerPC. I'm sure
> it does well for very simple cases, but the things I've seen were
> amazingly bad (a 15x increase in code size and very poor performance on
> the simple code in SingleSource/.../factor.c).
Is the default cost model really that bad except for the scalarization (and insert/extract) costs?
>
> I'm open to trying to do a little better for these particular cases, but
> I personally don't have the time to spend to put together a real cost
> model for all instructions. I'm just trying to patch up the spurting
> artery I'm seeing for these few ugly scenarios.
Okay.
>
> I have commitments this evening, but will look at Nadav's suggestions
> tomorrow.
I think what Nadav was talking about is this:
unsigned BasicTTI::getScalarizationOverhead(Type *Ty, bool Insert,
                                            bool Extract) const {
  assert (Ty->isVectorTy() && "Can only scalarize vectors");
  unsigned Cost = 0;

  for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
    if (Insert)
      Cost += TopTTI->getVectorInstrCost(Instruction::InsertElement, Ty, i);
    if (Extract)
      Cost += TopTTI->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
  }

  return Cost;
}
We might need to make this overridable by the target as well? I'm not sure that adding the costs in this way will, by itself, capture the load-after-store hazards that you described.
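
As a strawman for the "overridable by the target" idea, something along these lines is what I have in mind. As far as I can tell, getScalarizationOverhead is just a BasicTTI helper at the moment rather than a virtual hook, so the PPCTTI override below is purely hypothetical, and the flat hazard charge is an illustrative number rather than a measured one:

// Hypothetical: assumes the BasicTTI helper were turned into a virtual
// hook that targets can override.
unsigned PPCTTI::getScalarizationOverhead(Type *Ty, bool Insert,
                                          bool Extract) const {
  assert(Ty->isVectorTy() && "Can only scalarize vectors");

  // One flat charge for the load-hit-store stall on the stack slot the
  // scalarized elements round-trip through; illustrative value only.
  unsigned Cost = 8;

  // Then sum the target's own per-element insert/extract costs, as the
  // generic helper does.
  for (unsigned i = 0, e = Ty->getVectorNumElements(); i != e; ++i) {
    if (Insert)
      Cost += getVectorInstrCost(Instruction::InsertElement, Ty, i);
    if (Extract)
      Cost += getVectorInstrCost(Instruction::ExtractElement, Ty, i);
  }
  return Cost;
}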
Thanks again,
Hal
>
> Thanks,
> Bill
>
> >
> > Thanks again,
> > Hal
> >
> > > +
> > > +  // Vector element insert/extract with Altivec is very expensive.
> > > +  // Until VSX is available, avoid vectorizing loops that require
> > > +  // these operations.
> > > +  if (Opcode == ISD::EXTRACT_VECTOR_ELT ||
> > > +      Opcode == ISD::INSERT_VECTOR_ELT)
> > > +    return Awful;
> > > +
> > > +  // We don't vectorize SREM/UREM so well.  Constrain the vectorizer
> > > +  // for those as well.
> > > +  if (Opcode == ISD::SREM || Opcode == ISD::UREM)
> > > +    return Awful;
> > > +
> > > +  // VSELECT is not yet implemented, leading to use of insert/extract
> > > +  // and ISEL, hence not a good idea.
> > > +  if (Opcode == ISD::VSELECT)
> > > +    return Awful;
> > > +
> > >    return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index);
> > >  }
> > >
> > >
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >
> >
>
>