[llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.

Bill Schmidt wschmidt at linux.vnet.ibm.com
Fri Feb 8 09:59:35 PST 2013


On Fri, 2013-02-08 at 10:58 -0600, Hal Finkel wrote:
> ----- Original Message -----
> > From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> > To: "Nadav Rotem" <nrotem at apple.com>
> > Cc: llvm-commits at cs.uiuc.edu, hfinkel at anl.gov
> > Sent: Friday, February 8, 2013 8:59:13 AM
> > Subject: Re: [llvm] r174660 - Constrain PowerPC autovectorization to fix bug 15041.
> > 
> > On Fri, 2013-02-08 at 07:20 -0600, Bill Schmidt wrote:
> > > 
> > > On Thu, 2013-02-07 at 12:52 -0800, Nadav Rotem wrote:
> > > > Hi Bill,
> > > > 
> > > > 
> > > > Returning a really high constant would prevent vectorization, but we
> > > > can do better. If you look at the ARM and X86 backend you will see
> > > > that we have code to estimate the 'scalarization' cost.  You can model
> > > > the expensive transition of data from scalar to vector registers by
> > > > assigning a high cost to the 'Insert/ExtractElement' instructions.
> > > > This is important because in some loops we have perfectly vectorizable
> > > > code with one 'scalarized' instruction. We still want to catch these
> > > > cases. Additionally, the vectorizer is not the only user of the cost
> > > > model. Some other transformations may want to estimate the cost of two
> > > > alternatives, and in that case 'awful' is not a useful answer.
> > > 
> > > Thanks, Nadav!  Now that I'm using the correct opcode space, penalizing
> > > just the scalarization at least solves the problem for paq8p.  I'll spot
> > > check some of the other problems I saw, but hopefully this will kill the
> > > worst offenders.
> > 
> > Attached is my current proposed patch.  Please let me know what you
> > think.  This stops vectorization of the paq8p and factor cases; it's
> > possible that the LHS penalty will need to be raised if we see other
> > cases where scalarization is occurring and shouldn't be.  Thanks for all
> > the help!
> 
> The penalty factor of 12 seems about right, but may need to be a little higher. To model the pipeline flush, I'd think that it should be essentially:
>   (pipeline depth)*(ilp factor)
> I can imagine this being ~6*2, but the P7 can actually have more than 2 in-flight instructions. Guessing from the diagram in the Sinharoy, et al. 2011 paper, I'd estimate that flushing all pipelines costs, at maximum, ~53 in-flight instructions. Of course, the average fill percentage is probably lower, but we might want to use a worst-case cost here.
> 

From what I recall when running code through simulation modelers, the
factor of 12 isn't too far off of what occurs in the wild.  I don't
think we want to get too close to worst-case, which is not going to
happen very often in practice, though using a slightly higher number
might be right.  I think for now I'd like to err on the
less-conservative side so that we expose any additional issues before
the next release.  We might consider bumping it up slightly at that
time.
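
To make the arithmetic above concrete, here is a minimal sketch of the
estimate being discussed. It is not the attached patch: the Op enum and
both function names are hypothetical stand-ins rather than LLVM APIs,
and the depth of 6 and ILP factor of 2 are just the rough guesses quoted
above.

```cpp
// Hypothetical stand-ins for the opcodes named in this thread; these are
// NOT the real ISD enumerators.
enum class Op { Add, ExtractVectorElt, InsertVectorElt };

// Rough cost of a pipeline flush: (pipeline depth) * (ILP factor).
constexpr unsigned flushPenalty(unsigned PipelineDepth, unsigned IlpFactor) {
  return PipelineDepth * IlpFactor;
}

// Cost of one vector instruction. BaseCost stands in for whatever the
// generic cost model would return; element insert/extract pays the flush
// penalty on top of it, and everything else is left unchanged.
constexpr unsigned vectorInstrCost(Op Opcode, unsigned BaseCost,
                                   unsigned Penalty) {
  return (Opcode == Op::ExtractVectorElt || Opcode == Op::InsertVectorElt)
             ? BaseCost + Penalty
             : BaseCost;
}
```

With a depth of 6 and an ILP factor of 2, flushPenalty gives the 12
mentioned above. Because the result is a finite penalty rather than an
"awful" constant, a loop with a single scalarized instruction can still
come out ahead when the rest of the body vectorizes well.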

Thanks,
Bill

> LGTM.
> 
> Thanks again,
> Hal
> 
> > 
> > Bill
> > 
> > > 
> > > Bill
> > > > 
> > > > 
> > > > Thanks,
> > > > Nadav
> > > > 
> > > > 
> > > > On Feb 7, 2013, at 12:33 PM, Bill Schmidt
> > > > <wschmidt at linux.vnet.ibm.com> wrote:
> > > > 
> > > > > +  const unsigned Awful = 1000;
> > > > > +
> > > > > +  // Vector element insert/extract with Altivec is very expensive.
> > > > > +  // Until VSX is available, avoid vectorizing loops that require
> > > > > +  // these operations.
> > > > > +  if (Opcode == ISD::EXTRACT_VECTOR_ELT ||
> > > > > +      Opcode == ISD::INSERT_VECTOR_ELT)
> > > > > +    return Awful;
> > > > > +
> > > > > +  // We don't vectorize SREM/UREM so well.  Constrain the vectorizer
> > > > > +  // for those as well.
> > > > > +  if (Opcode == ISD::SREM || Opcode == ISD::UREM)
> > > > > +    return Awful;
> > > > > +
> > > > > +  // VSELECT is not yet implemented, leading to use of insert/extract
> > > > > +  // and ISEL, hence not a good idea.
> > > > > +  if (Opcode == ISD::VSELECT)
> > > > > +    return Awful;
> > > > > +
> > > > >   return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index);
> > > > > }
> > > > > 
> > > > 
> > 
> 



