[cfe-commits] [PATCH] Optimize vec3 loads/stores

Fri Jul 27 02:41:45 PDT 2012

On Mon, 23 Jul 2012 13:14:07 -0700
Tanya Lattner <lattner at apple.com> wrote:

> 
> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
> 
> > On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>> Hi Tanya,
> >>> Looks good and usefull, but I'm not sure if it should be clang's
> >>> decision if storing and loading vec4s is better than vec3.
> >> 
> >> The idea was to have Clang generate code that the optimizers would
> >> be more likely to do something useful and smart with. I understand
> >> the concern, but I'm not sure where the best place for this would
> >> be then?
> > 
> > Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x
> > blah> anyway;  arguably the backend already has all the information
> > blah> it needs for this.  Dan, what do you think?
> > 
> > One objection to doing this in the frontend is that it's not clear
> > to me that this is a transformation we should be doing if <4 x
> > blah> isn't actually legal for the target.  But I'm amenable to the
> > blah> idea that this belongs here.
> > 
> 
> I do not think its Clangs job to care about this as we already have
> this problem for other vector sizes and its target lowering's job to
> fix it. 
> 
> > I'm also a little uncomfortable with this patch because it's so
> > special-cased to 3.  I understand that that might be all that
> > OpenCL really cares about, but it seems silly to add this code that
> > doesn't also kick in for, say, <7 x i16> or whatever.  It really
> > shouldn't be difficult to generalize.
> 
> While it could be generalized, I am only 100% confident in the
> codegen for vec3 as I know for sure that it improves the code quality
> that is ultimately generated. This is also throughly tested by our
> OpenCL compiler so I am confident we are not breaking anything and we
> are improving performance. 

On the request of several people, I recently enhanced the BB vectorizer
to produce odd-sized vector types (when possible). This was for two
reasons:

 1. Some targets actually have instructions for length-3 vectors
 (mostly for doing things on x,y,z triples), and they wanted
 autovectorization support for these.

 2. This is generally a win elsewhere as well because the odd-length
 vectors will be promoted to even-length vectors (which is good
 compares to leaving scalar code).

In this context, I am curious to know how generating length-4 vectors
in the frontend gives better performance. Is this something that the BB
vectorizer (or any other vectorizer) should be doing as well?

Thanks again,
Hal

> From my perspective, there is no vec7 in
> OpenCL, so there is no strong need for it to be generalized. It seems
> to make sense that it would also improve code quality for vec7. So
> while I could make the change, I don't have real world apps to test
> it. I think this can easily be left as an extension for later.
> 
> > 
> > Also, this optimization assumes that sizeof(element) is a power of
> > 2, because the only thing that the AST guarantees is that the size
> > of the vector type is a power of 2, and nextPow2(3 x E) ==
> > nextPow2(4 x E) is only true in general if E is a power of 2.  Is
> > that reasonable?  Is that checked somewhere?
> > 
> 
> I don't think this is officially checked anywhere but is a general
> rule.
> 
> -Tanya
> 
> 
> > Two minor notes.
> > 
> > +    const llvm::VectorType *VTy =
> > +    dyn_cast<llvm::VectorType>(EltTy);
> > 
> > How can this fail?  If it can, that's interesting;  if it can't, it
> > would be better if the "is this a vector operation" checks used the
> > IR type instead of the AST type.
> > 
> > +      llvm::PointerType *DstPtr =
> > cast<llvm::PointerType>(Addr->getType());
> > +      if (DstPtr->getElementType() != SrcTy) {
> > 
> > This can't fail in the case we care about, so you can just merge
> > this into the if-block above.
> > 
> > John.
> 
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory