[cfe-commits] [PATCH] Optimize vec3 loads/stores
Hal Finkel
hfinkel at anl.gov
Fri Jul 27 02:41:45 PDT 2012
On Mon, 23 Jul 2012 13:14:07 -0700
Tanya Lattner <lattner at apple.com> wrote:
>
> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
>
> > On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>> Hi Tanya,
> >>> Looks good and usefull, but I'm not sure if it should be clang's
> >>> decision if storing and loading vec4s is better than vec3.
> >>
> >> The idea was to have Clang generate code that the optimizers would
> >> be more likely to do something useful and smart with. I understand
> >> the concern, but I'm not sure where the best place for this would
> >> be then?
> >
> > Hmm. The IR size of a <3 x blah> is basically the size of a <4 x
> > blah> anyway; arguably the backend already has all the information
> > blah> it needs for this. Dan, what do you think?
> >
> > One objection to doing this in the frontend is that it's not clear
> > to me that this is a transformation we should be doing if <4 x
> > blah> isn't actually legal for the target. But I'm amenable to the
> > blah> idea that this belongs here.
> >
>
> I do not think its Clangs job to care about this as we already have
> this problem for other vector sizes and its target lowering's job to
> fix it.
>
> > I'm also a little uncomfortable with this patch because it's so
> > special-cased to 3. I understand that that might be all that
> > OpenCL really cares about, but it seems silly to add this code that
> > doesn't also kick in for, say, <7 x i16> or whatever. It really
> > shouldn't be difficult to generalize.
>
> While it could be generalized, I am only 100% confident in the
> codegen for vec3 as I know for sure that it improves the code quality
> that is ultimately generated. This is also throughly tested by our
> OpenCL compiler so I am confident we are not breaking anything and we
> are improving performance.
On the request of several people, I recently enhanced the BB vectorizer
to produce odd-sized vector types (when possible). This was for two
reasons:
1. Some targets actually have instructions for length-3 vectors
(mostly for doing things on x,y,z triples), and they wanted
autovectorization support for these.
2. This is generally a win elsewhere as well because the odd-length
vectors will be promoted to even-length vectors (which is good
compares to leaving scalar code).
In this context, I am curious to know how generating length-4 vectors
in the frontend gives better performance. Is this something that the BB
vectorizer (or any other vectorizer) should be doing as well?
Thanks again,
Hal
> From my perspective, there is no vec7 in
> OpenCL, so there is no strong need for it to be generalized. It seems
> to make sense that it would also improve code quality for vec7. So
> while I could make the change, I don't have real world apps to test
> it. I think this can easily be left as an extension for later.
>
> >
> > Also, this optimization assumes that sizeof(element) is a power of
> > 2, because the only thing that the AST guarantees is that the size
> > of the vector type is a power of 2, and nextPow2(3 x E) ==
> > nextPow2(4 x E) is only true in general if E is a power of 2. Is
> > that reasonable? Is that checked somewhere?
> >
>
> I don't think this is officially checked anywhere but is a general
> rule.
>
> -Tanya
>
>
> > Two minor notes.
> >
> > + const llvm::VectorType *VTy =
> > + dyn_cast<llvm::VectorType>(EltTy);
> >
> > How can this fail? If it can, that's interesting; if it can't, it
> > would be better if the "is this a vector operation" checks used the
> > IR type instead of the AST type.
> >
> > + llvm::PointerType *DstPtr =
> > cast<llvm::PointerType>(Addr->getType());
> > + if (DstPtr->getElementType() != SrcTy) {
> >
> > This can't fail in the case we care about, so you can just merge
> > this into the if-block above.
> >
> > John.
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
More information about the cfe-commits
mailing list