[cfe-commits] [PATCH] Optimize vec3 loads/stores

Hal Finkel hfinkel at anl.gov
Tue Jul 31 13:24:54 PDT 2012


On Tue, 31 Jul 2012 12:59:33 -0700
Doug Gregor <doug.gregor at gmail.com> wrote:

> On Fri, Jul 27, 2012 at 10:42 AM, Tanya Lattner <lattner at apple.com>
> wrote:
> >
> > On Jul 27, 2012, at 2:41 AM, Hal Finkel wrote:
> >
> >> On Mon, 23 Jul 2012 13:14:07 -0700
> >> Tanya Lattner <lattner at apple.com> wrote:
> >>
> >>>
> >>> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
> >>>
> >>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>>>>> Hi Tanya,
> >>>>>> Looks good and useful, but I'm not sure it should be
> >>>>>> clang's decision whether storing and loading vec4s is better
> >>>>>> than vec3s.
> >>>>>
> >>>>> The idea was to have Clang generate code that the optimizers
> >>>>> would be more likely to do something useful and smart with. I
> >>>>> understand the concern, but I'm not sure where the best place
> >>>>> for this would be then?
> >>>>
> >>>> Hmm.  The IR size of a <3 x blah> is basically the size of a
> >>>> <4 x blah> anyway; arguably the backend already has all the
> >>>> information it needs for this.  Dan, what do you think?
> >>>>
> >>>> One objection to doing this in the frontend is that it's not
> >>>> clear to me that this is a transformation we should be doing if
> >>>> <4 x blah> isn't actually legal for the target.  But I'm
> >>>> amenable to the idea that this belongs here.
> >>>>
> >>>
> >>> I do not think it's Clang's job to care about this, as we
> >>> already have this problem for other vector sizes and it's target
> >>> lowering's job to fix it.
> >>>
> >>>> I'm also a little uncomfortable with this patch because it's so
> >>>> special-cased to 3.  I understand that that might be all that
> >>>> OpenCL really cares about, but it seems silly to add this code
> >>>> that doesn't also kick in for, say, <7 x i16> or whatever.  It
> >>>> really shouldn't be difficult to generalize.
> >>>
> >>> While it could be generalized, vec3 is the only case I am 100%
> >>> confident in, as I know for sure that it improves the quality of
> >>> the code that is ultimately generated. This is also thoroughly
> >>> tested by our OpenCL compiler, so I am confident we are not
> >>> breaking anything and we are improving performance.
> >>
> >> At the request of several people, I recently enhanced the BB
> >> vectorizer to produce odd-sized vector types (when possible).
> >> This was for two reasons:
> >>
> >> 1. Some targets actually have instructions for length-3 vectors
> >> (mostly for doing things on x,y,z triples), and they wanted
> >> autovectorization support for these.
> >>
> >> 2. This is generally a win elsewhere as well, because the
> >> odd-length vectors will be promoted to even-length vectors (which
> >> is good compared to leaving scalar code).
> >>
> >> In this context, I am curious to know how generating length-4
> >> vectors in the frontend gives better performance. Is this
> >> something that the BB vectorizer (or any other vectorizer) should
> >> be doing as well?
> >>
> >
> > While I think you are doing great work with your vectorizer, it's
> > not something we can rely on yet to get this performance back,
> > since it's not on by default.
> >
> > A couple more comments about my patch. First, vec3 is an OpenCL
> > type, and the spec says it should have the same size and alignment
> > as vec4. So generating this code pattern for loads/stores makes
> > sense in that context. I could enable this only for OpenCL, but I
> > think anyone who uses vec3 would want this performance win.
> >
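(For concreteness, my understanding is that the load side of the
pattern looks roughly like the following -- a sketch assuming a float
element type and 16-byte alignment, not necessarily the literal IR the
patch emits:

  %vec4.ptr = bitcast <3 x float>* %ptr to <4 x float>*
  %vec4     = load <4 x float>* %vec4.ptr, align 16
  %vec3     = shufflevector <4 x float> %vec4, <4 x float> undef, <3 x i32> <i32 0, i32 1, i32 2>

with the store side widening the <3 x float> back to <4 x float>,
leaving the extra lane undef, before storing the full <4 x float>.)
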
> > Secondly, I'm not arguing that this is the ultimate fix, but it is
> > valid, simple, and easy to remove once other areas of the compiler
> > are improved to handle this case.
> >
> > After discussing with others, the proper fix might be something
> > like this:
> > 1. Enhance target data to encode the "native" vector types for the
> >    target, along the same lines as the "native" integer types it
> >    can already describe.
> > 2. Enhance the IR optimizer to force vectors to those types when
> >    it can get away with it.
> >
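(For reference, the datalayout string already lists the native integer
widths with the 'n' specifier; a vector analogue would need a new
component, so the second line below is purely invented syntax for
illustration:

  target datalayout = "e-p:64:64:64-i64:64-n8:16:32:64"
  target datalayout = "e-p:64:64:64-i64:64-n8:16:32:64-nv128:256"

where the hypothetical 'nv' entries would name the vector widths the
target handles natively.)
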
> > This optimization needs to be early enough that other mid-level
> > optimizations can take advantage of it.
> >
> > I'd still like to check this in, as no alternative solution exists
> > right now. Because there is some debate, I think the code owner
> > needs to have final say here (cc-ing Doug).
> 
> I think it makes sense for this to go in. If our optimizers improve
> to the point where they can handle vec3 directly as well as or
> better than vec3-lowered-as-vec4, we can revisit this topic.

Would it make sense to have a command-line switch to enable the old
behavior so that the status of the backend optimizers can be easily
checked?
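
For example, a small test case along these lines (the file name and
exact flags are just illustrative) makes it easy to see which form the
frontend emits, and comparing the final assembly for both forms would
show how well the backend currently copes with plain <3 x float>
loads/stores:

  /* vec3.c */
  typedef float float3 __attribute__((ext_vector_type(3)));

  float3 copy(float3 *p) { return *p; }

  $ clang -S -emit-llvm vec3.c -o -

i.e., check whether the load appears as a <3 x float> load or as a
<4 x float> load followed by a shufflevector.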

 -Hal

> 
>   - Doug



-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory


