[cfe-commits] [PATCH] Optimize vec3 loads/stores

Hal Finkel hfinkel at anl.gov
Fri Jul 27 02:30:12 PDT 2012


On Wed, 25 Jul 2012 06:41:07 +0000
"Benyei, Guy" <guy.benyei at intel.com> wrote:

> Hi Tanya,
> Since in your patch the fourth element is always undef, I guess the
> three-element vectors can always be detected in the backend even
> with the patch.

Unfortunately, I'm not sure that your guess is correct. InstCombine
does not always combine shuffles (because it tries not to create new
shuffle masks), and so any code that wants to detect the three-element
case might need to look back through several shuffles to discover the
undef.
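
To make that concrete, here is roughly the kind of IR the patch
produces (a sketch from memory rather than the patch's exact output;
the function names are just for illustration):

    define void @store_vec3(<3 x float> %v, <4 x float>* %p) {
      ; widen with an undef fourth lane, then store a full vec4
      %w = shufflevector <3 x float> %v, <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
      store <4 x float> %w, <4 x float>* %p, align 16
      ret void
    }

    define <3 x float> @load_vec3(<4 x float>* %p) {
      ; load a full vec4, then drop the fourth lane
      %wide = load <4 x float>* %p, align 16
      %v3 = shufflevector <4 x float> %wide, <4 x float> undef, <3 x i32> <i32 0, i32 1, i32 2>
      ret <3 x float> %v3
    }

    ; Nothing keeps those shuffles adjacent to the memory operation, though.
    ; If later passes stack more shuffles on top of %w, for example
    ;   %t = shufflevector <4 x float> %w, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 1, i32 2>
    ;   %u = shufflevector <4 x float> %t, <4 x float> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
    ; lane 3 of %u is still undef, but proving that means composing the two
    ; masks rather than reading a single instruction.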

Worse, looking back through loads and stores might be nearly impossible.
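
For example (again just a sketch, wrapped in a made-up function to keep
it self-contained):

    define <4 x float> @round_trip(<3 x float> %v, <4 x float>* %tmp) {
      ; store the widened value, then read it back
      %w = shufflevector <3 x float> %v, <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
      store <4 x float> %w, <4 x float>* %tmp, align 16
      %r = load <4 x float>* %tmp, align 16
      ; nothing about %r records that only three of its lanes are meaningful;
      ; seeing that requires reasoning about the store, not just the loaded value
      ret <4 x float> %r
    }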

 -Hal

> I think it's enough to know that this patch
> cannot cause any real problems, so it's OK with me.
> 
> Thanks
>     Guy
> 
> -----Original Message-----
> From: cfe-commits-bounces at cs.uiuc.edu
> [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Tanya Lattner
> Sent: Wednesday, July 25, 2012 01:03
> To: llvm cfe
> Cc: Dan Gohman
> Subject: Re: [cfe-commits] [PATCH] Optimize vec3 loads/stores
> 
> 
> On Jul 23, 2012, at 1:24 PM, Dan Gohman wrote:
> 
> > 
> > On Jul 23, 2012, at 11:34 AM, Tanya Lattner <lattner at apple.com>
> > wrote:
> > 
> >> 
> >> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
> >> 
> >>> 
> >>> On Jul 18, 2012, at 6:51 PM, John McCall <rjmccall at apple.com>
> >>> wrote:
> >>> 
> >>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>>>>> Hi Tanya,
> >>>>>> Looks good and useful, but I'm not sure it should be
> >>>>>> clang's decision whether storing and loading vec4s is better
> >>>>>> than vec3.
> >>>>> 
> >>>>> The idea was to have Clang generate code that the optimizers
> >>>>> would be more likely to do something useful and smart with. I
> >>>>> understand the concern, but I'm not sure where the best place
> >>>>> for this would be then?
> >>>> 
> >>>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4
> >>>> x blah> anyway;  arguably the backend already has all the
> >>>> information it needs for this.  Dan, what do you think?
> >>> 
> >>> I guess optimizer passes won't be extraordinarily happy about all 
> >>> this bitcasting and shuffling. It seems to me that we have a
> >>> problem in that we're splitting up the high-level task of "lower
> >>> <3 x blah> to <4 x blah>" and doing some of it in the front-end
> >>> and some of it in the backend. Ideally, we should do it all in
> >>> one place, for conceptual simplicity, and to avoid the
> >>> awkwardness of having the optimizer run in a world that's half
> >>> one way and half the other, with awkward little bridges between
> >>> the two halves.
> >> 
> >> I think it's hard to speculate that the optimizer passes are not
> >> happy about the bitcasting and shuffling. I'm running with
> >> optimizations on, and the code is still much better than not having
> >> Clang do this "optimization" for vec3.
> > 
> > Sorry for being unclear; I was speculating more about future 
> > optimization passes. I don't doubt your patch achieves its purpose
> > today.
> > 
> >> I strongly feel that Clang can make the decision to output code
> >> like this if it leads to better code in the end. 
> > 
> > Ok. What do you think about having clang do all of the lowering
> > of <3 x blah> to <4 x blah> then? I mean all of the arithmetic,
> > function arguments and return values, and so on? In other words, is
> > there something special about loads and stores of vec3, or are they
> > just one symptom of a broader vec3 problem?
> > 
> 
> For function args and return values, the calling convention will
> coerce the types (on X86). I haven't had time to fully verify, but
> I think that arithmetic is done correctly in the backend via
> widening. So it's mostly this one issue that we are trying to address.
> 
> While it may still be a good idea for the backends to optimize
> situations such as this, I think it's still OK for Clang to go ahead
> and effectively widen the vector during its code generation, since
> it is a win for most targets (an assumption, as I can't test them all).
> vec3 is pretty important for the OpenCL community, and we'd like it to
> have good performance.
> 
> Does anyone have a firm objection to this going in? I realize that
> all backends could be modified to try to handle this, but I don't see
> this happening in the near future. 
> 
> -Tanya
> 
> 
> 
> > Of course, I'm not asking you to do this work right now; I'm asking
> > whether this would be a better overall design.
> > 
> > Dan
> > 
> 



-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory


