[cfe-commits] [PATCH] Optimize vec3 loads/stores

Mon Jul 23 11:34:47 PDT 2012

On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:

> 
> On Jul 18, 2012, at 6:51 PM, John McCall <rjmccall at apple.com> wrote:
> 
>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>> Hi Tanya,
>>>> Looks good and useful, but I'm not sure whether it should be Clang's decision that storing and loading vec4s is better than vec3s.
>>> 
>>> The idea was to have Clang generate code that the optimizers would be more likely to do something useful and smart with. I understand the concern, but I'm not sure where the best place for this would be then?
>> 
>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> anyway;  arguably the backend already has all the information it needs for this.  Dan, what do you think?
> 
> I guess optimizer passes won't be extraordinarily happy about all this
> bitcasting and shuffling. It seems to me that we have a problem in that
> we're splitting up the high-level task of "lower <3 x blah> to <4 x blah>"
> and doing some of it in the front-end and some of it in the backend.
> Ideally, we should do it all in one place, for conceptual simplicity, and
> to avoid the awkwardness of having the optimizer run in a world that's
> half one way and half the other, with awkward little bridges between the
> two halves.

I think it's hard to speculate that the optimizer passes are unhappy about the bitcasting and shuffling. I'm running with optimizations on, and the code is still much better than it is without Clang doing this "optimization" for vec3. I strongly feel that Clang can make the decision to output code like this if it leads to better code in the end.

-Tanya
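
For readers following the thread, the load/store widening under discussion can be sketched in plain C++. This is only an illustration of the idea (load four lanes, then shuffle down to three; widen to four lanes before storing), not the code from the patch. The names Vec3Storage, load_vec3_widened and store_vec3_widened are hypothetical, and filling the fourth lane with zero is an assumption of this sketch.

```cpp
#include <array>
#include <cstring>

// Hypothetical stand-in for vec3 storage: three elements padded out to
// four element slots, so a whole-vector access can touch all 16 bytes.
struct alignas(16) Vec3Storage { float slots[4]; };

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;

// Load all four lanes in one go, then keep lanes 0..2 (the "shuffle").
Vec3 load_vec3_widened(const Vec3Storage& mem) {
    Vec4 wide;
    std::memcpy(wide.data(), mem.slots, sizeof wide);
    return {wide[0], wide[1], wide[2]};
}

// Widen to four lanes, then store all four lanes in one go.
// The fourth lane is padding; zero is used here only for determinism.
void store_vec3_widened(Vec3Storage& mem, const Vec3& v) {
    Vec4 wide = {v[0], v[1], v[2], 0.0f};
    std::memcpy(mem.slots, wide.data(), sizeof wide);
}
```

Whether this widening belongs in the front-end or the backend is exactly what the thread is debating; the sketch only shows what the widened access looks like.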

> 
> I think you could make a reasonable case that clang should do all of it,
> including all the arithmetic. And if LLVM IR isn't quite flexible enough,
> then fix it. <3 x blah> is just a front-end abstraction, from a certain
> perspective, so it's reasonable to expect front-ends to lower it.
> 
> You could also make a reasonable case that clang should do none of it,
> even loads and stores. Just use <3 x blah> everywhere and make the
> backend do all the work. I believe the backend does have all the
> information it needs to do a reasonable job with this, in theory.
> 
>> Also, this optimization assumes that sizeof(element) is a power of 2, because the only thing that the AST guarantees is that the size of the vector type is a power of 2, and nextPow2(3 x E) == nextPow2(4 x E) is only true in general if E is a power of 2.  Is that reasonable?  Is that checked somewhere?
> 
> If sizeof(element) is not a power of 2, we have other issues, and it's
> debatable whether it's worth the effort to try to address them. For the
> time being, I'd suggest just having the front-end issue an error if it
> encounters such things.
> 

> Dan
>
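
The size argument quoted above, that nextPow2(3 x E) == nextPow2(4 x E) is only guaranteed when E is a power of 2, can be checked with a little arithmetic. The sketch below assumes C++14 or later; next_pow2 and vec3_pads_to_vec4 are hypothetical helper names, not Clang or LLVM functions.

```cpp
#include <cstdint>

// Smallest power of two that is >= n (for n >= 1).
constexpr std::uint64_t next_pow2(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// True when a 3-element vector, rounded up to a power-of-two size,
// occupies the same number of bytes as a rounded-up 4-element vector.
constexpr bool vec3_pads_to_vec4(std::uint64_t elem_size) {
    return next_pow2(3 * elem_size) == next_pow2(4 * elem_size);
}

// Power-of-two element sizes always satisfy the identity.
static_assert(vec3_pads_to_vec4(1), "3 -> 4, 4 -> 4");
static_assert(vec3_pads_to_vec4(4), "12 -> 16, 16 -> 16");
static_assert(vec3_pads_to_vec4(8), "24 -> 32, 32 -> 32");

// A non-power-of-two element size can break it.
static_assert(!vec3_pads_to_vec4(5), "15 -> 16, but 20 -> 32");

// Some non-power-of-two sizes happen to satisfy it, which is why the
// claim is "not true in general" rather than "never true".
static_assert(vec3_pads_to_vec4(3), "9 -> 16, 12 -> 16");
```

So the assumption is safe for the usual element sizes, and a front-end check or error is needed only for element types whose size is not a power of 2.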