[cfe-commits] [PATCH] Optimize vec3 loads/stores

Mon Jul 23 13:24:01 PDT 2012

On Jul 23, 2012, at 11:34 AM, Tanya Lattner <lattner at apple.com> wrote:

> 
> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
> 
>> 
>> On Jul 18, 2012, at 6:51 PM, John McCall <rjmccall at apple.com> wrote:
>> 
>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>> Hi Tanya,
>>>>> Looks good and useful, but I'm not sure whether it should be clang's decision that storing and loading vec4s is better than vec3.
>>>> 
>>>> The idea was to have Clang generate code that the optimizers would be more likely to do something useful and smart with. I understand the concern, but I'm not sure where the best place for this would be then?
>>> 
>>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> anyway;  arguably the backend already has all the information it needs for this.  Dan, what do you think?
>> 
>> I guess optimizer passes won't be extraordinarily happy about all this
>> bitcasting and shuffling. It seems to me that we have a problem in that
>> we're splitting up the high-level task of "lower <3 x blah> to <4 x blah>"
>> and doing some of it in the front-end and some of it in the backend.
>> Ideally, we should do it all in one place, for conceptual simplicity, and
>> to avoid the awkwardness of having the optimizer run in a world that's
>> half one way and half the other, with awkward little bridges between the
>> two halves.
> 
> I think it's hard to speculate that the optimizer passes are not happy about the bitcast and shuffling. I'm running with optimizations on and the code is still much better than not having Clang do this "optimization" for vec3.

Sorry for being unclear; I was speculating more about future optimization
passes. I don't doubt your patch achieves its purpose today.

> I strongly feel that Clang can make the decision to output code like this if it leads to better code in the end. 

Ok. What do you think about having clang do all of the lowering
of <3 x blah> to <4 x blah> then? I mean all of the arithmetic,
function arguments and return values, and so on? In other words, is
there something special about loads and stores of vec3, or are they
just one symptom of a broader vec3 problem?

Of course, I'm not asking you to do this work right now; I'm asking
whether this would be a better overall design.

Dan
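
For reference, the load/store pattern being discussed can be sketched in
portable C. This is only an illustration, not the patch itself: the type and
function names (vec3f, vec4f, load_vec3_as_vec4, store_vec3_as_vec4) are
hypothetical, the structs merely stand in for <3 x float> and <4 x float>,
and the sketch assumes the vec3 storage is padded to four elements, as the
thread suggests. The narrowing and widening copies play the role of the
shuffles, and the zero written to the fourth lane stands in for a value the
real lowering would leave unspecified.

```c
#include <string.h>

/* Hypothetical stand-ins for the IR types <4 x float> and <3 x float>. */
typedef struct { float v[4]; } vec4f;
typedef struct { float v[3]; } vec3f;

/* Load a vec3 by reading the full 4-element slot, then keeping lanes 0..2.
 * Assumes `padded` points at storage that is at least 4 floats wide. */
static vec3f load_vec3_as_vec4(const float *padded)
{
    vec4f wide;
    vec3f out;

    memcpy(&wide, padded, sizeof wide);   /* one wide load */

    /* "Shuffle" down to three lanes; lane 3 is simply dropped. */
    out.v[0] = wide.v[0];
    out.v[1] = wide.v[1];
    out.v[2] = wide.v[2];
    return out;
}

/* Store a vec3 by widening it to four lanes and writing the full slot.
 * Lane 3 is set to 0.0f here only because C needs some value. */
static void store_vec3_as_vec4(float *padded, vec3f in)
{
    vec4f wide;

    wide.v[0] = in.v[0];
    wide.v[1] = in.v[1];
    wide.v[2] = in.v[2];
    wide.v[3] = 0.0f;                     /* placeholder for the unused lane */

    memcpy(padded, &wide, sizeof wide);   /* one wide store */
}
```

Note that the store overwrites the padding lane, which is only safe under the
stated assumption that nothing meaningful lives in the fourth element.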