[cfe-commits] [PATCH] Optimize vec3 loads/stores
Tanya Lattner
lattner at apple.com
Tue Jul 24 15:02:56 PDT 2012
On Jul 23, 2012, at 1:24 PM, Dan Gohman wrote:
>
> On Jul 23, 2012, at 11:34 AM, Tanya Lattner <lattner at apple.com> wrote:
>
>>
>> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
>>
>>>
>>> On Jul 18, 2012, at 6:51 PM, John McCall <rjmccall at apple.com> wrote:
>>>
>>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>>> Hi Tanya,
>>>>>> Looks good and useful, but I'm not sure whether it should be Clang's decision that storing and loading vec4s is better than vec3.
>>>>>
>>>>> The idea was to have Clang generate code that the optimizers would be more likely to do something useful and smart with. I understand the concern, but I'm not sure where the best place for this would be then?
>>>>
>>>> Hmm. The IR size of a <3 x blah> is basically the size of a <4 x blah> anyway; arguably the backend already has all the information it needs for this. Dan, what do you think?
>>>
>>> I guess optimizer passes won't be extraordinarily happy about all this
>>> bitcasting and shuffling. It seems to me that we have a problem in that
>>> we're splitting up the high-level task of "lower <3 x blah> to <4 x blah>"
>>> and doing some of it in the front-end and some of it in the backend.
>>> Ideally, we should do it all in one place, for conceptual simplicity, and
>>> to avoid the awkwardness of having the optimizer run in a world that's
>>> half one way and half the other, with awkward little bridges between the
>>> two halves.
>>
>> I think it's hard to speculate that the optimizer passes are not happy about the bitcast and shuffling. I'm running with optimizations on and the code is still much better than not having Clang do this "optimization" for vec3.
>
> Sorry for being unclear; I was speculating more about future optimization
> passes. I don't doubt your patch achieves its purpose today.
>
>> I strongly feel that Clang can make the decision to output code like this if it leads to better code in the end.
>
> Ok. What do you think about having clang do all of the lowering
> of <3 x blah> to <4 x blah> then? I mean all of the arithmetic,
> function arguments and return values, and so on? In other words, is
> there something special about loads and stores of vec3, or are they
> just one symptom of a broader vec3 problem?
>
For function args and return values, the calling convention will coerce the types (on X86). I haven't had time to totally verify, but I think that arithmetic is done correctly in the backend via widening. So it's mostly this one issue that we are trying to address.
While it may still be a good idea for the backends to optimize situations such as this, I think it's still OK for Clang to go ahead and effectively widen the vector when doing its code generation, since it is a win for most targets (an assumption on my part, as I can't test them all). vec3 is pretty important for the OpenCL community and we'd like it to have good performance.
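For readers following the thread, the widening described above can be sketched in portable C. This is only an illustration under stated assumptions, not the patch itself: the names vec4, vec3_storage, load_widened, extract3 and store_widened are hypothetical, the structs merely model <4 x float> and a vec3 slot padded to 16 bytes (OpenCL gives 3-component vectors the size and alignment of 4-component ones), and the zero written to the fourth lane stands in for a lane whose value does not matter.

```c
#include <string.h>

/* Hypothetical model of <4 x float>. */
typedef struct { float lane[4]; } vec4;

/* Hypothetical model of vec3 storage: three live lanes plus one
   padding lane, so the slot is as large as a vec4. */
typedef struct { float lane[4]; } vec3_storage;

/* Load the whole 16-byte slot in one full-width access instead of
   three scalar loads. */
static vec4 load_widened(const vec3_storage *p) {
    vec4 w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Keep lanes 0..2 and drop the padding lane; this plays the role of
   the shuffle from four lanes down to three. */
static void extract3(vec4 w, float out[3]) {
    out[0] = w.lane[0];
    out[1] = w.lane[1];
    out[2] = w.lane[2];
}

/* Widen three lanes to four and store the slot in one full-width
   access. The fourth lane is a don't-care value; 0.0f is an
   arbitrary choice made for this sketch. */
static void store_widened(vec3_storage *p, const float in[3]) {
    vec4 w = { { in[0], in[1], in[2], 0.0f } };
    memcpy(p, &w, sizeof w);
}
```

Storing three floats with store_widened and reading them back through load_widened and extract3 returns the same three values; the point is that each memory access covers the full padded slot rather than individual elements.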
Does anyone have a firm objection to this going in? I realize that all backends could be modified to try to handle this, but I don't see this happening in the near future.
-Tanya
> Of course, I'm not asking you do this work right now; I'm asking
> whether this would be a better overall design.
>
> Dan
>