[cfe-commits] [PATCH] Optimize vec3 loads/stores

Tue Jul 24 23:41:07 PDT 2012

Hi Tanya,
Since the fourth element is always undef in your patch, I guess three-element vectors can still be detected in the backend even with the patch applied.
I think it's good enough to know that this patch cannot cause any real problems, so I think it's OK.

Thanks
    Guy

-----Original Message-----
From: cfe-commits-bounces at cs.uiuc.edu [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Tanya Lattner
Sent: Wednesday, July 25, 2012 01:03
To: llvm cfe
Cc: Dan Gohman
Subject: Re: [cfe-commits] [PATCH] Optimize vec3 loads/stores

On Jul 23, 2012, at 1:24 PM, Dan Gohman wrote:

> 
> On Jul 23, 2012, at 11:34 AM, Tanya Lattner <lattner at apple.com> wrote:
> 
>> 
>> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
>> 
>>> 
>>> On Jul 18, 2012, at 6:51 PM, John McCall <rjmccall at apple.com> wrote:
>>> 
>>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>>> Hi Tanya,
>>>>>> Looks good and useful, but I'm not sure whether it should be clang's decision that storing and loading vec4s is better than vec3.
>>>>> 
>>>>> The idea was to have Clang generate code that the optimizers would be more likely to do something useful and smart with. I understand the concern, but I'm not sure where the best place for this would be then?
>>>> 
>>>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> anyway;  arguably the backend already has all the information it needs for this.  Dan, what do you think?
>>> 
>>> I guess optimizer passes won't be extraordinarily happy about all 
>>> this bitcasting and shuffling. It seems to me that we have a problem 
>>> in that we're splitting up the high-level task of "lower <3 x blah> to <4 x blah>"
>>> and doing some of it in the front-end and some of it in the backend.
>>> Ideally, we should do it all in one place, for conceptual 
>>> simplicity, and to avoid the awkwardness of having the optimizer run 
>>> in a world that's half one way and half the other, with awkward 
>>> little bridges between the two halves.
>> 
>> I think it's hard to speculate that the optimizer passes are not happy about the bitcast and shuffling. I'm running with optimizations on and the code is still much better than not having Clang do this "optimization" for vec3.
> 
> Sorry for being unclear; I was speculating more about future 
> optimization passes. I don't doubt your patch achieves its purpose today.
> 
>> I strongly feel that Clang can make the decision to output code like this if it leads to better code in the end. 
> 
> Ok. What do you think about having clang do all of the lowering of 
> <3 x blah> to <4 x blah> then? I mean all of the arithmetic, function 
> arguments and return values, and so on? In other words, is there 
> something special about loads and stores of vec3, or are they just one 
> symptom of a broader vec3 problem?
> 

For function args and return values, the calling convention will coerce the types (on X86). I haven't had time to totally verify, but I think that arithmetic is done correctly in the backend via widening. So it's mostly this one issue that we are trying to address.

While it may still be a good idea for the backends to optimize situations such as this, I think it's still OK for Clang to go ahead and effectively widen the vector when doing its code generation, since it is a win for most targets (an assumption, as I can't test them all). vec3 is pretty important for the OpenCL community and we'd like it to have good performance.
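
To illustrate the widening described above, here is a minimal sketch in plain C. It is not code from the patch. It assumes the OpenCL rule that a 3-component vector occupies the same storage as a 4-component one, and the names vec4_storage, vec3_value, load_vec3_widened and store_vec3_widened are hypothetical. In the IR the fourth lane would be undef; the sketch writes 0.0f there only because C needs a concrete value.

```c
#include <string.h>

/* Hypothetical storage for one vec3: four lanes, the last one is padding. */
typedef struct { float lane[4]; } vec4_storage;

/* Hypothetical three-component value as the program sees it. */
typedef struct { float x, y, z; } vec3_value;

/* Load: read all four lanes at once, then keep lanes 0..2
   (the analogue of a <4 x float> load followed by a shuffle). */
static vec3_value load_vec3_widened(const vec4_storage *p) {
    float wide[4];
    memcpy(wide, p->lane, sizeof wide);
    vec3_value v = { wide[0], wide[1], wide[2] };
    return v;
}

/* Store: widen to four lanes, then write all four at once.
   The fourth lane is a don't-care (undef in IR, 0.0f here as a stand-in). */
static void store_vec3_widened(vec4_storage *p, vec3_value v) {
    float wide[4] = { v.x, v.y, v.z, 0.0f };
    memcpy(p->lane, wide, sizeof wide);
}
```

Storing a value with store_vec3_widened and reading it back with load_vec3_widened returns the same three components; only the padding lane is overwritten, which is harmless as long as the storage really is four lanes wide.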

Does anyone have a firm objection to this going in? I realize that all backends could be modified to try to handle this, but I don't see this happening in the near future. 

-Tanya

> Of course, I'm not asking you to do this work right now; I'm asking 
> whether this would be a better overall design.
> 
> Dan
> 
