[cfe-commits] [PATCH] Optimize vec3 loads/stores

Thu Jul 19 11:51:30 PDT 2012

On Jul 18, 2012, at 6:51 PM, John McCall <rjmccall at apple.com> wrote:

> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>> Hi Tanya,
>>> Looks good and useful, but I'm not sure it should be clang's decision whether storing and loading vec4s is better than vec3s.
>> 
>> The idea was to have Clang generate code that the optimizers would be more likely to do something useful and smart with. I understand the concern, but then where would the best place for this be?
> 
> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> anyway;  arguably the backend already has all the information it needs for this.  Dan, what do you think?

I suspect optimizer passes won't be particularly happy about all this
bitcasting and shuffling. It seems to me the problem is that we're
splitting up the high-level task of "lower <3 x blah> to <4 x blah>"
and doing some of it in the front-end and some of it in the backend.
Ideally, we should do it all in one place, for conceptual simplicity, and
to avoid having the optimizer run in a world that's half one way and half
the other, with clumsy little bridges between the two halves.
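To make the load/store half of that lowering concrete, here is a minimal C sketch, assuming (as with OpenCL's float3 vs. float4) that a 3-element vector occupies the same padded storage as a 4-element one. The names `vec3`, `vec3_storage`, `load_vec3` and `store_vec3` are hypothetical and only mirror the idea of "load <4 x T>, then shuffle lane 3 away" and "widen to <4 x T>, then store"; this is not the patch's actual code.

```c
#include <string.h>

/* Hypothetical types: vec3 is the logical 3-lane value; vec3_storage is its
   in-memory form, padded to 4 lanes so that its size equals 4 * sizeof(float). */
typedef struct { float x, y, z; } vec3;
typedef struct { float lanes[4]; } vec3_storage;

/* Load: fetch all four lanes with one 16-byte copy, then keep lanes 0-2.
   This plays the role of "load <4 x float>, shuffle away lane 3". */
static vec3 load_vec3(const vec3_storage *p) {
    float tmp[4];
    memcpy(tmp, p->lanes, sizeof tmp);
    vec3 r = { tmp[0], tmp[1], tmp[2] };
    return r;
}

/* Store: widen to four lanes and write all 16 bytes at once. Lane 3 is
   padding, so its value is irrelevant (IR could use undef; here it is 0). */
static void store_vec3(vec3_storage *p, vec3 v) {
    float tmp[4] = { v.x, v.y, v.z, 0.0f };
    memcpy(p->lanes, tmp, sizeof tmp);
}
```

Storing (1, 2, 3) through `store_vec3` and reading it back with `load_vec3` round-trips the three live lanes; only the padding lane is clobbered, which is what makes the 4-wide access legal in the first place.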

I think you could make a reasonable case that clang should do all of it,
including all the arithmetic. And if LLVM IR isn't quite flexible enough,
then fix it. <3 x blah> is just a front-end abstraction, from a certain
perspective, so it's reasonable to expect front-ends to lower it.

You could also make a reasonable case that clang should do none of it,
even loads and stores. Just use <3 x blah> everywhere and make the
backend do all the work. I believe the backend does have all the
information it needs to do a reasonable job with this, in theory.

> Also, this optimization assumes that sizeof(element) is a power of 2, because the only thing that the AST guarantees is that the size of the vector type is a power of 2, and nextPow2(3 x E) == nextPow2(4 x E) is only true in general if E is a power of 2.  Is that reasonable?  Is that checked somewhere?

If sizeof(element) is not a power of 2, we have other issues, and it's
debatable whether it's worth the effort to try to address them. For now,
I'd suggest just having the front-end issue an error if it encounters
such things.
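The arithmetic behind John's condition can be checked directly. A 4-lane access reads 4E bytes, which stays inside a <3 x E> object only if 4E <= nextPow2(3E), which is equivalent to nextPow2(3E) == nextPow2(4E). The sketch below is illustrative only; `next_pow2`, `is_pow2` and `vec3_fits_vec4` are hypothetical helpers, not clang APIs, and `is_pow2` stands in for the kind of check a front-end error could be based on.

```c
#include <stdint.h>

/* Smallest power of two >= n (for n > 0). */
static uint64_t next_pow2(uint64_t n) {
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Nonzero if n is a power of two. */
static int is_pow2(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Nonzero if 3 and 4 elements of elem_size bytes round up to the same
   power-of-two size, i.e. a <4 x E> access fits in a <3 x E> object. */
static int vec3_fits_vec4(uint64_t elem_size) {
    return next_pow2(3 * elem_size) == next_pow2(4 * elem_size);
}
```

For E = 4 (float), 12 and 16 both round to 16, so the widened access is safe. For E = 5, 15 rounds to 16 but 20 rounds to 32, so a 4-lane access would read 20 bytes from a 16-byte object. E = 3 happens to work (9 and 12 both round to 16), which is why the equality only holds "in general" when E is a power of 2.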

Dan