[PATCH] D30810: Preserve vec3 type.

Anastasia Stulova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 4 09:08:58 PDT 2017


Anastasia added a comment.

In https://reviews.llvm.org/D30810#710069, @jaykang10 wrote:

> > I believe the argument lacks numbers (or at least you have them but didn't mention). I didn't hear about performance results, or validation that this was actually tested for correctness. Small test cases prove a point, but can't be considered general.
> > 
> > OTOH, it seems like this is exactly why you want the flag, to hide any potential issues and play with it. I'm not opposed to adding the flag if there's a commitment to actually get the result and change the default to whatever seems to be better, does that seems reasonable?
>
> To be honest, I did not start this discussion to get better performance for GPU. But the goal has moved while we discussing it. I think I have showed you the gain from AMD GPU target on previous comments. On current version of clang/llvm, it generates one less load/store to preserve vec3 type instead of vec4 type on GPU target. But it could cause more load/store on another targets which have vector register because they usually expect aligned vector load/store. It means that if we want to change default  to vec3 on clang, we need to change llvm's codegen's default legalization or backend targets need to be modified in order to get aligned vector load/store with vec3 type. I think It is too big change at this moment. In the future, we could move this discussion to llvm-dev. Up to that time,  as you mentioned, I would like to hide the potential issues with vec3 type and play with it.


Just to summarize. There seems to be no issue in adding this feature by a special flag considering that changing default behaviors requires changing the backend that we might consider at some later point of time.  We can start a separate discussion on that.

The change can be used to generate more generic and optimal IR resulting in 1/4 less instructions generated in the final binary per each load and store of vec2 types in some architectures (e.g. AMD) . I am not sure what kind of further numbers are needed and if there is any measuring infrastructure LLVM project provides for that.

Considering all the above I don't feel there is any reason this can't be committed.


https://reviews.llvm.org/D30810





More information about the cfe-commits mailing list