[PATCH] D22792: VecClone Pass

Thu Dec 21 17:20:03 PST 2017

hfinkel added a comment.

In https://reviews.llvm.org/D22792#962822, @mmasten wrote:

> Thanks for the comments, Hal.

No problem. Thanks for working on this!

> Just to clarify your point #2, I think what you're saying is that we should start from a common parameter representation; i.e., parameters should be loaded/stored through memory. Please correct me if I'm wrong. I certainly think this would be a great way to reduce the complexity of the algorithm. The remainder of items in your list should already be covered, but some tweaking may be involved.

For point #2, I'm saying that we should take all local stack allocations and make them wider by a factor of VL. Think of this as having VL simultaneously running copies of the function, one per vector lane: each copy gets its own "lane" of the local stack allocations. In point #5, I sketched how I'd handle parameters. I'm not exactly sure what you mean by a common representation, since different kinds of parameters (i.e., vector, uniform, scalar) do require different handling. What is true is that, for unoptimized code, where the function arguments are generally stored in local stack allocations, all of those stores are now just inside the loop with everything else, so nothing special needs to happen. Does that make sense?
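
As a rough source-level illustration of the widening described above (a sketch only, not the pass itself, which would operate on LLVM IR allocas): the names `scalar_foo`, `vector_foo`, `check_against_scalar`, the slot variables, and the value VL = 4 are all hypothetical. The sketch assumes `x` is a vector parameter and `b` is a uniform one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical vector length, chosen only for illustration.
constexpr std::size_t VL = 4;

// Scalar original, written the way unoptimized code tends to look:
// each argument is stored to its own local stack slot before use.
int scalar_foo(int x, int b) {
    int x_slot = x;             // stack slot for parameter x
    int b_slot = b;             // stack slot for parameter b
    int tmp = x_slot * b_slot;  // another local stack allocation
    return tmp + 1;
}

// Widened clone. Assumption: x is a vector parameter (one value per
// lane) and b is uniform (the same value in every lane).
std::array<int, VL> vector_foo(const std::array<int, VL>& x, int b) {
    // Every local stack allocation is made wider by a factor of VL,
    // so each lane has its own copy.
    int x_slot[VL];
    int b_slot[VL];
    int tmp[VL];
    std::array<int, VL> ret{};

    // The original body, including the stores of the arguments into
    // their stack slots, now sits inside a loop over the lanes.
    for (std::size_t lane = 0; lane < VL; ++lane) {
        x_slot[lane] = x[lane];  // vector parameter: lane's element
        b_slot[lane] = b;        // uniform parameter: same value
        tmp[lane] = x_slot[lane] * b_slot[lane];
        ret[lane] = tmp[lane] + 1;
    }
    return ret;
}

// Usage check: each lane of the clone should match a scalar call.
void check_against_scalar() {
    const std::array<int, VL> x = {1, 2, 3, 4};
    const std::array<int, VL> r = vector_foo(x, 10);
    for (std::size_t lane = 0; lane < VL; ++lane)
        assert(r[lane] == scalar_foo(x[lane], 10));
}
```

In this form the parameter stores need no special casing beyond choosing what gets written into each lane's slot. Later optimization would be expected to clean up the per-lane arrays.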

https://reviews.llvm.org/D22792