Global Values Vectorization - Take 3

Arnold Schwaighofer aschwaighofer at apple.com
Mon Feb 18 15:01:43 PST 2013


On Feb 18, 2013, at 12:37 PM, Renato Golin <renato.golin at linaro.org> wrote:

> Hi Arnold, Nadav, Hal,
> 
> This patch works with the attached tests (the examples I had before), and for now it only vectorizes the most conservative one. Let me know if you can think of other examples, so I can test them and add them to my list of tests.
> 
> Considerations:
> 
> 1. I'm using std::multimap, which Hal found to be not as efficient as DenseMap<SmallVector<>>. I might start using them, as soon as I'm sure this algorithm makes sense.
> 
> 2. I'm storing the load/store instruction with both ReadWrites and WriteObjects. This is inefficient, but it was a way to make sure I got the right stores when dealing with the right underlying objects (since I'll be iterating through them all). I'll rethink this relationship and try to find the most cost-effective (space-wise) solution.

Okay.
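
For reference on point 1, the difference between the two container shapes can be sketched with standard-library stand-ins. This is only an illustration: `std::unordered_map` and `std::vector` stand in for `DenseMap` and `SmallVector`, and the names `ObjectKey`, `AccessList`, `AccessMap` and `groupByObject` are hypothetical, not taken from the patch.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins (assumptions): an "object" is any address and an "access"
// is an integer ID. In the patch these would be underlying objects and
// load/store instructions.
using ObjectKey = const void *;
using AccessList = std::vector<int>;
using AccessMap = std::unordered_map<ObjectKey, AccessList>;

// Group (object, access) pairs so that each object owns one contiguous
// list of accesses, instead of one multimap node per pair.
inline AccessMap
groupByObject(const std::vector<std::pair<ObjectKey, int>> &Pairs) {
  AccessMap Map;
  for (const auto &P : Pairs)
    Map[P.first].push_back(P.second);
  return Map;
}
```

With this shape, visiting all accesses to one object is a single lookup followed by a walk over a contiguous list.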

> 
> 3. I'm using the max vector register width as the vectorized access size, which might not be the best choice. But since the cost model is what calculates the vectorization factor, and we only create it *after* legalization has finished, I didn't want to move initializations around or duplicate code. Ideas welcome.
> 
> 4. BasicAA seems to be getting it right, when I pass the strides and access sizes, and it's not being too optimistic.
> 

+  // Biggest vector register, for Alias Analysis on potential vectorized code
+  unsigned MaxByteWidth = TTI->getRegisterBitWidth(true) / 8;

You need to take the unroll factor into account, too. The loop vectorizer "unrolls" the loop to better use ILP. It does so (in contrast with the loop unroller) by unrolling individual instructions "in place":

a[i] =
  = b[i]

becomes

a[i:0-3] =
a[i:4-7] =
  = b[i:0-3]
  = b[i:4-7]

Therefore, you also need to take the max unroll factor into account.

+  unsigned MaxByteWidth = (TTI->getRegisterBitWidth(true) / 8)*TTI->getMaximumUnrollFactor();
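
To make the arithmetic concrete, here is a minimal standalone sketch in plain C++ (not LLVM code). The helper name `maxAccessBytes` and the sample numbers below are assumptions for illustration only. In the patch the two inputs would come from `TTI->getRegisterBitWidth(true)` and `TTI->getMaximumUnrollFactor()`.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound, in bytes, on the access size to
// assume for one memory reference after vectorization and in-place
// unrolling. One vector register covers RegisterBitWidth / 8 bytes,
// and unrolling repeats that access MaxUnrollFactor times.
constexpr std::uint64_t maxAccessBytes(std::uint64_t RegisterBitWidth,
                                       std::uint64_t MaxUnrollFactor) {
  return (RegisterBitWidth / 8) * MaxUnrollFactor;
}
```

For example, with 128-bit vector registers and a maximum unroll factor of 2, `maxAccessBytes(128, 2)` gives 32 bytes, twice the 16 bytes the register width alone would suggest.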

> Please be aware that this patch is not yet ready to merge; I'm just looking for confirmation that it makes sense.
> 

It makes sense.

> I don't want to add any RT checks for this first patch, but we'll need something like that for later...
> 

We already add runtime checks for unidentified objects (you probably know that - just making sure). Do you plan to add more? Are you thinking of bounds within underlying objects?
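
Conceptually, a runtime check of this kind tests that the address ranges two pointers touch over the whole loop do not overlap, and falls back to the scalar loop otherwise. The following is a simplified sketch of that idea, not the vectorizer's actual code; the name `rangesDisjoint` and the half-open range convention are assumptions.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when the half-open byte ranges
// [AStart, AEnd) and [BStart, BEnd) share no address. A loop guarded
// by such a check may run the vector version only when it holds.
inline bool rangesDisjoint(std::uintptr_t AStart, std::uintptr_t AEnd,
                           std::uintptr_t BStart, std::uintptr_t BEnd) {
  return AEnd <= BStart || BEnd <= AStart;
}
```

A bounds check within one underlying object would have the same shape, with the two ranges derived from different accesses into the same global.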

> Thanks!
> --renato
> <global_vectorize.patch>



