[LLVMdev] Vectorizer using Instruction, not opcodes
Arnold Schwaighofer
aschwaighofer at apple.com
Mon Feb 4 10:25:06 PST 2013
Hi all,
My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do).
I would propose estimating the "worst-case costs" and seeing how far we get with that. My rationale here is that we don't want vectorization to decrease performance relative to scalar code. If we miss an opportunity, that is bad, but not as bad as degrading relative to scalar code.
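The worst-case idea above can be sketched as follows. This is an illustrative fragment only, not LLVM API: `worstCaseCost` and the candidate-cost vector are hypothetical names, standing in for whatever set of plausible lowerings a target cost model might enumerate for one IR operation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: when the cost of lowering an IR operation is
// ambiguous (several possible machine-level lowerings), report the
// pessimistic (maximum) candidate cost, so vectorization never looks
// cheaper than it could actually turn out to be.
unsigned worstCaseCost(const std::vector<unsigned> &CandidateLoweringCosts) {
  assert(!CandidateLoweringCosts.empty() && "need at least one lowering");
  return *std::max_element(CandidateLoweringCosts.begin(),
                           CandidateLoweringCosts.end());
}
```

With this policy the vectorizer may skip some profitable loops, but it should not pick a vector plan that regresses against scalar code, which matches the trade-off argued for above.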
For cases where this approach breaks really badly we could consider adding a specialized api or parameters (like the type of a user/use). But we should do so only as a last resort and backed by actual code that would benefit from doing so.
- Arnold
On Feb 4, 2013, at 11:49 AM, Renato Golin <renato.golin at linaro.org> wrote:
> C. Settle for guess in the dark
>
> We can continue guessing in the dark, but do it consistently. This is our current approach, and I'm wondering if it makes sense to deviate from it.
>
> My main issue is that the costs of ARM instructions look almost random when seen from an IR perspective, so we're bound to make as many mistakes as not.
>
> Another issue is that the number of vector IR instructions (extract/select) is much larger than in the resulting vector code, and that can weigh negatively against the benefits of vectorizing.
Do you have an example where this is happening? If the cost of extracts/inserts is small (could a series of extracts/inserts maybe be expressed as shuffle instructions?), it should be mostly neutralized: it is divided by the vector length and amortized over the other vectorized instructions in the block.
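The amortization argument can be made concrete with a small sketch. The function names and the simple additive cost model here are hypothetical, not LLVM's cost-model API; VF stands for the vectorization factor (number of scalar iterations covered by one vector iteration).

```cpp
// Hypothetical sketch of the amortization argument: the one-off cost of
// extracts/inserts is divided by the vectorization factor VF, so with a
// wide enough vector, and enough other vectorized work in the block, it
// is largely neutralized.
double amortizedCostPerLane(unsigned VectorBodyCost,
                            unsigned ExtractInsertCost, unsigned VF) {
  return static_cast<double>(VectorBodyCost + ExtractInsertCost) / VF;
}

// Vectorization looks profitable when the per-lane vector cost beats the
// cost of one scalar iteration.
bool isProfitable(unsigned VectorBodyCost, unsigned ExtractInsertCost,
                  unsigned VF, unsigned ScalarIterationCost) {
  return amortizedCostPerLane(VectorBodyCost, ExtractInsertCost, VF) <
         static_cast<double>(ScalarIterationCost);
}
```

For example, a vector body costing 8 plus extract/insert overhead of 4 at VF = 4 comes to 3 per lane, which still beats a scalar iteration costing 4; the overhead only dominates when it is large relative to VF times the scalar cost, which is the case the question above is probing for.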