[PATCH] D32422: LoopVectorizer: let target prefer scalar addressing computations (+ minor improvements in SystemZTTI)

Wed May 17 08:13:25 PDT 2017

jonpa added a comment.

> Right, I see what you mean about the "address computation" to be forced into scalar registers. Does that mean that you have to use the same GR to load into all lanes of the vector, so supposedly load+increment+load+increment...?

It doesn't have to be the same GR for all the lanes, but it must be a GR (since we don't use the slow "element gather/scatter").

> Regardless, between iterations of the loop, you want to keep the addresses in the GRs throughout the execution of the loop. Is that correct?

Yes, that's better than having to extract elements. The extracts are relatively expensive.

> Does that make sense?

I think that by the time this patch starts to run in setCostBasedWideningDecision(), the widening decision has already been made. The only remaining question is what type of address computation the target wants.

1. For a gather/scatter access, it is obviously in vector registers - this is something the patch shouldn't change.
2. For the scalarized/widened/interleaved accesses, I think it is a general question of preference which may vary from target to target. It seems far-fetched to handle this at the instruction level, at least on SystemZ.

BTW, if we indeed want to let targets with gather/scatter support have a say about 2), there should also be a check there so that gather/scatter accesses aren't included in AddrDefs, since they shouldn't be scalarized.
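
To make the suggested check concrete, here is a minimal standalone sketch of the filtering I have in mind. It is not the patch itself: the Widening enum, the MemAccess struct, collectAddrDefs() and the TargetPrefersScalarAddressing flag are all hypothetical stand-ins for the vectorizer's per-access widening decision and the target preference, and only the name AddrDefs is taken from the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the per-access widening decision.
enum class Widening { Widen, Interleave, Scalarize, GatherScatter };

// Hypothetical, simplified model of one memory access in the loop.
struct MemAccess {
  std::string Name;
  Widening Decision;
};

// Collect the accesses whose address computations may be kept scalar.
// Gather/scatter accesses are skipped, since their addresses live in
// vector registers and shouldn't be scalarized.
std::vector<std::string>
collectAddrDefs(const std::vector<MemAccess> &Accesses,
                bool TargetPrefersScalarAddressing) {
  std::vector<std::string> AddrDefs;
  if (!TargetPrefersScalarAddressing)
    return AddrDefs; // Target did not ask for scalar addressing.
  for (const MemAccess &A : Accesses)
    if (A.Decision != Widening::GatherScatter)
      AddrDefs.push_back(A.Name);
  return AddrDefs;
}
```

With this shape, a target that has gather/scatter and still prefers scalar addressing for case 2) would only ever see widened, interleaved and scalarized accesses in AddrDefs.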

https://reviews.llvm.org/D32422