[PATCH] D32422: LoopVectorizer: let target prefer scalar addressing computations (+ minor improvements in SystemZTTI)

Wed May 17 08:33:09 PDT 2017

rengolin added a comment.

In https://reviews.llvm.org/D32422#757432, @jonpa wrote:

> > Right, I see what you mean about the "address computation" being forced into scalar registers. Does that mean that you have to use the same GR to load into all lanes of the vector, so supposedly load+increment+load+increment...?
>
> It doesn't have to be the same GR for all the lanes, but it must be a GR (since we don't use the slow "element gather/scatter").
>
> > Regardless, between iterations of the loop, you want to keep the addresses in the GRs throughout the execution of the loop. Is that correct?
>
> Yes, that's better than having to extract elements. The extracts are relatively expensive.

Ack.

> I am thinking that the widening decision has already been made by the time this patch starts to run in setCostBasedWideningDecision(). The only question then is what type of address computation the target wants.
> 
> 1. For a gather/scatter access, it is obviously in vector registers - this is something the patch shouldn't change.
> 2. For the scalarized/widened/interleaved access, I think it is a general question of preference which may vary from target to target. It seems far-fetched to handle this at the instruction level, at least on SystemZ.

I think you have a good point there. I may be overthinking it.
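
For reference, the two cases quoted above can be sketched roughly as follows. This is only an illustration of the reasoning, not the patch's code: `WideningDecision`, `AddressKind`, `TargetPrefs` and `addressKindFor` are hypothetical names invented here, and the real decision logic in the vectorizer is more involved.

```cpp
// Hypothetical stand-ins for illustration; these are NOT LLVM's actual types.
enum class WideningDecision { Scalarize, Widen, Interleave, GatherScatter };

enum class AddressKind {
  Vector,        // addresses live in vector registers
  ForceScalar,   // keep the address computation in scalar (general) registers
  Unconstrained  // no target preference; leave it to the usual cost model
};

// Hypothetical target preference flag (assumption: a simple boolean query).
struct TargetPrefs {
  bool prefersScalarAddressing;
};

// By the time this runs, the widening decision is assumed to be already made.
AddressKind addressKindFor(WideningDecision decision, const TargetPrefs &target) {
  // Case 1: gather/scatter consumes a vector of addresses; never changed.
  if (decision == WideningDecision::GatherScatter)
    return AddressKind::Vector;

  // Case 2: scalarized/widened/interleaved accesses follow the target's
  // general preference rather than a per-instruction choice.
  return target.prefersScalarAddressing ? AddressKind::ForceScalar
                                        : AddressKind::Unconstrained;
}
```

With `prefersScalarAddressing` set (as SystemZ would want in this sketch), every non-gather/scatter access gets its address kept in GRs, which avoids the expensive extracts mentioned earlier.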

I'm ok with the patch as it is, but I'll let it simmer for a bit so that Ulrich, Hal, Elena and James can have a look. I'm happy if they are.

cheers,
--renato

https://reviews.llvm.org/D32422