[PATCH] D32422: LoopVectorizer: let target prefer scalar addressing computations (+ minor improvements in SystemZTTI)

Wed May 17 07:50:09 PDT 2017

rengolin added reviewers: delena, jmolloy.
rengolin added subscribers: delena, jmolloy.
rengolin added a comment.

In https://reviews.llvm.org/D32422#757376, @jonpa wrote:

> I don't quite follow how this has to do with vector shuffles...? On SystemZ, all addresses residing in vectors must be extracted before use. (There are slow vector element gather/scatter, which are not used by the backend, so they are irrelevant).

Right, I see what you mean about the address computation being forced into scalar registers. Does that mean you have to use the same GR to load into all lanes of the vector, so presumably load+increment+load+increment...?

Regardless, you want to keep the addresses in GRs throughout the execution of the loop. Is that correct?

> I see your point that there is a theoretical possibility that if an address is computed in vectors, and that sequence of computations is long enough, this still might be cheaper. In practice, it however seems to be an impossible feat at the moment, because of the fact that LSR is not improving the vectorized addressing. So even if we computed the cost of vector instructions + extracts, and compared it to scalar instructions, it would be inaccurate.

Right, so this is not the cost of extract/insert, but the cost of indirect access, which is not the same as the masked scatter/gather that we currently have, I agree.

I'm adding Elena and James to see if they have some more ideas.

> I guess I could instead of this new hook check if getGatherScatterOpCost() returns INT_MAX.

That's what I was thinking...

> I am not sure however if it will always be true that targets want to keep it this simple. Couldn't it be that a target with such support actually wants scalarized addressing for scalarized/vectorized memory accesses, while still doing gather/scatter whenever possible?

There are two issues to discern here:

1. If the target has any kind of scatter/gather support.
2. What is the cost of doing so in a particular case.

The first problem could be solved by having a sub-target feature or similar. The second may need to inspect which instruction we're talking about, etc., which is in line with some changes to the cost functions we've seen recently.

Both solutions could be applied in the same function, for example:

  int getGatherScatterOpCost(Inst *I) {
    int Cost = 1;
    // Issue 1: no gather/scatter support at all on this target.
    if (!Target->hasScatterGather())
      return INT_MAX;
    // Issue 2: use I to make sure this is what I think it is...
    ...
    return Cost;
  }
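
To make the idea a bit more concrete, here is a minimal standalone sketch of how both checks, plus the INT_MAX test on the vectorizer side, might fit together. It is not LLVM code: `Target`, `Inst`, `MemOpKind` and `prefersScalarAddressing` are hypothetical stand-ins invented for illustration, and the returned costs are placeholders rather than measurements from any target.

```cpp
#include <climits>

// Hypothetical stand-ins for illustration only; these are not LLVM classes.
enum class MemOpKind { Load, Store, Other };

struct Inst {
  MemOpKind Kind;
};

struct Target {
  bool HasGatherScatter; // assumed sub-target feature flag
  bool hasScatterGather() const { return HasGatherScatter; }
};

// Issue 1: no support at all returns INT_MAX.
// Issue 2: otherwise the cost depends on the instruction being inspected.
int getGatherScatterOpCost(const Target &T, const Inst &I) {
  if (!T.hasScatterGather())
    return INT_MAX;
  // Only loads and stores can become gathers or scatters.
  if (I.Kind == MemOpKind::Other)
    return INT_MAX;
  // Placeholder costs, assumed for the sketch.
  return I.Kind == MemOpKind::Load ? 4 : 6;
}

// What the vectorizer-side check could look like in place of a new hook:
// an INT_MAX cost is read as "keep the addressing scalar".
bool prefersScalarAddressing(const Target &T, const Inst &I) {
  return getGatherScatterOpCost(T, I) == INT_MAX;
}
```

With this shape, a target without the feature always gets scalar addressing, while a target with it can still decline per instruction, which is the case raised in the quoted question above.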

Does that make sense? @delena @jmolloy do we even need this level of complexity?

cheers,
--renato

https://reviews.llvm.org/D32422