[PATCH] D32422: LoopVectorizer: let target prefer scalar addressing computations (+ minor improvements in SystemZTTI)

Wed May 17 07:32:57 PDT 2017

jonpa added a comment.

In https://reviews.llvm.org/D32422#757327, @rengolin wrote:

> Hi Jonas,
>
> I understand your problem and the SystemZ part is probably fine (I can't review that myself), but I fear introducing such a call-back will not help the underlying cause.
>
> What we really need is to know if the shuffle costs will be higher than the savings, and that should be done by asking the shuffle costs directly.
>
> I'd assume that targets without scatter/gather support would return higher costs for those operations (probably a magnitude higher), so maybe there's a problem in the load/store cost analysis that could considerably simplify this.
>
> cheers,
> --renato

I don't quite follow what this has to do with vector shuffles. On SystemZ, every address residing in a vector register must be extracted before use. (There are slow vector-element gather/scatter instructions, but the backend does not use them, so they are irrelevant here.)

I see your point: in theory, if an address is computed in vector registers and that sequence of computations is long enough, it might still be cheaper. In practice, however, this seems impossible at the moment, because LSR (Loop Strength Reduction) does not improve vectorized addressing. So even if we compared the cost of the vector instructions plus extracts against the scalar instructions, the result would be inaccurate.

================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:394
+  /// Return true if target doesn't mind addresses in vectors.
+  bool prefersVectorizedAddressing() const;
+
----------------
rengolin wrote:
> Can't you check for scatter/gather support directly?
I didn't want to use isLegalMaskedScatter() / isLegalMaskedGather(), because "masked" has nothing to do with this.
Instead of adding this new hook, I guess I could check whether getGatherScatterOpCost() returns INT_MAX.

However, I am not sure targets will always want to keep it this simple. Couldn't a target with such support actually want scalarized addressing for scalarized/vectorized memory accesses, while still using gather/scatter whenever possible?

https://reviews.llvm.org/D32422