[PATCH] D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`

Tue Oct 19 01:44:47 PDT 2021

lebedev.ri added a comment.

@RKSimon / @craig.topper thoughts on the diff?

In D111460#3072267 <https://reviews.llvm.org/D111460#3072267>, @pengfei wrote:

>>> I think the high cost might have practical considerations, e.g., its impact on some benchmarks.
>>
>> Sure, any change might have practical considerations, in either direction.
>
> I agree, but we should evaluate it carefully to make sure we get more good than bad, right?

Or that it is at least a step in the right direction.

>>> IIUC, you are saying we prefer naive interleaving to scalarized gather, right?
>>
>> Quote, please? Define "we prefer".
>
> X86TargetTransformInfo.cpp:3853, "interleaved load is better in general in reality"

I mean, sure, wide load + shuffles should be better than many narrow scalar loads + a chain of `insertelement`s.
Emphasis on *should*; this is not an absolute truth.
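
To make the *should* concrete, here is a rough instruction-count sketch under a hypothetical unit-cost model (every load, `insertelement` and shuffle counts as 1). The helpers `scalarizedGather` and `interleavedLoad` are made up for illustration and are not TTI APIs; `Factor` is the group stride, `UsedMembers` is how many members are actually read, and the interleaved count is an idealized lower bound of one de-interleaving shuffle per used member.

```cpp
#include <cassert>

// Hypothetical unit-cost model: every load, insertelement or shuffle counts
// as one instruction. Real x86 TTI costs differ per instruction and subtarget.
struct LoweringCount {
  unsigned Loads;
  unsigned Shuffles; // insertelement or shufflevector instructions
  unsigned total() const { return Loads + Shuffles; }
};

// Scalarized gather: for each used member of the group, one narrow scalar
// load plus one insertelement per vector lane.
LoweringCount scalarizedGather(unsigned VF, unsigned UsedMembers) {
  assert(VF > 0 && UsedMembers > 0);
  return {VF * UsedMembers, VF * UsedMembers};
}

// Interleaved load of a stride-Factor group: Factor wide loads cover all
// Factor * VF elements (assuming VF elements fill one legal vector register),
// then, idealized, one de-interleaving shuffle per used member.
LoweringCount interleavedLoad(unsigned VF, unsigned Factor,
                              unsigned UsedMembers) {
  assert(VF > 0 && Factor > 0 && UsedMembers > 0 && UsedMembers <= Factor);
  return {Factor, UsedMembers};
}
```

For VF=8 with both members of a stride-2 group used, this gives 32 instructions for the scalarized form versus 4 for the interleaved one. Real costs are not unit (cross-lane shuffles on x86 can be expensive), so the actual gap depends on the subtarget's cost tables.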

>>> Can we make sure we always generate the interleaving sequence, e.g., when we do not have a constant stride?
>>
>> Why? I don't think we should be doing anything like that. It is up to the vectorizer
>> to pick the best sequence, given the cost estimates provided by TTI for each one.
>
> Only when the estimate is precise enough. Underestimating the cost of the scalarized gather sequence will fool the vectorizer in practice.
> The IA optimization manual says gather/scatter is preferred, so the emulated gather/scatter cost shouldn't be lower than the real gather/scatter cost.

Yeah, no. If this is what you believe, we won't make progress here.
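
A minimal sketch of the two positions in this exchange, assuming a hypothetical `Candidate` list of lowerings with made-up costs (none of these names are LLVM APIs): the vectorizer takes the cheapest estimate it is given, and the optional clamp models the proposal that an emulated gather never be costed below a hardware gather.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical lowering option for one memory access, with a cost estimate
// of the kind a TTI-style cost model would supply. Illustrative only.
struct Candidate {
  std::string Name;
  unsigned Cost;
  bool IsEmulatedGather; // scalar loads + insertelement chain
};

// Pick the cheapest candidate. If ClampEmulatedGather is set, first raise any
// emulated gather's cost to at least HardwareGatherCost (the proposal from
// the thread). Ties keep the earlier candidate, as std::min_element does.
Candidate pickCheapest(std::vector<Candidate> Cands,
                       unsigned HardwareGatherCost,
                       bool ClampEmulatedGather) {
  assert(!Cands.empty() && "need at least one lowering to choose from");
  if (ClampEmulatedGather)
    for (Candidate &C : Cands)
      if (C.IsEmulatedGather)
        C.Cost = std::max(C.Cost, HardwareGatherCost);
  return *std::min_element(Cands.begin(), Cands.end(),
                           [](const Candidate &A, const Candidate &B) {
                             return A.Cost < B.Cost;
                           });
}
```

With costs of 12 (interleaved), 10 (scalarized gather) and 14 (hardware gather), the unclamped pick is the scalarized gather; with the clamp, the scalarized gather is raised to 14 and the interleaved form wins. Whether the clamp is a useful correction or a distortion depends on how accurate the unclamped estimate is, which is the point under dispute above.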

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D111460