[llvm-dev] getScalarizationOverhead()

Sun Jan 22 08:07:27 PST 2017

> On 20 Jan 2017, at 14:53, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
> On 01/20/2017 08:30 AM, Jonas Paulsson wrote:
>> 
>> 
>> On 2017-01-20 14:31, Hal Finkel wrote:
>>> 
>>> On 01/20/2017 06:11 AM, Jonas Paulsson via llvm-dev wrote:
>>>> Hi,
>>>> 
>>>> I wonder why getScalarizationOverhead() does not take into account the number of operands of the instruction? This should influence the number of extracts needed, so instead of
>>>> 
>>>> Scalarization cost = NumEls * (insert + extract)
>>>> 
>>>> it would be better to do
>>>> 
>>>> Scalarization cost = NumEls * (insert + (extract * numOperands))
>>> 
>>> I suspect this is an oversight (although we need to be a bit careful here because if two operands are the same, which is not uncommon, we don't want to double the cost).
>>> 
>>> -Hal
>> 
>> Do you in those cases of an identical operand want to count just a cost of "1" for a register move, instead of the "extraction cost"?
> 
> There should be no cost to reusing the operand. (mul a, a) should only extract a once, the fact that it is used twice should not increase the cost.
> 
> -Hal

There appears to be a similar issue within the x86 AVX1 cost tables for cases where we have to split the 256-bit integer operations. Some binops add 1*extract_subvector + 1*insert_subvector on top of the 2*128-bit binop costs, whilst others add nothing at all. We need to try harder to determine whether to add 1 extract (duplicate input or constant-folded extract) or 2 extracts to the final cost.
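
To make the accounting concrete, here is a rough standalone sketch of both formulas with operands de-duplicated. It is not the actual TargetTransformInfo code: the names (Operand, ElementCosts, countExtractedOperands, scalarizationCost, avx1SplitBinopCost) are hypothetical, operands are identified by a plain integer id, and the rule that a constant operand costs no extract is my reading of the "constant folded extract" case above.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical operand description: Id identifies the value, IsConstant marks
// operands whose extracts are assumed to fold away.
struct Operand {
  int Id;
  bool IsConstant;
};

// Hypothetical per-element costs, standing in for what a target would report.
struct ElementCosts {
  unsigned Insert;
  unsigned Extract;
};

// Number of operands that really need extracting: distinct, non-constant ones.
unsigned countExtractedOperands(const std::vector<Operand> &Ops) {
  std::set<int> Unique;
  for (const Operand &Op : Ops)
    if (!Op.IsConstant)
      Unique.insert(Op.Id);
  return static_cast<unsigned>(Unique.size());
}

// Scalarization cost = NumElts * (insert + extract * uniqueOperands)
unsigned scalarizationCost(unsigned NumElts, const std::vector<Operand> &Ops,
                           ElementCosts C) {
  assert(NumElts > 0 && "expected a vector type");
  return NumElts * (C.Insert + C.Extract * countExtractedOperands(Ops));
}

// Split 256-bit binop = 2 * 128-bit binop + one extract_subvector per unique
// operand + one insert_subvector to rebuild the result.
unsigned avx1SplitBinopCost(unsigned BinOp128Cost,
                            const std::vector<Operand> &Ops,
                            unsigned ExtractSubvecCost,
                            unsigned InsertSubvecCost) {
  return 2 * BinOp128Cost + ExtractSubvecCost * countExtractedOperands(Ops) +
         InsertSubvecCost;
}
```

With unit costs and 4 elements, (mul a, a) is charged one extract per element while (add a, b) is charged two, which is the behaviour Hal describes. The same operand count then decides whether the split 256-bit case adds 1 or 2 extracts.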