[PATCH] D132872: [SLP] Account for loss due to FMAs when estimating fmul costs.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 30 07:19:29 PDT 2022


ABataev added a comment.

In D132872#3758351 <https://reviews.llvm.org/D132872#3758351>, @fhahn wrote:

> Add missing checks for the reassoc FMF flag, and compare the fmuladd cost with the FMUL cost to see if fmuladd is at least as cheap as FMUL.
>
> In D132872#3756279 <https://reviews.llvm.org/D132872#3756279>, @ABataev wrote:
>
>> In D132872#3756244 <https://reviews.llvm.org/D132872#3756244>, @fhahn wrote:
>>
>>> In D132872#3756205 <https://reviews.llvm.org/D132872#3756205>, @ABataev wrote:
>>>
>>>> Similar question to D125987 <https://reviews.llvm.org/D125987>: why can you not put the related cost changes into getArithmeticInstrCost?
>>>
>>> I think one reason is that TTI usually avoids relying on the context instruction and on use-list scans. Not saying this is a blocker, though, if we converge on this as the solution.
>>
>> Not quite true, since it accepts operands as an argument.
>
> Yes, it accepts it, but AFAICT the only user of that argument in getArithmeticInstrCost is ARMTargetTransformInfo, so I'd argue that using it is quite uncommon at the moment.

Anyway, you are introducing another level of dependency when we already have one. It makes the codebase more complex and harder to maintain.
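
Stepping back to the update note quoted at the top ("Add missing checks for the reassoc FMF flag..."): a minimal sketch of what such a check can look like. The TTI calls are real LLVM APIs, but the function name and surrounding variables are illustrative, not taken from the patch:

  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Intrinsics.h"

  using namespace llvm;

  // Sketch: treat a scalar fmul as an FMA-fusion candidate only if it
  // carries the reassoc fast-math flag and the target's fmuladd is at
  // least as cheap as a standalone fmul of the same type.
  static bool isScalarFMulFusable(const Instruction *FMul, Type *Ty,
                                  const TargetTransformInfo &TTIRef) {
    if (!FMul->hasAllowReassoc()) // FMul is assumed to be an FP instruction
      return false;
    IntrinsicCostAttributes ICA(Intrinsic::fmuladd, Ty, {Ty, Ty, Ty});
    InstructionCost FMACost =
        TTIRef.getIntrinsicInstrCost(ICA, TTI::TCK_RecipThroughput);
    InstructionCost FMulCost = TTIRef.getArithmeticInstrCost(
        Instruction::FMul, Ty, TTI::TCK_RecipThroughput);
    return FMACost <= FMulCost;
  }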

>>> The other issue is that we can't account for the fact that the widened FMUL will also be eligible for fusion, because the vector instruction does not exist yet, so there is no context instruction to pass.
>>
>> I do not see a problem here.
>
> We need a way to estimate whether the vector fmul will be fusable with its users; otherwise we may overstate the benefit of scalar FMAs.
>
> For that we would need to check whether all entries in the bundle are fusable with their users. But when we call getArithmeticInstrCost with the vector type, we can only pass a single context instruction, which is one of the scalar instructions of the bundle. So I don't think getArithmeticInstrCost can really figure out whether the vector FMUL will be free or not.

It can be fixed by adding an extra parameter: extend the existing function to take an array of instructions instead of a single instruction, flags, etc. A sketch of such an extension follows.
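
To make that suggestion concrete, here is a purely hypothetical sketch of a bundle-aware hook. Nothing below is an existing TTI interface, and the fusability test (a single use by an fadd) is a deliberate simplification of what a real implementation would need:

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/Instruction.h"

  using namespace llvm;

  // Hypothetical bundle-aware variant of getArithmeticInstrCost: it
  // receives all scalar instructions being widened, not just one.
  InstructionCost getArithmeticInstrCostForBundle(
      const TargetTransformInfo &TTIRef, unsigned Opcode, Type *VecTy,
      ArrayRef<const Instruction *> Bundle) {
    // With the whole bundle visible, the cost model can ask whether
    // every lane's scalar fmul would have been fused into an FMA anyway.
    bool AllLanesFusable =
        Opcode == Instruction::FMul &&
        all_of(Bundle, [](const Instruction *I) {
          if (!I || !I->hasOneUse())
            return false;
          const auto *UserI = dyn_cast<Instruction>(I->user_back());
          return UserI && UserI->getOpcode() == Instruction::FAdd;
        });
    // If so, the widened fmul is expected to fold into a vector FMA too.
    if (AllLanesFusable)
      return 0;
    return TTIRef.getArithmeticInstrCost(Opcode, VecTy,
                                         TTI::TCK_RecipThroughput);
  }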

> I think if we want to sink this into TTI, it would need to be integrated into getArithmeticInstrCost rather than added as a very specialized interface that would only be used by the SLP vectorizer.
>
>>> I think it would be good to roll out a solution that fixes the issue in SLP first and then possibly move it to TTI once it has proven itself. This is an issue that mostly impacts the SLP vectorizer, AFAICT.
>>
>> Not sure this is a good decision.
>
> Could you elaborate on your concerns with making the decision in SLP to start with? Even with the current implementation in SLP, the target-specific information (availability and cost of FMAs) is accessed through TTI.

I think after this it will live here forever; nobody will try to rework it. It is the same problem as before: the patch does not try to address the global problem; instead, it introduces an immediate hack.

> I think it would be good to first converge on a decision that addresses the regressions on both X86 and AArch64 and make sure it is actually the right one, and then decide if and where to generalize it.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6467-6469
+          NumScalarFusable +=
+              TTI->getIntrinsicInstrCost(ICA, TTI::TCK_RecipThroughput) <=
+              ScalarEltCost;
----------------
Still, it looks like a hack.
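
For context on the quoted lines: only the three '+' lines above are from the patch; the loop, VL, and ScalarEltCost below are a reconstruction of plausible surrounding code, not a quote of it:

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/Instruction.h"

  using namespace llvm;

  // Reconstruction: count how many scalar lanes are expected to fuse, by
  // comparing the per-lane fmuladd cost against the scalar fmul cost.
  static unsigned countFusableLanes(ArrayRef<Value *> VL,
                                    InstructionCost ScalarEltCost,
                                    const TargetTransformInfo *TTI) {
    unsigned NumScalarFusable = 0;
    for (Value *V : VL) { // VL: the scalar fmuls of the bundle
      Type *ScalarTy = cast<Instruction>(V)->getType();
      IntrinsicCostAttributes ICA(Intrinsic::fmuladd, ScalarTy,
                                  {ScalarTy, ScalarTy, ScalarTy});
      // The quoted '+' lines: a lane counts as fusable when the fused
      // form is no more expensive than the standalone scalar fmul.
      NumScalarFusable +=
          TTI->getIntrinsicInstrCost(ICA, TTI::TCK_RecipThroughput) <=
          ScalarEltCost;
    }
    return NumScalarFusable;
  }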


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132872/new/

https://reviews.llvm.org/D132872
