[llvm-dev] FMA canonicalization in IR

Mon Nov 21 01:01:45 PST 2016

On 20.11.2016 08:38, Hal Finkel via llvm-dev wrote:
> ----- Original Message -----
>> From: "Sanjay Patel" <spatel at rotateright.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>> Sent: Saturday, November 19, 2016 10:40:27 PM
>> Subject: Re: [llvm-dev] FMA canonicalization in IR
>>
>>
>> The potential advantage I was considering would be more accurate cost
>> modeling in the vectorizer, inliner, etc. Like min/max, this is
>> another case where the sum of the IR parts is greater than the
>> actual cost.
>
> This is indeed a problem, but is a much larger problem than just FMAs (as you note). Our cost-modeling interfaces should be extended to handle instruction patterns -- I don't see any other way of solving this in general.
>
>>
>> Beyond that, it seems odd to me that we'd choose the longer IR
>> expression of something that could be represented in a minimal form.
>
> My fear is that, by forming the FMAs earlier than necessary, you'll just end up limiting opportunities for CSE, reassociation, etc. without any corresponding benefit.
>
>> I know we make practical concessions in IR based on backend
>> deficiencies, but in this case I think the fix would be easy - if
>> we're in contract=fast mode, just split all of these intrinsics at
>> DAG creation time and let the DAG or other passes behave exactly
>> like they do today to fuse them back together again?
>
> This is a good point; we could do this in fp-contract=fast mode.

I think there's a good reason to do this at the IR level already when
the appropriate flags are set; see the example that I also sent in
another mail:

   ((X * Y) * X) + Z

is transformed to

   ((X * X) * Y) + Z

when associative transforms are allowed, but when the original is built as

   fmuladd(X * Y, X, Z)

this optimization may be missed (I didn't actually check what happens 
today).
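
To make the concern concrete, here is a toy sketch in Python. It is not LLVM code and says nothing about what LLVM actually does today. The tuple-based expression format, the `reassociate` function, and the rule "sort the operands of a multiply chain" are all hypothetical stand-ins for a reassociation pass. The assumption being modeled is that such a pass only follows chains of plain `mul` nodes and treats `fmuladd` as an opaque operation whose built-in multiply it cannot merge into a chain.

```python
# Toy expression IR (hypothetical, not LLVM's data structures):
#   a leaf is a string, an operation is a tuple (op, operand, operand, ...).

def flatten_mul(e):
    """Collect the operands of a nested chain of plain 'mul' nodes."""
    if isinstance(e, tuple) and e[0] == "mul":
        return flatten_mul(e[1]) + flatten_mul(e[2])
    return [e]

def reassociate(e):
    """Canonicalize 'mul' chains by sorting their operands, so that equal
    operands end up adjacent. Every other op (including 'fmuladd') is
    opaque: its operands are processed, but its own multiply is never
    merged into a chain. That opacity is the assumption under test."""
    if not isinstance(e, tuple):
        return e
    if e[0] == "mul":
        leaves = sorted((reassociate(x) for x in flatten_mul(e)), key=repr)
        acc = leaves[0]
        for leaf in leaves[1:]:
            acc = ("mul", acc, leaf)  # rebuild as a left-leaning chain
        return acc
    return (e[0],) + tuple(reassociate(x) for x in e[1:])

# ((X * Y) * X) + Z, written with separate mul and add nodes
flat = ("add", ("mul", ("mul", "X", "Y"), "X"), "Z")
# fmuladd(X * Y, X, Z): the outer multiply is hidden inside the intrinsic
fused = ("fmuladd", ("mul", "X", "Y"), "X", "Z")

flat_result = reassociate(flat)
fused_result = reassociate(fused)

print(flat_result)
print(fused_result)
```

In this model the separate form becomes ((X * X) * Y) + Z, while the fused form comes back unchanged, because the chain (X * Y) * X is split between a `mul` node and the intrinsic.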

Nicolai