[llvm-dev] FMA canonicalization in IR
Nicolai Hähnle via llvm-dev
llvm-dev at lists.llvm.org
Mon Nov 21 01:01:45 PST 2016
On 20.11.2016 08:38, Hal Finkel via llvm-dev wrote:
> ----- Original Message -----
>> From: "Sanjay Patel" <spatel at rotateright.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
>> Sent: Saturday, November 19, 2016 10:40:27 PM
>> Subject: Re: [llvm-dev] FMA canonicalization in IR
>> The potential advantage I was considering would be more accurate cost
>> modeling in the vectorizer, inliner, etc. Like min/max, this is
>> another case where the sum of the IR parts is greater than the
>> actual cost.
> This is indeed a problem, but is a much larger problem than just FMAs (as you note). Our cost-modeling interfaces should be extended to handle instruction patterns -- I don't see any other way of solving this in general.
>> Beyond that, it seems odd to me that we'd choose the longer IR
>> expression of something that could be represented in a minimal form.
> My fear is that, by forming the FMAs earlier than necessary, you'll just end up limiting opportunities for CSE, reassociation, etc. without any corresponding benefit.
>> I know we make practical concessions in IR based on backend
>> deficiencies, but in this case I think the fix would be easy - if
>> we're in contract=fast mode, just split all of these intrinsics at
>> DAG creation time and let the DAG or other passes behave exactly
>> like they do today to fuse them back together again?
> This is a good point; we could do this in fp-contract=fast mode.
I think there's a good reason to do this at the IR level already when
the appropriate flags are set; see the example that I also sent in
((X * Y) * X) + Z
is transformed to
((X * X) * Y) + Z
when associative transforms are allowed, but when the original is built as
fmuladd(X * Y, X, Z)
this optimization may be missed (I didn't actually check what happens
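
To make the three forms above concrete, here is a minimal sketch in C++ rather than IR. The function names are hypothetical and exist only for illustration, and std::fma stands in for the fmuladd call (fmuladd may also be left unfused, so this models only the fused choice). It is not a statement about what any LLVM pass actually does with these expressions.

```cpp
#include <cmath>

// Original form: ((X * Y) * X) + Z
double original_form(double x, double y, double z) {
    return (x * y) * x + z;
}

// Reassociated form: ((X * X) * Y) + Z
// Only valid when associative transforms are allowed, because the
// intermediate rounding can differ from the original form.
double reassociated_form(double x, double y, double z) {
    return (x * x) * y + z;
}

// Early-fused form: fmuladd(X * Y, X, Z), modeled with std::fma.
// The outer multiply is now hidden inside the fused operation, so a
// transform that pattern-matches on two plain multiplies no longer
// sees (X * Y) * X.
double early_fused_form(double x, double y, double z) {
    return std::fma(x * y, x, z);
}
```

With small integer-valued inputs such as x = 3, y = 5, z = 7, every intermediate is exactly representable, so all three functions return the same value; for general inputs the results may differ in the last bits, which is why the reassociation is gated on the flags.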