[PATCH] D75982: [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d

qshanz via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 12 21:54:10 PDT 2020


steven.zhang marked an inline comment as done.
steven.zhang added inline comments.


================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11924
+  if (isContractableFMUL(N0) && isContractableFMUL(N1) &&
+      (N0.getNode()->use_size() > N1.getNode()->use_size())) {
+    // fold (fsub x, (fmul y, z)) -> (fma (fneg y), z, x)
----------------
spatel wrote:
> It's not clear to me what we want to do in the case where both of the intermediate operands have other uses, and that difference is not visible in the existing tests. Would it be better to simplify this check to:
>   if (N1.hasOneUse())
>     // fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
>   else
>     // fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))
> 
> We should probably add tests like this either way:
> 
> ```
> define double @fma_multi_uses1(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
>   %ab = fmul fast double %a, %b
>   %cd = fmul fast double %c, %d
>   store double %ab, double* %p1 ; extra use of %ab
>   store double %ab, double* %p2 ; another extra use of %ab
>   store double %cd, double* %p3 ; extra use of %cd
>   %r = fsub fast double %ab, %cd
>   ret double %r
> }
> 
> define double @fma_multi_uses2(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
>   %ab = fmul fast double %a, %b
>   %cd = fmul fast double %c, %d
>   store double %ab, double* %p1 ; extra use of %ab
>   store double %cd, double* %p2 ; extra use of %cd
>   store double %cd, double* %p3 ; another extra use of %cd 
>   %r = fsub fast double %ab, %cd
>   ret double %r
> }
> 
> ```
Good point but the answer is NO. I think, the direction that reduce the use of fmul that has less uses make sense, as it will expose other opportunities while its uses reaches one or zero quicker than the other one. For example:
```
x = a*b - c*d;
y = e*f - c*d;
// there are more than one uses of "a*b" later.
```
We have 3 uses of "a*b" at least, and 2 uses of "c*d".  We should prefer to fold "c*d" for "x" and "y" so that, its def could be removed. That is the reason why we want to fold less uses instead of one use. Does it make sense ?

I have added the tests for your cases and the one I give above.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75982/new/

https://reviews.llvm.org/D75982





More information about the llvm-commits mailing list