[PATCH] D106058: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 15 11:19:57 PDT 2021


tra added inline comments.


================
Comment at: llvm/test/CodeGen/NVPTX/fast-math.ll:150
 ; CHECK: selp.f32
+; CHECK: div.rn.f32
   %x = fdiv arcp float %a, %divisor
----------------
lebedev.ri wrote:
> RKSimon wrote:
> > tra wrote:
> > > RKSimon wrote:
> > > > lebedev.ri wrote:
> > > > > tra wrote:
> > > > > > tra wrote:
> > > > > > > This looks like it may be a performance regression. Judging by the name of the tests, we do expect to see reciprocal and multiplies, not `div`.
> > > > > > > 
> > > > > > Never mind. These changes are not in the final version of the patch.
> > > > > Then you were already missing some kind of an inverse transform,
> > > > > because middle-end would have already done this fold:
> > > > > https://godbolt.org/z/cb38zEj9Y
> > > > I made a kuldge fix in rG3cc38703d5ab05be0b01c31f829d19b47f183c5f so it still tests the combineRepeatedFPDivisors code path which appeared to be their purpose. But as @lebedev.ri said, we're unlikely to see the original test code patterns from real world code as InstCombine will have already broken the pattern.
> > > This version of the patch resulted in `div` showing up in the PTX. The godbolt example above looks good and still generates rcp/mul/mul, which is what we want.
> > > 
> > > If the generated PTX changes to use `div` once godbolt picks up these changes, that would be a regression. Changing the test just hides the problem. It might've been better to disable the test with a comment that the test is still correct, but that there may be an issue in NVPTX which needs to be fixed.
> > > 
> > > I'm not sure yet why/how your change affectes NVPTX. I'll check what's going on.
> > > 
> > > 
> > This shows the llc output before/after opt -O3 https://godbolt.org/z/dz4PGvjEf
> > The godbolt example above looks good and still generates rcp/mul/mul, which is what we want.
> 
> No it doesn't:
> https://godbolt.org/z/P43Y5E76E
Your original link does produce correct PTX without div for me. However, RKSimon's example shows that there's indeed something weird going on that prevents lowering to multiplication by reciprocal.

To make it clear - I have no issue with this patch. It's apparently just exposed an issue in the NVPTX back-end. I'll deal with it.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106058/new/

https://reviews.llvm.org/D106058



More information about the llvm-commits mailing list