[PATCH] D140135: AMDGPU: Try to unfold fneg source when matching legacy fmin/fmax
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 20 06:06:24 PST 2023
arsenm added inline comments.
Herald added a subscriber: StephenFan.
================
Comment at: llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll:295-297
-; SI-SAFE-NEXT: v_mov_b32_e32 v1, s0
-; SI-SAFE-NEXT: v_cmp_ngt_f32_e64 vcc, s0, 0
-; SI-SAFE-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc
----------------
foad wrote:
> foad wrote:
> > arsenm wrote:
> > > foad wrote:
> > > > I think these three lines implement roughly max(s0, 0) so how can converting it to min(0, s0) be correct?
> > > It's ngt, not gt. It's more like !(v_cmp_le_f32)
> > Right, the code is doing `s0<=0?v0:v1` which is `s0<=0?-0:s0` which is max.
> Oh sorry I think I've misremembered how cndmask works.
Right, cndmask is backward from what you would expect: it returns src1 when the condition bit is set and src0 otherwise.
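
To make the operand order concrete, here is a minimal Python sketch of the three removed instructions. It is only an illustration, not code from the patch: the helper names mirror the instruction mnemonics but are hypothetical, Python floats are doubles rather than f32, and the constant held in v0 is not visible in the quoted lines, so it is passed in as an assumed parameter.

```python
import math

def v_cmp_ngt_f32(a, b):
    # "Not greater than": true when a <= b, and also when either operand is NaN.
    return not (a > b)

def v_cndmask_b32(src0, src1, cond):
    # Selects src1 when the condition bit is set, src0 otherwise.
    return src1 if cond else src0

def removed_sequence(s0, v0_const):
    # v0_const is an assumption: the value of v0 is set outside the quoted lines.
    v1 = s0                             # v_mov_b32_e32 v1, s0
    vcc = v_cmp_ngt_f32(s0, 0.0)        # v_cmp_ngt_f32_e64 vcc, s0, 0
    return v_cndmask_b32(v0_const, v1, vcc)  # v_cndmask_b32_e32 v0, v0, v1, vcc

for s0 in (-2.0, 3.0, math.nan):
    print(s0, removed_sequence(s0, 0.0))
```

With a zero constant in v0, the sequence returns s0 when s0 is not greater than 0 (including NaN) and the constant otherwise, which is the min-like behavior rather than max.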
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140135/new/
https://reviews.llvm.org/D140135