[llvm] [AMDGPU] Use V_FMAC_F64 in "if (cond) a -= c" (PR #168710)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 9 01:09:14 PST 2026


jayfoad wrote:

> Only work on pre-gfx12 as gfx12+ has dual-issued `v_cndmask`.

I don't understand this part. Do you mean VOPD? Firstly, that only works in wave32. Secondly, surely it is still better to remove one cndmask, so that the remaining one can be dual-issued with some other (unrelated) instruction?

> In assembly the transformation looks like:
> 
> ```
>  v_cndmask_b32_e64 vCondValue.lo, 0, vValue.lo, vCondReg
>  v_cndmask_b32_e64 vCondValue.hi, 0, vValue.hi, vCondReg
>  v_add_f64_e64 vDst[0:1], vAccum[0:1], -vCondValue[0:1]
> ```
> 
> to:
> 
> ```
>  v_mov_b32_e32 vNegOneHi, 0xbff00000   ; -1.0 high bits
>  v_mov_b32_e32 vMul.lo, 0
>  v_cndmask_b32_e64 vMul.hi, 0, vNegOneHi, vCondReg
>  v_fmac_f64_e32 vDst[0:1], vMul[0:1], vValue[0:1], vAccum[0:1] ; vAccum is tied-to vDst
> ```
> 
> Since the resulting pattern has one more instruction and requires 7 VGPRs instead of 6

But it's not really 7 VGPRs, because the live ranges of vNegOneHi and vMul.hi don't overlap, so they can share a register, right?

Anyway, on GFX10+ those problems go away because v_cndmask can take a literal:
```
 v_mov_b32_e32 vMul.lo, 0
 v_cndmask_b32_e64 vMul.hi, 0, 0xbff00000, vCondReg
 v_fmac_f64_e32 vDst[0:1], vMul[0:1], vValue[0:1], vAccum[0:1] ; vAccum is tied-to vDst
```
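
For anyone who wants to check the arithmetic behind the sequence above, here is a rough Python model. It is only a sketch, not what the compiler emits: the function names are made up for illustration, and it only covers ordinary finite, nonzero inputs (infinities, NaNs and signed zeros are not modeled). It shows that a high word of `0xbff00000` over a zero low word is the double `-1.0`, and that multiplying by the selected `-1.0` or `0.0` and adding the accumulator matches the original cndmask-plus-add sequence.

```python
import struct


def f64_from_words(hi: int, lo: int) -> float:
    """Build a double from its high and low 32-bit halves, as a VGPR pair holds it."""
    # Little-endian layout: the low word comes first, then the high word.
    return struct.unpack("<d", struct.pack("<II", lo, hi))[0]


def select_multiplier(cond: bool) -> float:
    """Model of: v_mov_b32 vMul.lo, 0 ; v_cndmask_b32 vMul.hi, 0, 0xbff00000, vCondReg."""
    hi = 0xBFF00000 if cond else 0
    return f64_from_words(hi, 0)


def original(accum: float, value: float, cond: bool) -> float:
    """Model of the two v_cndmask_b32 on value followed by v_add_f64 with a negated operand."""
    cond_value = value if cond else 0.0
    return accum + (-cond_value)


def rewritten(accum: float, value: float, cond: bool) -> float:
    """Model of v_fmac_f64 with the selected multiplier.

    The product is exact when the multiplier is -1.0 or 0.0, so a separate
    multiply and add rounds only once here, just as a fused multiply-add would.
    """
    return select_multiplier(cond) * value + accum


if __name__ == "__main__":
    for cond in (True, False):
        print(cond, original(5.5, 2.25, cond), rewritten(5.5, 2.25, cond))
```

The multiplier's low word is always zero, which is why only the high word needs a cndmask.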

https://github.com/llvm/llvm-project/pull/168710

