[llvm] WIP: [AMDGPU] Use s_cselect_b32 for uniform select of f32 values (PR #111688)

via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 24 12:01:06 PDT 2024


================
@@ -9,7 +9,9 @@ define amdgpu_ps float @xor3_i1_const(float inreg %arg1, i32 inreg %arg2) {
 ; GCN-NEXT:    v_cmp_lt_f32_e64 s[2:3], s0, 0
 ; GCN-NEXT:    v_cmp_lt_f32_e32 vcc, s0, v0
 ; GCN-NEXT:    s_and_b64 s[0:1], s[2:3], vcc
-; GCN-NEXT:    v_cndmask_b32_e64 v0, 1.0, 0, s[0:1]
+; GCN-NEXT:    s_and_b64 s[0:1], s[0:1], exec
+; GCN-NEXT:    s_cselect_b32 s0, 0, 1.0
+; GCN-NEXT:    v_mov_b32_e32 v0, s0
----------------
alex-t wrote:

> > What I am concerned about is that there might be other users of the SCC definition introduced by COPY.
> > In the general case, we would have to do a DFS walk, looking for either a definition that kills the current one or other users, to decide whether it is possible to remove the copy and convert s_cselect_b32 to v_cndmask.
> 
> Don't you have the same problem with a regular VGPR to SGPR copy? Do you analyse all uses of the SGPR in the whole function? Or is it different because SCC is a physical register?

Yes, SCC is a physreg, so we cannot rely on SSA when looking for its uses. That is my concern.
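
To illustrate the concern, here is a minimal sketch of the kind of scan that would be needed. It is a toy model, not the LLVM `MachineInstr` API: `ToyInstr`, `countSCCReaders`, and `exampleBlock` are hypothetical names made up for this example, and the scan is limited to a single basic block. If SCC were live out of the block, the walk would also have to continue into successor blocks, which is the DFS mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction. It only records whether the
// instruction reads or writes SCC. Hypothetical type, not an LLVM class.
struct ToyInstr {
  const char *name;
  bool readsSCC;
  bool writesSCC;
};

// Counts the instructions that read the SCC value written at block[defIdx]
// before SCC is overwritten or the block ends. SCC is redefined many times
// in a function, so a reader can only be tied to a particular definition
// by walking the instructions in program order.
std::size_t countSCCReaders(const std::vector<ToyInstr> &block,
                            std::size_t defIdx) {
  assert(defIdx < block.size() && block[defIdx].writesSCC);
  std::size_t readers = 0;
  for (std::size_t i = defIdx + 1; i < block.size(); ++i) {
    if (block[i].readsSCC)
      ++readers; // a read sees the old value even if this instruction also writes SCC
    if (block[i].writesSCC)
      break; // the definition we started from is killed here
  }
  return readers;
}

// Hand-written example sequence, loosely modelled on the test output above.
std::vector<ToyInstr> exampleBlock() {
  return {
      {"s_and_b64", false, true},     // defines SCC
      {"s_cselect_b32", true, false}, // reads that SCC
      {"v_mov_b32_e32", false, false},
      {"s_add_u32", false, true},     // clobbers SCC
      {"s_cselect_b32", true, false}, // reads the new SCC
  };
}
```

In this model, `countSCCReaders(exampleBlock(), 0)` finds exactly one reader, so the first `s_cselect_b32` is the only user of that definition. Any count above one would mean the copy cannot simply be dropped.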

https://github.com/llvm/llvm-project/pull/111688

