[clang-tools-extra] [lld] [llvm] [compiler-rt] [clang] [libc] [libcxx] [flang] [lldb] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

Joe Nash via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 22 12:24:16 PST 2024


Mirko =?utf-8?q?Brkušanin?= <Mirko.Brkusanin at amd.com>,
Mirko =?utf-8?q?Brkušanin?= <Mirko.Brkusanin at amd.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/78414 at github.com>


================
@@ -305,6 +305,11 @@ class VOP3OpSel_gfx10<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
 
 class VOP3OpSel_gfx11_gfx12<bits<10> op, VOPProfile p> : VOP3OpSel_gfx10<op, p>;
 
+class VOP3FP8OpSel_gfx11_gfx12<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
+  let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
+  let Inst{12} = !if(p.HasSrc0, src0_modifiers{3}, 0);
----------------
Sisyph wrote:

 A couple points related to this. 
- I don't think the rules for forming op_sel with dpp are currently being checked correctly. In GCNDPPCombine.cpp:369, we check the named op_sel operand and op_sel_hi operand. We use src_modifier operands through most of the compiler, and typically don't (if ever?) copy the bits to the named op_sel operands. This should be fixed regardless of this patch.
- The rules we should enforce for dpp with op_sel is that for the alu to be combined, op_sel must be 0 and op_sel_hi must be 0b111
- My conclusion is that it is only safe to form the dpp alu with these instructions if op_sel bits are all zero
- We are emitting those alu based on this patch that probably shouldn't be allowed ( e.g. v_cvt_f32_fp8_e64_dpp v0, v0 op_sel:[1,1] quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf bound_ctrl:1)
- The cleanup Ivan suggested using the dst_op_sel bit of src0 (equivalent to the op_sel_hi bit of src0) would require a special case in GCNDppCombine to check the correct bit. Otherwise, we might allow an alu to be formed if op_sel:[0,1] 


https://github.com/llvm/llvm-project/pull/78414


More information about the cfe-commits mailing list