[PATCH] D153479: [NFC] Tests for future commit in DAGCombiner

Konstantina Mitropoulou via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 26 10:29:00 PDT 2023


kmitropoulou added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/combine_andor_with_cmps.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s
+
----------------
arsenm wrote:
> kmitropoulou wrote:
> > arsenm wrote:
> > > kmitropoulou wrote:
> > > > kmitropoulou wrote:
> > > > > kmitropoulou wrote:
> > > > > > arsenm wrote:
> > > > > > > kmitropoulou wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > Why use mir for this?
> > > > > > > > CSE changes the output of my optimization, so I need to do the checking earlier. 
> > > > > > > > 
> > > > > > > > For example, the following test:
> > > > > > > > 
> > > > > > > > define i1 @test1(i32 %arg1, i32 %arg2) #0 {
> > > > > > > >   %cmp1 = icmp slt i32 %arg1, 1000
> > > > > > > >   %cmp2 = icmp slt i32 %arg2, 1000
> > > > > > > >   %or  = or i1 %cmp1, %cmp2
> > > > > > > >   ret i1 %or
> > > > > > > > }
> > > > > > > > 
> > > > > > > > will be optimized as follows with my optimization:
> > > > > > > > 
> > > > > > > >   bb.0 (%ir-block.0):
> > > > > > > >     liveins: $vgpr0, $vgpr1
> > > > > > > >     %1:vgpr_32 = COPY $vgpr1
> > > > > > > >     %0:vgpr_32 = COPY $vgpr0
> > > > > > > >     %2:vgpr_32 = V_MIN_I32_e64 %0, %1, implicit $exec
> > > > > > > >     %3:sreg_32 = S_MOV_B32 1000
> > > > > > > >     %4:sreg_32_xm0_xexec = V_CMP_LT_I32_e64 killed %2, killed %3, implicit $exec
> > > > > > > >     %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, killed %4, implicit $exec
> > > > > > > >     $vgpr0 = COPY %5
> > > > > > > >     SI_RETURN implicit $vgpr0
> > > > > > > > 
> > > > > > > > This is the output after instruction selection. After CSE, the predicate of the compare instruction changes:
> > > > > > > > 
> > > > > > > > ; %bb.0:
> > > > > > > >         s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
> > > > > > > >         s_waitcnt_vscnt null, 0x0
> > > > > > > >         v_min_i32_e32 v0, v0, v1
> > > > > > > >         s_delay_alu instid0(VALU_DEP_1)
> > > > > > > >         v_cmp_gt_i32_e32 vcc_lo, 0x3e8, v0
> > > > > > > >         v_cndmask_b32_e64 v0, 0, 1, vcc_lo
> > > > > > > >         s_setpc_b64 s[30:31]
> > > > > > > > 
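The transform exercised by `test1` above folds an `or` of two signed compares against the same constant into a single compare of the minimum. This equivalence can be sanity-checked outside LLVM; the sketch below is purely illustrative and not part of the patch:

```python
# Semantic check for the combine discussed above:
# (a < C) or (b < C)  is equivalent to  min(a, b) < C  for signed integers,
# which is why the DAG combine can emit V_MIN_I32 + one V_CMP_LT_I32.
def or_of_cmps(a, b, c):
    return (a < c) or (b < c)

def combined(a, b, c):
    return min(a, b) < c

# Exhaustive check over a small signed range, with C = 5 standing in
# for the constant 1000 used in the test.
C = 5
for a in range(-8, 8):
    for b in range(-8, 8):
        assert or_of_cmps(a, b, C) == combined(a, b, C)
print("equivalence holds")
```

The same argument applies to `and` of two compares with `max` in place of `min`.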
> > > > > > > I don't understand. I assume you mean MachineCSE? Is your patch not actually a DAG combine as the description states?
> > > > > > > 
> > > > > > > Can you stop somewhere after SIFixSGPRCopies instead? 
> > > > > > I am sorry, I meant MachineCSE. The patch that I will upload implements the optimization in DAGCombiner. 
> > > > > > The new checks are generated after amdgpu-isel.
> > > > I am sorry, I did not understand your comment earlier :) I updated the test.
> > > My original point still stands. Why can't you test the final ISA? In general, optimization patches are better off testing end to end unless you specifically need to check some intermediate state.
> > Let's say we have the following test:
> > 
> > define i1 @test1(i32 %arg1, i32 %arg2) #0 {
> >   %cmp1 = icmp slt i32 %arg1, 1000
> >   %cmp2 = icmp slt i32 %arg2, 1000
> >   %or  = or i1 %cmp1, %cmp2
> >   ret i1 %or
> > }
> > 
> > The dump after SI Fix SGPR copies is:
> >   bb.0 (%ir-block.0):
> >     liveins: $vgpr0, $vgpr1
> >     %1:vgpr_32 = COPY $vgpr1
> >     %0:vgpr_32 = COPY $vgpr0
> >     %2:vgpr_32 = V_MIN_I32_e64 %0, %1, implicit $exec
> >     %3:sreg_32 = S_MOV_B32 1000
> >     %4:sreg_32_xm0_xexec = V_CMP_LT_I32_e64 killed %2, killed %3, implicit $exec
> >     %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, killed %4, implicit $exec
> >     $vgpr0 = COPY %5
> >     SI_RETURN implicit $vgpr0
> > 
> > The final output is:
> > 
> > %bb.0:
> >         s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
> >         s_waitcnt_vscnt null, 0x0
> >         v_min_i32_e32 v0, v0, v1
> >         s_delay_alu instid0(VALU_DEP_1)
> >         v_cmp_gt_i32_e32 vcc_lo, 0x3e8, v0
> >         v_cndmask_b32_e64 v0, 0, 1, vcc_lo
> >         s_setpc_b64 s[30:31]
> > 
> > So, it is easier to check the correctness of the optimization after SI Fix SGPR Copies. 
> > 
> > If you do not like it, then I can change it.
> > 
> You don't actually care about the MIR here, so just go to ISA. You can disable the s_delay_alu insertion for gfx11 with the flag
Which flag?
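A side note on the predicate change discussed above: the ISA output shows `v_cmp_gt_i32 vcc_lo, 0x3e8, v0` where the MIR had `V_CMP_LT_I32 %2, %3`. This is just operand commutation (the inline constant moves to the first source operand), since `x < 1000` and `1000 > x` are the same comparison. A trivial check of that equivalence, for illustration only:

```python
# x < C is the same predicate as C > x after commuting the operands,
# which is why the final ISA shows v_cmp_gt with 0x3e8 (1000) first
# where the pre-CSE MIR used V_CMP_LT with the constant second.
C = 1000
for x in (-1, 0, 999, 1000, 1001):
    assert (x < C) == (C > x)
print("commuted predicates agree")
```

So the checks before and after CSE test the same comparison, just written with swapped operands.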


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153479/new/

https://reviews.llvm.org/D153479


