[llvm] r286171 - [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies

Tom Stellard via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 10 13:53:40 PST 2016


On Thu, Nov 10, 2016 at 07:15:44AM +0000, Mekhanoshin, Stanislav wrote:
> Thank you. In this case v2 is used instead of v1 and do not contribute to export... I'm still puzzled what can go wrong with this code.
> 

Good asm is on the left and bad asm is on the right:

s_and_b64 s[0:1], s[0:1], vcc            s_and_b64 s[0:1],s[0:1], vcc
v_cndmask_b32_e64 v1, 0, -1, s[0:1]   <
v_mov_b32_e32 v2, 0                      v_mov_b32_e32 v2, 0
v_cndmask_b32_e64 v3, 0, -1.0, s[0:1] |  v_cndmask_b32_e64 v1, 0, -1.0, s[0:1]
v_cmpx_le_f32_e32 vcc, 0, v3          |  v_cmpx_le_f32_e32 vcc, 0, v1
s_cbranch_execnz BB0_2                   s_cbranch_execnz BB0_2
exp 0, 9, 0, 1, 1, v0, v0, v0, v0        exp 0, 9, 0, 1, 1, v0, v0, v0, v0
s_endpgm                                 s_endpgm
v_sub_i32_e32 v3, vcc, 0, v0          |  v_sub_i32_e32 v1, vcc, 0, v0
s_mov_b64 s[2:3], 0                      s_mov_b64 s[2:3], 0
v_add_i32_e32 v3, vcc, v3, v0         |  v_add_i32_e32 v1, vcc, v1, v0
v_cmp_ne_u32_e64 s[0:1], 0, v1        |  v_cmp_gt_i32_e32 vcc, 10, v1
v_cmp_gt_i32_e32 vcc, 10, v3          <


The value of s[0:1] at the end of this block is the difference between
the good and bad asm.  This is because the output of v_cmp instructions
is dependent on exec, so this sequence isn't doing a true copy of
s[0:1] because the value of exex is modified in the middle:

v_cndmask_b32_e64 v1, 0, -1, s[0:1]
...
v_cmpx_le_f32_e32 vcc, 0, v3
...
v_cmp_ne_u32_e64 s[0:1], 0, v1


I think the fix is to only do the forward propagation if exec isn't being
modified between between the def and use of the sgpr.

-Tom

> Stas
> 
> -----Original Message-----
> From: Michel Dänzer [mailto:michel at daenzer.net] 
> Sent: Wednesday, November 09, 2016 10:56 PM
> To: Mekhanoshin, Stanislav
> Cc: llvm-commits at lists.llvm.org; Nicolai Hähnle; Marek Olšák; Matt Arsenault; Tom Stellard
> Subject: Re: [llvm] r286171 - [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
> 
> On 10/11/16 03:36 PM, Mekhanoshin, Stanislav wrote:
> > I would really appreciate if someone familiar with GLSL could explain the problem. Here is the only difference in code produced:
> > 
> > 303d302
> > <       v_cndmask_b32_e64 v1, 0, -1, s[0:1]   ; D2000001 00018280
> > 305,306c304,305
> > <       v_cndmask_b32_e64 v3, 0, -1.0, s[0:1] ; D2000003 0001E680
> > <       v_cmpx_le_f32_e32 vcc, 0, v3          ; 7C260680
> > ---
> >>       v_cndmask_b32_e64 v1, 0, -1.0, s[0:1] ; D2000001 0001E680
> >>       v_cmpx_le_f32_e32 vcc, 0, v1          ; 7C260280
> > 310c309
> > <       v_sub_i32_e32 v3, vcc, 0, v0          ; 4C060080
> > ---
> >>       v_sub_i32_e32 v1, vcc, 0, v0          ; 4C020080
> > 312,314c311,312
> > <       v_add_i32_e32 v3, vcc, v3, v0         ; 4A060103
> > <       v_cmp_ne_u32_e64 s[0:1], 0, v1        ; D18A0000 00020280
> > <       v_cmp_gt_i32_e32 vcc, 10, v3          ; 7D08068A
> > ---
> >>       v_add_i32_e32 v1, vcc, v1, v0         ; 4A020101
> >>       v_cmp_gt_i32_e32 vcc, 10, v1          ; 7D08028A
> > 
> > So, what happens is this instruction is removed:
> > 
> > v_cndmask_b32_e64 v1, 0, -1, s[0:1]
> > 
> > it copies s[0:1] into v1 for the lane.
> > 
> > Then the instruction which restores s[0:1] from v1 is also removed:
> > 
> > v_cmp_ne_u32_e64 s[0:1], 0, v1
> > 
> > Neither s0 nor s1 are written in between and anywhere after this point. Since v1 is now free other modified instructions use v1 instead of v3, which again does not seem to be an issue for me.
> > The only difference I can see is the contents of v1 and v3 upon kernel termination in case if discard is called... Is there anything is GLSL ABI which requires v1 and v3 to hold specific values on exit? My question comes from this epilogue on non-discard return:
> > 
> >         v_mov_b32_e32 v0, 0
> >         v_mov_b32_e32 v1, 0
> >         v_mov_b32_e32 v3, 0
> >         v_mov_b32_e32 v13, v15
> >         ; return
> > 
> > Then this piece of code does not call s_endpgm as well. I also do not see branch target BB0_2 used in the generated code and generally have a suspicion this is just a part of a bigger kernel (based on the absence of s_endpgm at the end). I.e. there can be potentially a problem if this code is just inserted somewhere in a bigger context, not visible to the compiler.
> 
> The dumps were from radeonsi's default mode using separate shader prologues and epilogues. Attached to this e-mail are the corresponding dumps using monolithic shaders. As you can see, v1 contributes to the colour value exported from the pixel shader.
> 
> 
> -- 
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer


More information about the llvm-commits mailing list