[llvm] r286171 - [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies

Mekhanoshin, Stanislav via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 9 23:15:44 PST 2016


Thank you. In this case v2 is used instead of v1 and do not contribute to export... I'm still puzzled what can go wrong with this code.

Stas

-----Original Message-----
From: Michel Dänzer [mailto:michel at daenzer.net] 
Sent: Wednesday, November 09, 2016 10:56 PM
To: Mekhanoshin, Stanislav
Cc: llvm-commits at lists.llvm.org; Nicolai Hähnle; Marek Olšák; Matt Arsenault; Tom Stellard
Subject: Re: [llvm] r286171 - [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies

On 10/11/16 03:36 PM, Mekhanoshin, Stanislav wrote:
> I would really appreciate if someone familiar with GLSL could explain the problem. Here is the only difference in code produced:
> 
> 303d302
> <       v_cndmask_b32_e64 v1, 0, -1, s[0:1]   ; D2000001 00018280
> 305,306c304,305
> <       v_cndmask_b32_e64 v3, 0, -1.0, s[0:1] ; D2000003 0001E680
> <       v_cmpx_le_f32_e32 vcc, 0, v3          ; 7C260680
> ---
>>       v_cndmask_b32_e64 v1, 0, -1.0, s[0:1] ; D2000001 0001E680
>>       v_cmpx_le_f32_e32 vcc, 0, v1          ; 7C260280
> 310c309
> <       v_sub_i32_e32 v3, vcc, 0, v0          ; 4C060080
> ---
>>       v_sub_i32_e32 v1, vcc, 0, v0          ; 4C020080
> 312,314c311,312
> <       v_add_i32_e32 v3, vcc, v3, v0         ; 4A060103
> <       v_cmp_ne_u32_e64 s[0:1], 0, v1        ; D18A0000 00020280
> <       v_cmp_gt_i32_e32 vcc, 10, v3          ; 7D08068A
> ---
>>       v_add_i32_e32 v1, vcc, v1, v0         ; 4A020101
>>       v_cmp_gt_i32_e32 vcc, 10, v1          ; 7D08028A
> 
> So, what happens is this instruction is removed:
> 
> v_cndmask_b32_e64 v1, 0, -1, s[0:1]
> 
> it copies s[0:1] into v1 for the lane.
> 
> Then the instruction which restores s[0:1] from v1 is also removed:
> 
> v_cmp_ne_u32_e64 s[0:1], 0, v1
> 
> Neither s0 nor s1 are written in between and anywhere after this point. Since v1 is now free other modified instructions use v1 instead of v3, which again does not seem to be an issue for me.
> The only difference I can see is the contents of v1 and v3 upon kernel termination in case if discard is called... Is there anything is GLSL ABI which requires v1 and v3 to hold specific values on exit? My question comes from this epilogue on non-discard return:
> 
>         v_mov_b32_e32 v0, 0
>         v_mov_b32_e32 v1, 0
>         v_mov_b32_e32 v3, 0
>         v_mov_b32_e32 v13, v15
>         ; return
> 
> Then this piece of code does not call s_endpgm as well. I also do not see branch target BB0_2 used in the generated code and generally have a suspicion this is just a part of a bigger kernel (based on the absence of s_endpgm at the end). I.e. there can be potentially a problem if this code is just inserted somewhere in a bigger context, not visible to the compiler.

The dumps were from radeonsi's default mode using separate shader prologues and epilogues. Attached to this e-mail are the corresponding dumps using monolithic shaders. As you can see, v1 contributes to the colour value exported from the pixel shader.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer


More information about the llvm-commits mailing list