[llvm] r286171 - [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies

Wed Nov 9 22:36:05 PST 2016

I would really appreciate it if someone familiar with GLSL could explain the problem. Here is the only difference in the code produced:

303d302
<       v_cndmask_b32_e64 v1, 0, -1, s[0:1]   ; D2000001 00018280
305,306c304,305
<       v_cndmask_b32_e64 v3, 0, -1.0, s[0:1] ; D2000003 0001E680
<       v_cmpx_le_f32_e32 vcc, 0, v3          ; 7C260680
---
>       v_cndmask_b32_e64 v1, 0, -1.0, s[0:1] ; D2000001 0001E680
>       v_cmpx_le_f32_e32 vcc, 0, v1          ; 7C260280
310c309
<       v_sub_i32_e32 v3, vcc, 0, v0          ; 4C060080
---
>       v_sub_i32_e32 v1, vcc, 0, v0          ; 4C020080
312,314c311,312
<       v_add_i32_e32 v3, vcc, v3, v0         ; 4A060103
<       v_cmp_ne_u32_e64 s[0:1], 0, v1        ; D18A0000 00020280
<       v_cmp_gt_i32_e32 vcc, 10, v3          ; 7D08068A
---
>       v_add_i32_e32 v1, vcc, v1, v0         ; 4A020101
>       v_cmp_gt_i32_e32 vcc, 10, v1          ; 7D08028A

So, what happens is that this instruction is removed:

v_cndmask_b32_e64 v1, 0, -1, s[0:1]

It copies each lane's bit of s[0:1] into v1 (as 0 or -1).

Then the instruction which restores s[0:1] from v1 is also removed:

v_cmp_ne_u32_e64 s[0:1], 0, v1
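
The copy/restore pair above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the compiler's or the hardware's definition: it assumes a 64-lane wave, that both instructions act only on lanes enabled in EXEC, and that v_cmp_ne writes 0 into the result bits of inactive lanes. The function names and the exec_before/exec_after parameters are made up for illustration.

```python
LANES = 64  # assumed wave size


def v_cndmask_0_m1(mask, exec_mask, old_vgpr):
    """Model of: v_cndmask_b32 v, 0, -1, mask.

    For every lane enabled in exec_mask, v = -1 if the lane's mask bit
    is set, else 0. Disabled lanes keep their old value (assumption).
    """
    out = list(old_vgpr)
    for lane in range(LANES):
        if (exec_mask >> lane) & 1:
            out[lane] = -1 if (mask >> lane) & 1 else 0
    return out


def v_cmp_ne_0(vgpr, exec_mask):
    """Model of: v_cmp_ne_u32 sdst, 0, v.

    Sets the result bit for enabled lanes where v != 0. Bits of disabled
    lanes are assumed to be written as 0.
    """
    result = 0
    for lane in range(LANES):
        if (exec_mask >> lane) & 1 and vgpr[lane] != 0:
            result |= 1 << lane
    return result


def round_trip(mask, exec_before, exec_after):
    """Copy mask into a VGPR under exec_before, restore it under exec_after."""
    v1 = v_cndmask_0_m1(mask, exec_before, [0] * LANES)
    return v_cmp_ne_0(v1, exec_after)


if __name__ == "__main__":
    full = (1 << LANES) - 1
    mask = 0b1011
    print(bin(round_trip(mask, full, full)))            # EXEC unchanged
    print(bin(round_trip(mask, full, full & ~0b0010)))  # lane 1 disabled in between
```

In this model, with the same EXEC at both points the pair is an identity on s[0:1], so removing it changes nothing. If lanes are disabled between the two instructions (for example by a v_cmpx), the restored mask has those lanes cleared, while the untouched s[0:1] does not. Whether that matters for this test is not established here.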

Neither s0 nor s1 is written in between or anywhere after this point. Since v1 is now free, the other modified instructions use v1 instead of v3, which again does not look like an issue to me.
The only difference I can see is in the contents of v1 and v3 upon kernel termination when discard is called... Is there anything in the GLSL ABI which requires v1 and v3 to hold specific values on exit? My question comes from this epilogue on the non-discard return:

        v_mov_b32_e32 v0, 0
        v_mov_b32_e32 v1, 0
        v_mov_b32_e32 v3, 0
        v_mov_b32_e32 v13, v15
        ; return

This piece of code does not call s_endpgm either. I also do not see the branch target BB0_2 used in the generated code, and I suspect this is just part of a bigger kernel (based on the absence of s_endpgm at the end). That is, there could be a problem if this code is simply inserted into a bigger context that is not visible to the compiler.

Stas

-----Original Message-----
From: Michel Dänzer [mailto:michel at daenzer.net] 
Sent: Wednesday, November 09, 2016 7:15 PM
To: Mekhanoshin, Stanislav
Cc: llvm-commits at lists.llvm.org; Nicolai Hähnle; Marek Olšák; Matt Arsenault; Tom Stellard
Subject: Re: [llvm] r286171 - [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies

[ Adding more AMD Mesa/LLVM developers ]

On 10/11/16 06:47 AM, Mekhanoshin, Stanislav wrote:
> I cannot see the problem with the code generated. In fact it is now 2 
> instructions less and seems to be equivalent to the old one on behavior.
> What does this test check? Presence of a missing instruction, exact 
> match of the produced ISA?

It tests the behaviour of the generated code. Here's the corresponding GLSL source code and description of what it's testing for:

/* This shader will discard one pixel coordinate, and do an infinite
 * loop on another pixel.  We set the two coordinates to the same, to
 * test whether discard on a channel avoids execution on that channel.
 */
static const char *fs_source =
        "#version 130\n"
        "uniform ivec2 coord1, coord2;\n"
        "void main()\n"
        "{\n"
        "       ivec2 fc = ivec2(gl_FragCoord);\n"
        "       int inc = abs(fc.x - coord2.x) + abs(fc.y - coord2.y);\n"
        "\n"
        "       if (fc == coord1)\n"
        "               discard;\n"
        "\n"
        "       gl_FragColor = vec4(0);\n"
        "       for (int i = 0; i < 10; i += inc)\n"
        "               gl_FragColor.b += 0.1;\n"
        "}\n";

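The per-fragment behaviour the shader relies on can be sketched in Python. This is a hedged model of the GLSL source above for a single fragment, not of the generated ISA: the function name shade, the max_iters cap used to detect the infinite loop, and the string return values are all assumptions made for illustration.

```python
def shade(frag, coord1, coord2, max_iters=1000):
    """Model one fragment of the test shader.

    Returns "discard" if the fragment is discarded, "hang" if the loop
    would never terminate (detected via the illustrative max_iters cap),
    otherwise the final blue value rounded to one decimal place.
    """
    # int inc = abs(fc.x - coord2.x) + abs(fc.y - coord2.y);
    inc = abs(frag[0] - coord2[0]) + abs(frag[1] - coord2[1])

    # if (fc == coord1) discard;
    if frag == coord1:
        return "discard"

    # for (int i = 0; i < 10; i += inc) gl_FragColor.b += 0.1;
    blue = 0.0
    i = 0
    steps = 0
    while i < 10:
        if steps >= max_iters:
            return "hang"  # inc == 0: the loop never makes progress
        blue += 0.1
        i += inc
        steps += 1
    return round(blue, 1)


if __name__ == "__main__":
    same = (3, 4)
    print(shade((3, 4), same, same))    # the pixel the test cares about
    print(shade((3, 4), (0, 0), same))  # same pixel without the discard
    print(shade((4, 4), same, same))    # a neighbouring pixel
```

With coord1 equal to coord2, the only fragment for which inc is 0 is also the one that is discarded, so the test passes only if the discarded lane really stops executing before the loop.
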
-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer