[all-commits] [llvm/llvm-project] 476bab: [AMDGPU] Introduce new ISel combine for trunc-slr ...

Thu Feb 3 09:06:56 PST 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 476babcc1dbc2f08bd2a10c1e5508662b4a77dc3
      https://github.com/llvm/llvm-project/commit/476babcc1dbc2f08bd2a10c1e5508662b4a77dc3
  Author: Thomas Symalla <thomas.symalla at amd.com>
  Date:   2022-02-03 (Thu, 03 Feb 2022)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    M llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll
    M llvm/test/CodeGen/AMDGPU/divergence-driven-trunc-to-i1.ll

  Log Message:
  -----------
  [AMDGPU] Introduce new ISel combine for trunc-slr patterns

In some cases, when selecting a (trunc (slr)) pattern, the slr gets translated
to a v_lshrrev_b3e2_e64 instruction whereas the truncation gets selected to
a sequence of v_and_b32_e64 and v_cmp_eq_u32_e64. In the final ISA, this appears
as selecting the nth-bit:

v_lshrrev_b32_e32 v0, 2, v1
v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e32 vcc_lo, 1, v0

However, when the value used in the right shift is known at compilation time, the
whole sequence can be reduced to two VALUs when the constant operand in the v_and is adjusted to (1 << lshrrev_operand):

v_and_b32_e32 v0, (1 << 2), v1
v_cmp_ne_u32_e32 vcc_lo, 0, v0

In the example above, the following pseudo-code:

v0 = (v1 >> 2)
v0 = v0 & 1
vcc_lo = (v0 == 1)

would be translated to:

v0 = v1 & 0b100
vcc_lo = (v0 == 0b100)

which should yield an equivalent result.
This is a little bit hard to test as one needs to force the SelectionDAG to
contain the nodes before instruction selection, but the test sequence was
roughly derived from a production shader.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D118461