[PATCH] D134418: [AMDGPU] Improve ISel for v_bfi instructions.

Tue Oct 4 07:17:59 PDT 2022

tsymalla added a comment.

In D134418#3833436 <https://reviews.llvm.org/D134418#3833436>, @foad wrote:

> I don't immediately see how shifts are relevant.
>
> For the basic case of nested bitfield inserts, perhaps you could create tests for the cases you want to handle. For example, IR equivalents of:
>
> 1. (x & y) | (~x & z) // single insert
> 2. (x & y) | (~x & ((u & v) | (~u & z))) // nested insert
> 3. (x & ((u & v) | (~u & y))) | (~x & z) // nested insert
>
> For the nested inserts we might want separate test cases depending on whether the "select" arguments x and u are known to be disjoint or not. E.g. 0x0F and 0xF0 are disjoint, 0xFF0 and 0x0FF overlap, and for non-constant values we don't know whether they overlap or not.
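
For reference, the three expressions quoted above can be modelled with a small Python sketch. The helper names (`bfi`, `single_insert`, `nested_in_false_arm`, `nested_in_true_arm`, `masks_disjoint`) are made up for illustration and are not LLVM APIs; i32 values are modelled as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers


def bfi(mask, a, b):
    """Bitfield insert: bits of a where mask is 1, bits of b elsewhere."""
    return ((mask & a) | (~mask & b)) & MASK32


def single_insert(x, y, z):
    # Case 1: (x & y) | (~x & z)
    return bfi(x, y, z)


def nested_in_false_arm(x, y, u, v, z):
    # Case 2: (x & y) | (~x & ((u & v) | (~u & z)))
    return bfi(x, y, bfi(u, v, z))


def nested_in_true_arm(x, u, v, y, z):
    # Case 3: (x & ((u & v) | (~u & y))) | (~x & z)
    return bfi(x, bfi(u, v, y), z)


def masks_disjoint(x, u):
    """True when the two select masks share no set bit."""
    return (x & u) == 0
```

With the masks from the quoted example, `masks_disjoint(0x0F, 0xF0)` holds while `masks_disjoint(0xFF0, 0x0FF)` does not, which is the distinction the separate test cases would cover.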

The right and deterministic way to do this would be to transform the second xor, and, xor sequence into an and, and, or sequence (just like the inner one), so that ISel picks it up as well without any special pattern matching.
However, InstCombine does not handle such cases: it only converts the inner sequence into an and, and, or sequence that can be matched to a BFI.
It is correct that the shifts don't really relate to the pattern; they are only used here to match such cases. See for example:

  %24 = shl i32 %23, 10
  %25 = xor i32 %24, %21
  %26 = and i32 %25, 1047552
  %27 = xor i32 %26, %21
  %28 = select i1 false, i32 %23, i32 %27
  %param.0.vec.extract = extractelement <3 x float> %19, i64 0
  %29 = fmul reassoc nnan nsz arcp contract afn float %param.0.vec.extract, 1.023000e+03
  %30 = fptoui float %29 to i32
  %31 = shl i32 %30, 20
  %32 = xor i32 %31, %28
  %33 = and i32 %32, 1072693248
  %34 = xor i32 %33, %28

This gets transformed into:

  %15 = shl i32 %14, 10
  %16 = and i32 %15, 1047552
  %17 = and i32 %12, -1047553
  %18 = or i32 %16, %17
  %19 = fmul reassoc nnan nsz arcp contract afn float %8, 1.023000e+03
  %20 = fptoui float %19 to i32
  %21 = shl i32 %20, 20
  %22 = xor i32 %21, %12
  %23 = and i32 %22, 1072693248
  %24 = xor i32 %23, %18
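
The rewrite of the first sequence relies on the masked-merge identity ((x ^ y) & m) ^ y == (x & m) | (y & ~m). A minimal Python sketch that checks it with the constants from the IR above; the function and constant names are hypothetical, and i32 values are modelled as unsigned 32-bit integers.

```python
import random

MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers

FIELD_10_19 = 1047552      # mask from the IR, 0x000FFC00 (bits 10..19)
INVERSE_AS_I32 = -1047553  # the constant used in the second `and`


def xor_and_xor(x, y, m):
    """The original form: ((x ^ y) & m) ^ y."""
    return (((x ^ y) & m) ^ y) & MASK32


def and_and_or(x, y, m):
    """The rewritten form: (x & m) | (y & ~m)."""
    return ((x & m) | (y & ~m)) & MASK32


def check_identity(trials=1000, seed=0):
    """Compare both forms on random 32-bit inputs for the IR's mask."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(32)
        y = rng.getrandbits(32)
        if xor_and_xor(x, y, FIELD_10_19) != and_and_or(x, y, FIELD_10_19):
            return False
    return True
```

Where a mask bit is set, the two xors with y cancel and x passes through; where it is clear, only y survives. The constant -1047553 is the bitwise inverse of 1047552 when read as an i32.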

If I understand it correctly, this transformation is implemented in `InstCombineAndOrXor::visitMaskedMerge`.
I don't know whether such a code sequence ever appears elsewhere, so I chose to implement it in ISel.
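
Judging only from the transformed IR above, the remaining xor, and, xor sequence xors with two different values (%12 first, then %18), so it is no longer a textbook masked merge. The Python sketch below models that sequence and suggests it still equals a nested insert as long as the two masks are disjoint. All names are hypothetical, and the mapping to the IR registers in the comments is an interpretation, not something stated in the review.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers

M1 = 1047552     # 0x000FFC00, bits 10..19 (inner insert)
M2 = 1072693248  # 0x3FF00000, bits 20..29 (outer insert)


def merge(value, base, mask):
    """and, and, or form: bits of value under mask, bits of base elsewhere."""
    return ((value & mask) | (base & ~mask)) & MASK32


def second_sequence(a, b, base, m1, m2):
    """Models %15..%24: the first xor uses base, the second uses merged."""
    merged = merge(b, base, m1)                   # assumed to match %18
    return (((a ^ base) & m2) ^ merged) & MASK32  # assumed to match %22..%24


def nested_insert(a, b, base, m1, m2):
    """The form one would like to see: two chained bitfield inserts."""
    return merge(a, merge(b, base, m1), m2)
```

For the disjoint masks `M1` and `M2` both functions agree, because `merged` and `base` are identical on every bit selected by `m2`. With overlapping masks such as 0xFF0 and 0x0FF they can differ, for example for `a = 0`, `b = 0xFFF`, `base = 0`.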

I agree with creating all those tests.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134418/new/

https://reviews.llvm.org/D134418