[PATCH] D109963: [AArch64] Split bitmask immediate of bitwise AND operation
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 20 01:12:52 PDT 2021
jaykang10 added a comment.
In D109963#3008656 <https://reviews.llvm.org/D109963#3008656>, @dmgreen wrote:
>> I have tried to implement it at the ISelDAG level. There are already patterns that fold the `and` node. After this transformation in ISelDAGToDAG, I was able to see those patterns fail to match. If we could guarantee that the pattern for this transformation is matched after the other `and`-related patterns, it could be OK to implement it at the ISelDAG level. Maybe we could add `AddedComplexity` to the pattern for this transformation, but I thought the CustomInserter was better because it guarantees that all pattern matching is done.
>
> Custom Inserters are mostly needed for instructions that expand to multiple basic blocks. As far as I understand, this seems to be a different selection for and(X, C), which should fit in fine as an AArch64 tblgen pattern or with Dag2Dag select. But the AArch64 backend can be a bit complex in places. Was there something getting in the way of that? If so, do you know what?
For example, there are patterns like the following:
multiclass SIMDAcrossLanesUnsignedIntrinsic<string baseOpc,
                                            SDPatternOperator opNode>
    : SIMDAcrossLanesIntrinsic<baseOpc, opNode> {
  // If there is a masking operation keeping only what has been actually
  // generated, consume it.
  def : Pat<(i32 (and (i32 (vector_extract (insert_subvector undef,
                (opNode (v8i8 V64:$Rn)), (i64 0)), (i64 0))), maski8_or_more)),
            (i32 (EXTRACT_SUBREG
              (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
                (!cast<Instruction>(!strconcat(baseOpc, "v8i8v")) V64:$Rn), bsub),
              ssub))>;
  def : Pat<(i32 (and (i32 (vector_extract (opNode (v16i8 V128:$Rn)), (i64 0))),
                maski8_or_more)),
            (i32 (EXTRACT_SUBREG
              (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
                (!cast<Instruction>(!strconcat(baseOpc, "v16i8v")) V128:$Rn), bsub),
              ssub))>;
  def : Pat<(i32 (and (i32 (vector_extract (insert_subvector undef,
                (opNode (v4i16 V64:$Rn)), (i64 0)), (i64 0))), maski16_or_more)),
            (i32 (EXTRACT_SUBREG
              (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
                (!cast<Instruction>(!strconcat(baseOpc, "v4i16v")) V64:$Rn), hsub),
              ssub))>;
  def : Pat<(i32 (and (i32 (vector_extract (opNode (v8i16 V128:$Rn)), (i64 0))),
                maski16_or_more)),
            (i32 (EXTRACT_SUBREG
              (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
                (!cast<Instruction>(!strconcat(baseOpc, "v8i16v")) V128:$Rn), hsub),
              ssub))>;
}
As you can see, these patterns roughly check for (and (...), maski[8|16]_or_more) and fold the `and` node. When I tried to split the bitmask immediate at the ISelDAGToDAG level, I saw cases in which the patterns above did not match because the `maski[8|16]_or_more` constraint failed.
As another example, in AArch64ISelDAGToDAG, `(or (and ...))` patterns are folded by `tryBitfieldInsertOp()`. After splitting the bitmask immediate, I saw cases in which `tryBitfieldInsertOp()` failed.
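For reference, the `(or (and X, MaskA), (and (shl Y, LSB), MaskB))` shape that `tryBitfieldInsertOp()` folds computes a bitfield insert; once the single AND has been split into two ANDs, the DAG no longer has that shape. A rough Python model of what the folded BFI computes (names and parameters here are illustrative, not LLVM's):

```python
def bfi(dst, src, lsb, width, regsize=64):
    # Model of what the selected BFI computes: insert the low `width`
    # bits of `src` into `dst` starting at bit `lsb`, preserving the
    # other bits of `dst`.
    full = (1 << regsize) - 1
    mask = ((1 << width) - 1) << lsb
    return ((dst & ~mask) | ((src << lsb) & mask)) & full
```

So a single BFI replaces the whole or/and/shl cluster; if the AND is no longer a single node with the expected mask, the match is lost and the cluster stays as separate instructions.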
I have not checked all of the regressions, but there were cases in which more instructions were generated after splitting the bitmask immediate at the ISelDAGToDAG level. In order to avoid that, I implemented the logic with a `CustomInserter`.
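To make the transformation itself concrete: an AArch64 logical immediate must be a rotated run of contiguous ones, replicated across the register, and the idea of this patch is to rewrite an AND with a non-encodable constant as two ANDs whose encodable masks intersect back to the original constant. A rough, simplified Python model of that idea (illustrative only; the in-tree code uses `AArch64_AM::isLogicalImmediate()` and its helpers):

```python
def is_logical_imm(imm, width=64):
    # Rough model of an AArch64 logical immediate: an element of size
    # 2/4/8/16/32/64 bits that is a rotated run of contiguous ones,
    # replicated across the register. All-zeros/all-ones are not encodable.
    full = (1 << width) - 1
    imm &= full
    if imm == 0 or imm == full:
        return False
    size = 2
    while size <= width:
        emask = (1 << size) - 1
        elem = imm & emask
        if all(((imm >> i) & emask) == elem for i in range(0, width, size)):
            for rot in range(size):
                r = ((elem >> rot) | (elem << (size - rot))) & emask
                if r and (r & (r + 1)) == 0:  # contiguous ones from bit 0
                    return True
        size *= 2
    return False

def split_and_imm(imm, width=64):
    # imm1: a solid run of ones covering every set bit of imm (a single
    # contiguous run, hence encodable). imm2: imm's bits plus everything
    # outside that run. Then imm1 & imm2 == imm, so one AND with a hard
    # constant can become two ANDs with encodable immediates.
    full = (1 << width) - 1
    if imm == 0 or is_logical_imm(imm, width):
        return None  # nothing to split / already a single ANDri
    low = (imm & -imm).bit_length() - 1  # lowest set bit
    high = imm.bit_length() - 1          # highest set bit
    imm1 = ((2 << high) - (1 << low)) & full
    imm2 = (imm | (~imm1 & full)) & full
    if is_logical_imm(imm1, width) and is_logical_imm(imm2, width):
        return imm1, imm2
    return None
```

For example, a constant with bits 10 and 21 set (`0x200400`) splits into the solid run `0x3ffc00` and a mask that is all ones outside bits 11-20; when the second mask is not encodable either, the sketch bails out and the constant is materialized as before.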
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109963/new/
https://reviews.llvm.org/D109963