[PATCH] D109963: [AArch64] Split bitmask immediate of bitwise AND operation
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 20 05:41:58 PDT 2021
jaykang10 added a comment.
In D109963#3009100 <https://reviews.llvm.org/D109963#3009100>, @dmgreen wrote:
>> For example, there are patterns like the ones below.
>>
>> multiclass SIMDAcrossLanesUnsignedIntrinsic<string baseOpc,
>>                                             SDPatternOperator opNode>
>>     : SIMDAcrossLanesIntrinsic<baseOpc, opNode> {
>> // If there is a masking operation keeping only what has been actually
>> // generated, consume it.
>>   def : Pat<(i32 (and (i32 (vector_extract (insert_subvector undef,
>>               (opNode (v8i8 V64:$Rn)), (i64 0)), (i64 0))), maski8_or_more)),
>>         (i32 (EXTRACT_SUBREG
>>           (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
>>             (!cast<Instruction>(!strconcat(baseOpc, "v8i8v")) V64:$Rn), bsub),
>>           ssub))>;
>>   def : Pat<(i32 (and (i32 (vector_extract (opNode (v16i8 V128:$Rn)), (i64 0))),
>>             maski8_or_more)),
>>         (i32 (EXTRACT_SUBREG
>>           (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
>>             (!cast<Instruction>(!strconcat(baseOpc, "v16i8v")) V128:$Rn), bsub),
>>           ssub))>;
>>   def : Pat<(i32 (and (i32 (vector_extract (insert_subvector undef,
>>               (opNode (v4i16 V64:$Rn)), (i64 0)), (i64 0))), maski16_or_more)),
>>         (i32 (EXTRACT_SUBREG
>>           (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
>>             (!cast<Instruction>(!strconcat(baseOpc, "v4i16v")) V64:$Rn), hsub),
>>           ssub))>;
>>   def : Pat<(i32 (and (i32 (vector_extract (opNode (v8i16 V128:$Rn)), (i64 0))),
>>             maski16_or_more)),
>>         (i32 (EXTRACT_SUBREG
>>           (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
>>             (!cast<Instruction>(!strconcat(baseOpc, "v8i16v")) V128:$Rn), hsub),
>>           ssub))>;
>> }
>>
>> As you can see, the patterns roughly match `(and (...), mask[8|16]_or_more)` and fold the `and` node. When I tried to split the bitmask immediate at the ISelDAGToDAG level, I saw cases in which the above patterns did not apply because the `mask[8|16]_or_more` constraint failed.
>>
>> As another case, in AArch64ISelDAGToDAG, `(or (and ...))` patterns are folded by `tryBitfieldInsertOp()`. After splitting the bitmask immediate, I saw cases in which `tryBitfieldInsertOp()` failed.
>>
>> I have not checked all of the regressions, but there were cases in which more instructions were generated after splitting the bitmask immediate at the ISelDAGToDAG level. In order to avoid that, I implemented the logic with a `CustomInserter`.
>
> OK I see. So when the code was added to AArch64DAGToDAGISel::Select, it ran before the tablegen patterns, and we do have tablegen patterns that conflict with it. I'm a little surprised to see "_or_more" here.
>
> Could it be done as a tblgen pattern then? That way the larger pattern should win. As far as I understand it would be a case of getting the correct ImmLeaf with a couple of xforms.
As I mentioned in a previous comment, the patterns for splitting the bitmask would need `AddedComplexity` to guarantee that the other patterns with `and` are matched first. I thought that would not be good for potential patterns with `and` added in the future.
From my personal point of view, the MIR level could be a suitable place to handle the bitmask immediate. At the moment, the AArch64 target generates a pseudo `MOV` MIR instruction for a large immediate and expands it in `AArch64ExpandPseudo`. It could be good to generate a pseudo `AND` or logical MIR instruction with the bitmask immediate and expand it like `MOV`. However, that would be a big change...
Let me try to add the patterns with `xform` and `AddedComplexity`.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:2281
+
+ uint64_t Mask = 0xFFFFULL;
+ for (unsigned i = 0; i < 4; i++) {
----------------
Wilco1 wrote:
> Is there no function to check it is a 16-bit immediate? If not, it would be best to add it.
We can check it with `isUInt<16>(Mask)`. Let me update it.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:2290
+ Mask = 0xFFFFULL;
+ uint64_t OrgNImm = ~OrgImm;
+ for (unsigned i = 0; i < 4; i++) {
----------------
Wilco1 wrote:
> This won't correctly handle 32-bit immediates like 0x1234FFFF since the invert sets the top 32 bits.
You are right! Let me update it.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:2292
+ for (unsigned i = 0; i < 4; i++) {
+ Mask <<= i * 32;
+ // This immediate can be suitable for single MOV instruction.
----------------
Wilco1 wrote:
> Should be * 8.
Sorry, that is my mistake. Maybe it should be 16 instead of 8?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:2308
+ // to the position of highest bit set.
+ unsigned NewImm1 = (2U << HighestBitSet) - (1U << LowestBitSet);
+ // Create a mask which is filled with one outside the position of lowest bit
----------------
Wilco1 wrote:
> These need to be 64 bits
You are right! Let me update it.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109963/new/
https://reviews.llvm.org/D109963
More information about the llvm-commits mailing list