[llvm] 96843d2 - [AArch64][GlobalISel] Change G_ANYEXT fed by scalar G_ICMP to G_ZEXT

Jessica Paquette via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 1 15:01:49 PDT 2021


Author: Jessica Paquette
Date: 2021-10-01T15:01:20-07:00
New Revision: 96843d220dd8cf10ef5e67b8bdb0205d6bb9d7f7

URL: https://github.com/llvm/llvm-project/commit/96843d220dd8cf10ef5e67b8bdb0205d6bb9d7f7
DIFF: https://github.com/llvm/llvm-project/commit/96843d220dd8cf10ef5e67b8bdb0205d6bb9d7f7.diff

LOG: [AArch64][GlobalISel] Change G_ANYEXT fed by scalar G_ICMP to G_ZEXT

This is a common pattern:

```
    %icmp:_(s32) = G_ICMP intpred(eq), ...
    %ext:_(s64) = G_ANYEXT %icmp(s32)
    %and:_(s64) = G_AND %ext, 1
```

Here's an example: https://godbolt.org/z/T13f6o8zE

This pattern appears because of the following combine in the
LegalizationArtifactCombiner:

```
// zext(trunc x) -> and (aext/copy/trunc x), mask
```

This combine kicks in when we widen the result of G_ICMP from 1 bit to 32 bits.
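
As a rough GMIR sketch (not taken from the commit; register names are
illustrative), this is how the pattern above comes about:

```
    ; Before legalization: the compare produces an s1 which is zero-extended.
    %icmp:_(s1) = G_ICMP intpred(eq), %a, %b
    %zext:_(s64) = G_ZEXT %icmp(s1)

    ; Widening the G_ICMP result to s32 inserts a G_TRUNC, and the
    ; zext(trunc x) artifact combine then rewrites the extension as an
    ; any-extend plus a mask:
    %icmp:_(s32) = G_ICMP intpred(eq), %a, %b
    %ext:_(s64) = G_ANYEXT %icmp(s32)
    %and:_(s64) = G_AND %ext, 1
```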

We know that, on AArch64, a scalar G_ICMP will produce 0 or 1. So the result
of `%ext` will always be 0 or 1 as well.

We have some KnownBits-based combines which eliminate redundant G_ANDs with masks.
These combines don't kick in with G_ANYEXT, since the high bits of a G_ANYEXT are
undefined.

So, if we replace the G_ANYEXT with a G_ZEXT in this situation, the KnownBits-based
combines can remove the redundant G_AND.
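
As a sketch of the intended end state (assuming the existing redundant-AND
combine fires once the extension is a G_ZEXT; register names are illustrative):

```
    ; With a G_ZEXT, KnownBits can prove that every bit of %ext above bit 0
    ; is zero, so the G_AND with 1 is redundant and its uses can simply be
    ; replaced with %ext.
    %icmp:_(s32) = G_ICMP intpred(eq), %a, %b
    %ext:_(s64) = G_ZEXT %icmp(s32)
```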

I wasn't sure if it would be more appropriate to

* Take this route
* Put this in the LegalizationArtifactCombiner
* Allow 64-bit G_ICMP destinations

I decided on this route because

1) It's simple

2) I'm not sure whether, philosophically speaking, we should be handling
non-artifact instructions and target-specific details like TargetBooleanContents
in the LegalizationArtifactCombiner

3) There is a lot of existing code which assumes we only have 32-bit G_ICMP
destinations, so adding support for 64-bit destinations seems rather invasive
right now. I think that adding 64-bit destination support, or modelling G_ICMP
as ADDS/SUBS/etc., is probably cleaner in the long term, though.

This gives minor code size savings on all CTMark benchmarks.

Differential Revision: https://reviews.llvm.org/D110959

Added: 
    llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-anyext-to-zext.mir

Modified: 
    llvm/lib/Target/AArch64/AArch64Combine.td
    llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/AArch64/AArch64Combine.td b/llvm/lib/Target/AArch64/AArch64Combine.td
index f702de60c0e4f..d2097f7e6ee34 100644
--- a/llvm/lib/Target/AArch64/AArch64Combine.td
+++ b/llvm/lib/Target/AArch64/AArch64Combine.td
@@ -189,6 +189,13 @@ def fold_merge_to_zext : GICombineRule<
   (apply [{ applyFoldMergeToZext(*${d}, MRI, B, Observer); }])
 >;
 
+def mutate_anyext_to_zext : GICombineRule<
+  (defs root:$d),
+  (match (wip_match_opcode G_ANYEXT):$d,
+          [{ return matchMutateAnyExtToZExt(*${d}, MRI); }]),
+  (apply [{ applyMutateAnyExtToZExt(*${d}, MRI, B, Observer); }])
+>;
+
 // Post-legalization combines which should happen at all optimization levels.
 // (E.g. ones that facilitate matching for the selector) For example, matching
 // pseudos.
@@ -204,7 +211,7 @@ def AArch64PostLegalizerLoweringHelper
 def AArch64PostLegalizerCombinerHelper
     : GICombinerHelper<"AArch64GenPostLegalizerCombinerHelper",
                        [copy_prop, erase_undef_store, combines_for_extload,
-                        sext_trunc_sextload,
+                        sext_trunc_sextload, mutate_anyext_to_zext,
                         hoist_logic_op_with_same_opcode_hands,
                         redundant_and, xor_of_and_with_same_reg,
                         extractvecelt_pairwise_add, redundant_or,

diff  --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
index ffc62be25f82f..a9b3792e01182 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
@@ -262,6 +262,33 @@ void applyFoldMergeToZext(MachineInstr &MI, MachineRegisterInfo &MRI,
   Observer.changedInstr(MI);
 }
 
+/// \returns True if a G_ANYEXT instruction \p MI should be mutated to a G_ZEXT
+/// instruction.
+static bool matchMutateAnyExtToZExt(MachineInstr &MI, MachineRegisterInfo &MRI) {
+  // If this is coming from a scalar compare then we can use a G_ZEXT instead of
+  // a G_ANYEXT:
+  //
+  // %cmp:_(s32) = G_[I|F]CMP ... <-- produces 0/1.
+  // %ext:_(s64) = G_ANYEXT %cmp(s32)
+  //
+  // By doing this, we can leverage more KnownBits combines.
+  assert(MI.getOpcode() == TargetOpcode::G_ANYEXT);
+  Register Dst = MI.getOperand(0).getReg();
+  Register Src = MI.getOperand(1).getReg();
+  return MRI.getType(Dst).isScalar() &&
+         mi_match(Src, MRI,
+                  m_any_of(m_GICmp(m_Pred(), m_Reg(), m_Reg()),
+                           m_GFCmp(m_Pred(), m_Reg(), m_Reg())));
+}
+
+static void applyMutateAnyExtToZExt(MachineInstr &MI, MachineRegisterInfo &MRI,
+                              MachineIRBuilder &B,
+                              GISelChangeObserver &Observer) {
+  Observer.changingInstr(MI);
+  MI.setDesc(B.getTII().get(TargetOpcode::G_ZEXT));
+  Observer.changedInstr(MI);
+}
+
 #define AARCH64POSTLEGALIZERCOMBINERHELPER_GENCOMBINERHELPER_DEPS
 #include "AArch64GenPostLegalizeGICombiner.inc"
 #undef AARCH64POSTLEGALIZERCOMBINERHELPER_GENCOMBINERHELPER_DEPS

diff  --git a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-anyext-to-zext.mir b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-anyext-to-zext.mir
new file mode 100644
index 0000000000000..1b3b3408cb20e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-anyext-to-zext.mir
@@ -0,0 +1,84 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple aarch64 -run-pass=aarch64-postlegalizer-combiner --aarch64postlegalizercombinerhelper-only-enable-rule="mutate_anyext_to_zext" -verify-machineinstrs %s -o - | FileCheck %s
+# REQUIRES: asserts
+
+...
+---
+name:            scalar_icmp
+legalized:       true
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0
+    ; Scalars have 0 or 1, so we want a ZExt.
+
+    ; CHECK-LABEL: name: scalar_icmp
+    ; CHECK: liveins: $x0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s64) = COPY $x0
+    ; CHECK-NEXT: %cst_1:_(s64) = G_CONSTANT i64 1
+    ; CHECK-NEXT: %icmp:_(s32) = G_ICMP intpred(eq), %copy(s64), %cst_1
+    ; CHECK-NEXT: %ext:_(s64) = G_ZEXT %icmp(s32)
+    ; CHECK-NEXT: $x0 = COPY %ext(s64)
+    ; CHECK-NEXT: RET_ReallyLR implicit $x0
+    %copy:_(s64) = COPY $x0
+    %cst_1:_(s64) = G_CONSTANT i64 1
+    %icmp:_(s32) = G_ICMP intpred(eq), %copy(s64), %cst_1
+    %ext:_(s64) = G_ANYEXT %icmp(s32)
+    $x0 = COPY %ext(s64)
+    RET_ReallyLR implicit $x0
+
+
+...
+---
+name:            vector_icmp
+legalized:       true
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $d0
+    ; Vectors have 0 or negative 1, so we don't produce a zext.
+
+    ; CHECK-LABEL: name: vector_icmp
+    ; CHECK: liveins: $x0, $d0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(<2 x s32>) = COPY $d0
+    ; CHECK-NEXT: %cst_1:_(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: %vec_cst_1:_(<2 x s32>) = G_BUILD_VECTOR %cst_1(s32), %cst_1(s32)
+    ; CHECK-NEXT: %icmp:_(<2 x s32>) = G_ICMP intpred(eq), %copy(<2 x s32>), %vec_cst_1
+    ; CHECK-NEXT: %ext:_(<2 x s64>) = G_ANYEXT %icmp(<2 x s32>)
+    ; CHECK-NEXT: $q0 = COPY %ext(<2 x s64>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %copy:_(<2 x s32>) = COPY $d0
+    %cst_1:_(s32) = G_CONSTANT i32 1
+    %vec_cst_1:_(<2 x s32>) = G_BUILD_VECTOR %cst_1, %cst_1
+    %icmp:_(<2 x s32>) = G_ICMP intpred(eq), %copy(<2 x s32>), %vec_cst_1
+    %ext:_(<2 x s64>) = G_ANYEXT %icmp(<2 x s32>)
+    $q0 = COPY %ext(<2 x s64>)
+    RET_ReallyLR implicit $q0
+
+...
+---
+name:            scalar_fcmp
+legalized:       true
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $d0
+    ; Scalars have 0 or 1, so we want a ZExt.
+
+    ; CHECK-LABEL: name: scalar_fcmp
+    ; CHECK: liveins: $x0, $d0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s64) = COPY $d0
+    ; CHECK-NEXT: %cst_1:_(s64) = G_FCONSTANT double 1.000000e+00
+    ; CHECK-NEXT: %fcmp:_(s32) = G_FCMP intpred(eq), %copy(s64), %cst_1
+    ; CHECK-NEXT: %ext:_(s64) = G_ZEXT %fcmp(s32)
+    ; CHECK-NEXT: $x0 = COPY %ext(s64)
+    ; CHECK-NEXT: RET_ReallyLR implicit $x0
+    %copy:_(s64) = COPY $d0
+    %cst_1:_(s64) = G_FCONSTANT double 1.0
+    %fcmp:_(s32) = G_FCMP intpred(eq), %copy(s64), %cst_1
+    %ext:_(s64) = G_ANYEXT %fcmp(s32)
+    $x0 = COPY %ext(s64)
+    RET_ReallyLR implicit $x0


        

