[llvm] [InstCombine] Turn AShr into LShr more often in SimplifyDemandedUseBits (PR #99155)

Björn Pettersson via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 16 13:04:01 PDT 2024


https://github.com/bjope created https://github.com/llvm/llvm-project/pull/99155

The functional change here is to undo "llvm-svn: 311773", aka D36936,
aka commit 22178dd33b3460207b8. That patch avoided converting AShr
into LShr in SimplifyDemandedUseBits based on known-sign-bits
analysis. Even when it would have been legal to turn the shift into a
logical shift (because the shifted-in bits were not demanded), that
patch prevented the conversion into LShr whenever any of the original
sign bits were demanded.

One side effect of the reverted functionality was that the better we
were at computing the number of sign bits, the less likely it was that
we would replace AShr by LShr during SimplifyDemandedUseBits. This
was seen in https://github.com/llvm/llvm-project/pull/97693/ when
an improvement of ComputeNumSignBits resulted in regressions due
to no longer rewriting AShr to LShr.

The test case from D36936 still passes after this commit. So it seems
that the compiler has at least been taught how to optimize that
scenario even if we do the AShr->LShr transform more aggressively.
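
A minimal IR sketch of the transform in question (not taken from the patch; the
function name and mask are made up for illustration): the mask only keeps bits
that come from %x itself, so none of the sign bits shifted in by the ashr are
demanded.

  define i32 @ashr_only_low_bits_demanded(i32 %x) {
    ; The mask below only demands bits 0..15 of %shr, i.e. bits 8..23 of %x,
    ; so the sign bits shifted in by the ashr never influence the result.
    %shr = ashr i32 %x, 8
    %res = and i32 %shr, 65535
    ret i32 %res
  }

In a case like this SimplifyDemandedUseBits can replace the ashr with
"lshr i32 %x, 8" without changing the value of %res.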

From 71880b32a3eade17b320eeef42dbb68cfaa30f72 Mon Sep 17 00:00:00 2001
From: Bjorn Pettersson <bjorn.a.pettersson at ericsson.com>
Date: Mon, 15 Jul 2024 00:09:49 +0200
Subject: [PATCH 1/2] [InstCombine] Add test cases related to demanded use
 bits for ashr

When trying to improve value tracking in
   https://github.com/llvm/llvm-project/pull/97693
some regressions were found due to a "weirdness" in SimplifyDemandedUseBits
for ashr. Normally an ashr is replaced by lshr when the shifted-in bits
aren't demanded. Some years ago (see commit 22178dd33b346020) there was a
test case motivating keeping the ashr when any sign bit (besides the
shifted-in bits) was demanded. The weird part about it is that the better
we get at analysing known sign bits, the less likely it is that we
canonicalize from ashr to lshr. That makes it hard to tune other combines
to rely on the canonicalization, and it can also result in unexpected
regressions when improving value tracking.

This patch adds a test case for which it would be better to canonicalize
ashr into lshr when possible. It is also worth mentioning that reverting
22178dd33b346020 doesn't seem to cause regressions in any other lit tests
(not even the one added in 22178dd33b346020).
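
A hypothetical IR sketch of that "weirdness" (this is not one of the tests
added below, the function name is made up, and other combines may fire first
on this exact input; it is only meant to illustrate the old guard's
arithmetic): the sext gives the ashr operand 17 known sign bits, so the old
code, which refused the rewrite whenever any of the top
min(SignBits + ShiftAmt - 1, BitWidth) = 48 bits were demanded, kept the
ashr, even though the trunc only demands the low 32 bits of the shift result
and thus none of the shifted-in bits.

  define i32 @ashr_blocked_by_known_sign_bits(i48 %x) {
    %ext = sext i48 %x to i64    ; 17 known sign bits on the shift operand
    %shr = ashr i64 %ext, 32
    ; Only the low 32 bits of %shr are demanded, so an lshr would produce
    ; the same demanded bits as the ashr.
    %res = trunc i64 %shr to i32
    ret i32 %res
  }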
---
 .../Transforms/InstCombine/ashr-demand.ll     | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/llvm/test/Transforms/InstCombine/ashr-demand.ll b/llvm/test/Transforms/InstCombine/ashr-demand.ll
index 2b5ccf0626dd7..be7496d38f272 100644
--- a/llvm/test/Transforms/InstCombine/ashr-demand.ll
+++ b/llvm/test/Transforms/InstCombine/ashr-demand.ll
@@ -52,3 +52,39 @@ define <2 x i32> @srem2_ashr_mask_vector_nonconstant(<2 x i32> %a0, <2 x i32> %a
   %mask = and <2 x i32> %ashr, <i32 2, i32 2>
   ret <2 x i32> %mask
 }
+
+
+; If it does not matter if we do ashr or lshr, then we canonicalize to lshr.
+
+define i16 @ashr_can_be_lshr(i32 %a) {
+; CHECK-LABEL: @ashr_can_be_lshr(
+; CHECK-NEXT:    [[ASHR:%.*]] = lshr exact i32 [[A:%.*]], 16
+; CHECK-NEXT:    [[TRUNC:%.*]] = trunc nuw i32 [[ASHR]] to i16
+; CHECK-NEXT:    ret i16 [[TRUNC]]
+;
+  %ashr = ashr exact i32 %a, 16
+  %trunc = trunc nsw i32 %ashr to i16
+  ret i16 %trunc
+}
+
+; Historically SimplifyDemandedUseBits skipped replacing ashr with lshr here
+; due to known sign bits analysis indicating that %ashr had more than 33 sign
+; bits. It does however seem weird not to always canonicalize to lshr when
+; possible, and in this case rewriting into lshr would trigger further
+; optimizations.
+define i32 @ashr_can_be_lshr_2(i32 %a) {
+; CHECK-LABEL: @ashr_can_be_lshr_2(
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[A:%.*]], 1056964608
+; CHECK-NEXT:    [[OR:%.*]] = zext i32 [[TMP1]] to i64
+; CHECK-NEXT:    [[SHL:%.*]] = shl i64 [[OR]], 34
+; CHECK-NEXT:    [[ASHR:%.*]] = ashr exact i64 [[SHL]], 32
+; CHECK-NEXT:    [[TRUNC:%.*]] = trunc nsw i64 [[ASHR]] to i32
+; CHECK-NEXT:    ret i32 [[TRUNC]]
+;
+  %ext = zext i32 %a to i64
+  %or = or i64 %ext, 4278190080
+  %shl = shl i64 %or, 34
+  %ashr = ashr exact i64 %shl, 32
+  %trunc = trunc nsw i64 %ashr to i32
+  ret i32 %trunc
+}

From 80523514784029ee89aaa8bd17ff74564768c45e Mon Sep 17 00:00:00 2001
From: Bjorn Pettersson <bjorn.a.pettersson at ericsson.com>
Date: Mon, 8 Jul 2024 01:35:53 +0200
Subject: [PATCH 2/2] [InstCombine] Turn AShr into LShr more often in
 SimplifyDemandedUseBits

The functional change here is to undo "llvm-svn: 311773", aka D36936,
aka commit 22178dd33b3460207b8. That patch avoided converting AShr
into LShr in SimplifyDemandedUseBits based on known-sign-bits
analysis. Even when it would have been legal to turn the shift into a
logical shift (because the shifted-in bits were not demanded), that
patch prevented the conversion into LShr whenever any of the original
sign bits were demanded.

One side effect of the reverted functionality was that the better we
were at computing the number of sign bits, the less likely it was that
we would replace AShr by LShr during SimplifyDemandedUseBits. This
was seen in https://github.com/llvm/llvm-project/pull/97693/ when
an improvement of ComputeNumSignBits resulted in regressions due
to no longer rewriting AShr to LShr.

The test case from D36936 still passes after this commit. So it seems
that the compiler has at least been taught how to optimize that
scenario even if we do the AShr->LShr transform more aggressively.
---
 .../InstCombineSimplifyDemanded.cpp           | 26 ++++++++-----------
 .../Transforms/InstCombine/ashr-demand.ll     |  9 +++----
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 081e783c964fd..8a6ec3076ac62 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -806,34 +806,30 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Instruction *I,
 
       // Signed shift right.
       APInt DemandedMaskIn(DemandedMask.shl(ShiftAmt));
-      // If any of the high bits are demanded, we should set the sign bit as
-      // demanded.
-      if (DemandedMask.countl_zero() <= ShiftAmt)
+      // If any of the bits being shifted in are demanded, then we should set
+      // the sign bit as demanded.
+      bool ShiftedInBitsDemanded = DemandedMask.countl_zero() < ShiftAmt;
+      if (ShiftedInBitsDemanded)
         DemandedMaskIn.setSignBit();
-
       if (SimplifyDemandedBits(I, 0, DemandedMaskIn, Known, Depth + 1, Q)) {
         // exact flag may not longer hold.
         I->dropPoisonGeneratingFlags();
         return I;
       }
 
-      Known = KnownBits::ashr(
-          Known, KnownBits::makeConstant(APInt(BitWidth, ShiftAmt)),
-          ShiftAmt != 0, I->isExact());
-
-      // If the input sign bit is known to be zero, or if none of the top bits
-      // are demanded, turn this into an unsigned shift right.
-      assert(BitWidth > ShiftAmt && "Shift amount not saturated?");
-      APInt HighBits(APInt::getHighBitsSet(
-          BitWidth, std::min(SignBits + ShiftAmt - 1, BitWidth)));
-      if (Known.Zero[BitWidth-ShiftAmt-1] ||
-          !DemandedMask.intersects(HighBits)) {
+      // If the input sign bit is known to be zero, or if none of the shifted in
+      // bits are demanded, turn this into an unsigned shift right.
+      if (Known.Zero[BitWidth - 1] || !ShiftedInBitsDemanded) {
         BinaryOperator *LShr = BinaryOperator::CreateLShr(I->getOperand(0),
                                                           I->getOperand(1));
         LShr->setIsExact(cast<BinaryOperator>(I)->isExact());
         LShr->takeName(I);
         return InsertNewInstWith(LShr, I->getIterator());
       }
+
+      Known = KnownBits::ashr(
+          Known, KnownBits::makeConstant(APInt(BitWidth, ShiftAmt)),
+          ShiftAmt != 0, I->isExact());
     } else {
       llvm::computeKnownBits(I, Known, Depth, Q);
     }
diff --git a/llvm/test/Transforms/InstCombine/ashr-demand.ll b/llvm/test/Transforms/InstCombine/ashr-demand.ll
index be7496d38f272..a0e2af93b809b 100644
--- a/llvm/test/Transforms/InstCombine/ashr-demand.ll
+++ b/llvm/test/Transforms/InstCombine/ashr-demand.ll
@@ -74,12 +74,9 @@ define i16 @ashr_can_be_lshr(i32 %a) {
 ; optimizations.
 define i32 @ashr_can_be_lshr_2(i32 %a) {
 ; CHECK-LABEL: @ashr_can_be_lshr_2(
-; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[A:%.*]], 1056964608
-; CHECK-NEXT:    [[OR:%.*]] = zext i32 [[TMP1]] to i64
-; CHECK-NEXT:    [[SHL:%.*]] = shl i64 [[OR]], 34
-; CHECK-NEXT:    [[ASHR:%.*]] = ashr exact i64 [[SHL]], 32
-; CHECK-NEXT:    [[TRUNC:%.*]] = trunc nsw i64 [[ASHR]] to i32
-; CHECK-NEXT:    ret i32 [[TRUNC]]
+; CHECK-NEXT:    [[TMP1:%.*]] = shl i32 [[A:%.*]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = or i32 [[TMP1]], -67108864
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %ext = zext i32 %a to i64
   %or = or i64 %ext, 4278190080


