[llvm] [Codegen][LegalizeIntegerTypes] Improve shift through stack (PR #96151)

Fri Jun 21 02:36:18 PDT 2024

================
@@ -4530,14 +4530,25 @@ void DAGTypeLegalizer::ExpandIntRes_ShiftThroughStack(SDNode *N, SDValue &Lo,
   SDValue ShAmt = N->getOperand(1);
   EVT ShAmtVT = ShAmt.getValueType();
 
-  // This legalization is optimal when the shift is by a multiple of byte width,
-  //   %x * 8 <-> %x << 3   so 3 low bits should be be known zero.
-  bool ShiftByByteMultiple =
-      DAG.computeKnownBits(ShAmt).countMinTrailingZeros() >= 3;
+  EVT LoadStoreVT = VT;
+  do {
+      LoadStoreVT = TLI.getTypeToTransformTo(*DAG.getContext(), LoadStoreVT);
+  }while (!TLI.isTypeLegal(LoadStoreVT));
+
+  const Align LoadStoreAlign = [&]() -> Align {
+      if (TLI.allowsMisalignedMemoryAccesses(LoadStoreVT))
+          return Align(1);
----------------
futog wrote:

I did not want to change the behavior for targets that support unaligned memory access. In retrospect, this was not a good idea. I also found this comment by efriedma on the original commit (https://reviews.llvm.org/D140638):

> It might be worth implementing a strategy that avoids unaligned loads (by splitting the shift amount by the native register width instead of CHAR_BIT). On targets that don't have native unaligned loads, they're pretty expensive. Even on targets that do have unaligned loads, an aligned load can reduce the cost of the store forwarding stall. (But on targets with fast unaligned loads, they're probably worth using if the shift amount is known to be a multiple of CHAR_BIT.)

So I will implement this instead.
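
To make the suggested strategy concrete, here is a minimal standalone sketch of the idea, not the actual legalizer code. It assumes a 128-bit value held as four little-endian 32-bit words and a 32-bit native register width; the names `Wide` and `lshrThroughStackAligned` are hypothetical. The shift amount is split by the register width, so every load from the stack slot is word-aligned, and the residual sub-word shift is done in registers:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration only: a 128-bit value as four little-endian
// 32-bit words (word 0 is the least significant).
using Wide = std::array<uint32_t, 4>;

// Logical right shift "through the stack" using only word-aligned loads.
Wide lshrThroughStackAligned(const Wide &Val, unsigned ShAmt) {
  assert(ShAmt < 128 && "shift amount out of range");

  // Stack slot twice the value size: value in the low half, zero padding
  // in the high half (the bits shifted in by a logical right shift).
  uint32_t Slot[8] = {};
  std::memcpy(Slot, Val.data(), sizeof(uint32_t) * 4);

  // Split the shift amount by the register width instead of CHAR_BIT.
  unsigned WordOff = ShAmt / 32; // aligned word offset into the slot
  unsigned BitOff = ShAmt % 32;  // residual shift performed in registers

  Wide Res;
  for (unsigned I = 0; I < 4; ++I) {
    uint32_t LoPart = Slot[WordOff + I];     // aligned load
    uint32_t HiPart = Slot[WordOff + I + 1]; // aligned load, index <= 7
    // Funnel shift of two adjacent words; guard BitOff == 0 to avoid an
    // undefined shift by 32.
    Res[I] = BitOff == 0 ? LoPart
                         : (LoPart >> BitOff) | (HiPart << (32 - BitOff));
  }
  return Res;
}
```

For example, shifting the value 0x0123456789ABCDEF right by 36 uses a word offset of 1 and a residual shift of 4. The trade-off against the byte-granular variant is that the residual shift can be up to 31 bits instead of up to 7, but no load ever crosses a word boundary.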

https://github.com/llvm/llvm-project/pull/96151