[PATCH] D136015: [InstCombine] Fold series of instructions into mull

Wed Oct 19 07:40:48 PDT 2022

spatel added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineInternal.h:550
+  /// Tries to simplify a few sequence operations into MULL
+  Value *SimplifyMull(BinaryOperator &I);
+
----------------
There's no need to make a class function for this transform. Just create a static function above InstCombinerImpl::visitAdd(). 

Use the raw BinaryOperator::CreateMul() to return an Instruction, so we don't need to pass the Builder or use replaceInstUsesWith().

================
Comment at: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:856
+Value *InstCombinerImpl::SimplifyMull(BinaryOperator &I) {
+  if (!I.getType()->isIntegerTy())
+    return nullptr;
----------------
The type check is insufficient in at least 2 ways and over-restrictive in other. 

So we need at least 3 more tests like this:

```
define i9 @mul9_low(i9 %in0, i9 %in1) {
  %In0Lo = and i9 %in0, 15
  %In0Hi = lshr i9 %in0, 4
  %In1Lo = and i9 %in1, 15
  %In1Hi = lshr i9 %in1, 4
  %m10 = mul i9 %In1Hi, %In0Lo
  %m01 = mul i9 %In1Lo, %In0Hi
  %m00 = mul i9 %In1Lo, %In0Lo
  %addc = add i9 %m10, %m01
  %shl = shl i9 %addc, 4
  %addc9 = add i9 %shl, %m00
  ret i9 %addc9
}

define <2 x i8> @mul_v2i8_low(<2 x i8> %in0, <2 x i8> %in1) {
  %In0Lo = and <2 x i8> %in0, <i8 15, i8 15>
  %In0Hi = lshr <2 x i8> %in0, <i8 4, i8 4>
  %In1Lo = and <2 x i8> %in1, <i8 15, i8 15>
  %In1Hi = lshr <2 x i8> %in1, <i8 4, i8 4>
  %m10 = mul <2 x i8> %In1Hi, %In0Lo
  %m01 = mul <2 x i8> %In1Lo, %In0Hi
  %m00 = mul <2 x i8> %In1Lo, %In0Lo
  %addc = add <2 x i8> %m10, %m01
  %shl = shl <2 x i8> %addc, <i8 4, i8 4>
  %addc9 = add <2 x i8> %shl, %m00
  ret <2 x i8> %addc9
}

define i128 @mul128_low(i128 %in0, i128 %in1) {
  %In0Lo = and i128 %in0, 18446744073709551615
  %In0Hi = lshr i128 %in0, 64
  %In1Lo = and i128 %in1, 18446744073709551615
  %In1Hi = lshr i128 %in1, 64
  %m10 = mul i128 %In1Hi, %In0Lo
  %m01 = mul i128 %In1Lo, %In0Hi
  %m00 = mul i128 %In1Lo, %In0Lo
  %addc = add i128 %m10, %m01
  %shl = shl i128 %addc, 64
  %addc9 = add i128 %shl, %m00
  ret i128 %addc9
}
```

================
Comment at: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:866
+  // ResLo = m00 + (addc >> 32);
+  bool IsMulLow =
+      match(&I, m_c_Add(m_Value(M00),
----------------
The structure of these matches is confusing. I'd prefer to organize it more like this:

```
  // R = (CrossSum << HalfBits) + (XLo * YLo)
  Value *XLo, *YLo;
  Value *CrossSum;
  if (!match(&I, m_c_Add(m_Shl(m_Value(CrossSum), m_SpecificInt(HalfBits)),
                         m_Mul(m_Value(XLo), m_Value(YLo)))))
    return nullptr;

  // XLo = X & HalfMask
  // YLo = Y & HalfMask
  Value *X, *Y;
  if (!match(XLo, m_And(m_Value(X), m_SpecificInt(HalfMask))) ||
      !match(YLo, m_And(m_Value(Y), m_SpecificInt(HalfMask))))
    return nullptr;

  // CrossSum = (X' * (Y >> Halfbits)) + (Y' * (X >> HalfBits))
...

```
IIUC, X' can be either X or XLo in the pattern (and the same for Y'). You can probably use `m_CombineOr(m_Specific(), m_Specific())` to match that with minimal code.

================
Comment at: llvm/test/Transforms/InstCombine/mul.ll:1578
+
+define i64 @mul64_low(i64 noundef %in0, i64 noundef %in1) {
+; CHECK-LABEL: @mul64_low(
----------------
The tests are incomplete for commutative patterns. As I said earlier, I think we need at least 16 tests to verify that the matching is working as expected. 

Once we have the right tests in place, please pre-commit the baseline tests (CHECK lines without the code change), so we will only show diffs in this patch.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136015/new/

https://reviews.llvm.org/D136015