[PATCH] D87384: [PowerPC] Add ISEL patterns for Mul with Imm.

Thu Sep 17 18:08:41 PDT 2020

nemanjai added a comment.

In D87384#2279886 <https://reviews.llvm.org/D87384#2279886>, @jsji wrote:

> Why this can NOT be done in DAGCombiner by implementing `decomposeMulByConstant` target hook?

Do you have evidence of profitability of `(mul (shl %a, N) M)` being preferable over `(mul %a, C)` on other targets? If so, I suppose extending the DAG combine that handles:

  mul x, (2^N + 1) --> add (shl x, N), x
  mul x, (2^N - 1) --> sub (shl x, N), x

Perhaps it wold be good to gauge interest from other RISC targets.

Scenarios 2 and 3 are quite similar to the existing DAG combine so this would be a straightforward implementation. However I am not sure how likely it is that two shifts and an add/sub are better than a multiply on other targets.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4988
+    int64_t ImmSh = Imm >> Shift;
+    if (isInt<16>(ImmSh)) {
+      uint64_t SextImm = SignExtend64(ImmSh & 0xFFFF, 16);
----------------
jsji wrote:
> Can we add comments about all these scenarios?  With simple examples.
I agree 100%. We should illustrate this with some example bit patterns.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4998
+    if (isPowerOf2_64(ImmSh - 1)) {
+      // The multiplicand has exactly two bits set so we don't need a multiply.
+      unsigned Shift2 = countTrailingZeros<uint64_t>(ImmSh - 1);
----------------
jsji wrote:
> `exactly two bits set `? What do you mean?
2^N + 2^M has exactly two bits set.

However, I don't think this is profitable as implemented. This implementation does not benefit from an ILP improvement since the instructions are dependent. What we probably want is to convert
`(mul %a, (2^N + 2^M))` to `(add (shl %a, N), (shl %a, M))` since that improves ILP so the total latency of the sequence is actually reduced. That would also mean that for something like this:
```
  *res1 = a * 0x8800;
  *res2 = a * 0x8080;
```
we only need three total shifts (we should also add that as a test case).
Similar argument applies to the subtract case below.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87384/new/

https://reviews.llvm.org/D87384