[PATCH] D87384: [PowerPC] Add ISEL patterns for Mul with Imm.
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 17 18:08:41 PDT 2020
nemanjai added a comment.
In D87384#2279886 <https://reviews.llvm.org/D87384#2279886>, @jsji wrote:
> Why this can NOT be done in DAGCombiner by implementing `decomposeMulByConstant` target hook?
Do you have evidence of profitability of `(mul (shl %a, N) M)` being preferable over `(mul %a, C)` on other targets? If so, I suppose extending the DAG combine that handles:
mul x, (2^N + 1) --> add (shl x, N), x
mul x, (2^N - 1) --> sub (shl x, N), x
Perhaps it wold be good to gauge interest from other RISC targets.
Scenarios 2 and 3 are quite similar to the existing DAG combine so this would be a straightforward implementation. However I am not sure how likely it is that two shifts and an add/sub are better than a multiply on other targets.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4988
+ int64_t ImmSh = Imm >> Shift;
+ if (isInt<16>(ImmSh)) {
+ uint64_t SextImm = SignExtend64(ImmSh & 0xFFFF, 16);
----------------
jsji wrote:
> Can we add comments about all these scenarios? With simple examples.
I agree 100%. We should illustrate this with some example bit patterns.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4998
+ if (isPowerOf2_64(ImmSh - 1)) {
+ // The multiplicand has exactly two bits set so we don't need a multiply.
+ unsigned Shift2 = countTrailingZeros<uint64_t>(ImmSh - 1);
----------------
jsji wrote:
> `exactly two bits set `? What do you mean?
2^N + 2^M has exactly two bits set.
However, I don't think this is profitable as implemented. This implementation does not benefit from an ILP improvement since the instructions are dependent. What we probably want is to convert
`(mul %a, (2^N + 2^M))` to `(add (shl %a, N), (shl %a, M))` since that improves ILP so the total latency of the sequence is actually reduced. That would also mean that for something like this:
```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```
we only need three total shifts (we should also add that as a test case).
Similar argument applies to the subtract case below.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D87384/new/
https://reviews.llvm.org/D87384
More information about the llvm-commits
mailing list