[PATCH] D50222: [CodeGen] [SelectionDAG] More efficient code for X % C == 0
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 16 02:13:55 PDT 2018
lebedev.ri added a comment.
A few passing-by thoughts.
================
Comment at: include/llvm/CodeGen/TargetLowering.h:3502
+ SDValue BuildREMEqFold(SDNode *N, SDNode *CondNode,
+ SelectionDAG &DAG, std::vector<SDNode *> &Created) const;
+
----------------
This should be `SmallVectorImpl<SDNode *> &Created` like in all the other cases.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3372-3392
+ // fold (rem lhs, rhs) -> lhs when |lhs| < |rhs|
+ // FIXME: this optimization is otherwise missed if BuildREMEqFold succeeds.
+ // Ref: test1 in test/CodeGen/X86/jump_sign.ll
+ if (N1C) {
+ KnownBits LHSBits;
+ DAG.computeKnownBits(N0, LHSBits);
+ LHSBits.makeNonNegative();
----------------
I think this is independent from the rest?
Can you split it into a separate review?
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3394-3397
+ if (N->hasOneUse() && N->use_begin()->getOpcode() == ISD::SETCC) {
+ if (SDValue Folded = BuildREMEqFold(N, *N->use_begin()))
+ return Folded;
+ }
----------------
This feels slightly backwards.
Can you do this in `DAGCombiner::visitSETCC()`, and thus avoid all this use-dance?
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18133-18135
+ // UREM/SREM will be shorter than any other path we take here
+ if (DAG.getMachineFunction().getFunction().optForMinSize())
+ return SDValue();
----------------
Would be good to have a test (in a new file) for that.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18135
+ if (DAG.getMachineFunction().getFunction().optForMinSize())
+ return SDValue();
+
----------------
Please format your changes with clang-format (you can automate that via git pre-commit hook)
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18142
+
+ std::vector<SDNode *> Built;
+ SDValue NewCond = TLI.BuildREMEqFold(REMNode, CondNode, DAG, Built);
----------------
```
SmallVector<SDNode *, 8> Built;
```
like in all the other functions.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18143-18145
+ SDValue NewCond = TLI.BuildREMEqFold(REMNode, CondNode, DAG, Built);
+
+ if (NewCond) {
----------------
Early return please.
```
SDValue NewCond = TLI.BuildREMEqFold(REMNode, CondNode, DAG, Built);
if (!NewCond)
return SDValue();
```
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18146-18151
+ SDNode *Last = Built.back();
+ for (SDNode *N : Built)
+ AddToWorklist(N);
+ AddToWorklist(NewCond.getNode());
+ DAG.ReplaceAllUsesWith(SDValue(CondNode, 0), NewCond);
+ return SDValue(Last, 0);
----------------
Hm. Why is this different from what `DAGCombiner::BuildUDIV()` does?
I expected to see just the:
```
for (SDNode *N : Built)
AddToWorklist(N);
return NewCond;
```
================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3606
+ SelectionDAG &DAG,
+ std::vector<SDNode *> &Created) const {
+ // fold (seteq/ne (urem N, D), 0) -> (setule/gte (rotr (mul N, P), K), Q)
----------------
This should be `SmallVectorImpl<SDNode *> &Created` like in all the other cases.
================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3631
+ return SDValue();
+ APInt D = Divisor->getAPIntValue();
+ unsigned W = D.getBitWidth();
----------------
Separate with newline.
================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3633-3634
+ unsigned W = D.getBitWidth();
+ // The algorithm implemented below assumes D > 1
+ // We use the fact that N % D == 0 <=> N % |D| == 0 to guarantee this
+ if (IsSigned && D.isNegative())
----------------
Add self-explanatory assertion?
================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3645
+ // Calculate the multiplicative inverse of D0 using Newton's method.
+ APInt tmp, P = D0;
+ while ((tmp = D0*P) != 1)
----------------
This is begging to be wrong. Can you create two variables on different lines?
================
Comment at: test/CodeGen/Hexagon/swp-const-tc2.ll:36
+ ret i32 %v11
}
----------------
I think the change to this test can land right away separately from the rest of the changes.
================
Comment at: test/CodeGen/X86/rem.ll:81
+; This tests the BuildREMEqFold optimization with UREM, i32, odd divisor, SETEQ.
+; The corresponding pseudocode is:
----------------
Why not instead put all this into a new `test/CodeGen/X86/rem-seteq.ll` file?
I'm not sure whether it is worth it splitting the `urem` and `srem` tests into different files though.
Repository:
rL LLVM
https://reviews.llvm.org/D50222
More information about the llvm-commits
mailing list