[PATCH] D50222: [CodeGen] [SelectionDAG] More efficient code for X % C == 0

Thu Aug 16 02:13:55 PDT 2018

lebedev.ri added a comment.

A few passing-by thoughts.

================
Comment at: include/llvm/CodeGen/TargetLowering.h:3502
+  SDValue BuildREMEqFold(SDNode *N, SDNode *CondNode,
+                         SelectionDAG &DAG, std::vector<SDNode *> &Created) const;
+
----------------
This should be `SmallVectorImpl<SDNode *> &Created` like in all the other cases.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3372-3392
+  // fold (rem lhs, rhs) -> lhs when |lhs| < |rhs|
+  // FIXME: this optimization is otherwise missed if BuildREMEqFold succeeds.
+  // Ref: test1 in test/CodeGen/X86/jump_sign.ll
+  if (N1C) {
+    KnownBits LHSBits;
+    DAG.computeKnownBits(N0, LHSBits);
+    LHSBits.makeNonNegative();
----------------
I think this is independent from the rest?
Can you split it into a separate review?

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3394-3397
+  if (N->hasOneUse() && N->use_begin()->getOpcode() == ISD::SETCC) {
+    if (SDValue Folded = BuildREMEqFold(N, *N->use_begin()))
+      return Folded;
+  }
----------------
This feels slightly backwards.
Can you do this in `DAGCombiner::visitSETCC()`, and thus avoid all this use-dance?

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18133-18135
+  // UREM/SREM will be shorter than any other path we take here
+  if (DAG.getMachineFunction().getFunction().optForMinSize())
+      return SDValue();
----------------
Would be good to have a test (in a new file) for that.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18135
+  if (DAG.getMachineFunction().getFunction().optForMinSize())
+      return SDValue();
+
----------------
Please format your changes with clang-format (you can automate that via git pre-commit hook)

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18142
+
+  std::vector<SDNode *> Built;
+  SDValue NewCond = TLI.BuildREMEqFold(REMNode, CondNode, DAG, Built);
----------------
```
SmallVector<SDNode *, 8> Built;
```
like in all the other functions.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18143-18145
+  SDValue NewCond = TLI.BuildREMEqFold(REMNode, CondNode, DAG, Built);
+
+  if (NewCond) {
----------------
Early return please.
```
SDValue NewCond = TLI.BuildREMEqFold(REMNode, CondNode, DAG, Built);

if (!NewCond)
  return SDValue();
```

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18146-18151
+    SDNode *Last = Built.back();
+    for (SDNode *N : Built)
+      AddToWorklist(N);
+    AddToWorklist(NewCond.getNode());
+    DAG.ReplaceAllUsesWith(SDValue(CondNode, 0), NewCond);
+    return SDValue(Last, 0);
----------------
Hm. Why is this different from what `DAGCombiner::BuildUDIV()` does?
I expected to see just the:
```
for (SDNode *N : Built)
  AddToWorklist(N);
return NewCond;
```

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3606
+                                       SelectionDAG &DAG,
+                                       std::vector<SDNode *> &Created) const {
+  // fold (seteq/ne (urem N, D), 0) -> (setule/gte (rotr (mul N, P), K), Q)
----------------
This should be `SmallVectorImpl<SDNode *> &Created` like in all the other cases.

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3631
+    return SDValue();
+  APInt D = Divisor->getAPIntValue();
+  unsigned W = D.getBitWidth();
----------------
Separate with newline.

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3633-3634
+  unsigned W = D.getBitWidth();
+  // The algorithm implemented below assumes D > 1
+  // We use the fact that N % D == 0 <=> N % |D| == 0 to guarantee this
+  if (IsSigned && D.isNegative())
----------------
Add self-explanatory assertion?

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3645
+  // Calculate the multiplicative inverse of D0 using Newton's method.
+  APInt tmp, P = D0;
+  while ((tmp = D0*P) != 1)
----------------
This is begging to be wrong. Can you create two variables on different lines?

================
Comment at: test/CodeGen/Hexagon/swp-const-tc2.ll:36
+  ret i32 %v11
 }
----------------
I think the change to this test can land right away separately from the rest of the changes.

================
Comment at: test/CodeGen/X86/rem.ll:81

+; This tests the BuildREMEqFold optimization with UREM, i32, odd divisor, SETEQ.
+; The corresponding pseudocode is:
----------------
Why not instead put all this into a new `test/CodeGen/X86/rem-seteq.ll` file?
I'm not sure whether it is worth it splitting the `urem` and `srem` tests into different files though.

Repository:
  rL LLVM

https://reviews.llvm.org/D50222