[PATCH] D50222: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case)

Mon Sep 10 21:49:52 PDT 2018

hermord marked 5 inline comments as done.
hermord added inline comments.

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3782
+
+  // TODO: Add non-uniform constant support
+  ConstantSDNode *Divisor = isConstOrConstSplat(REMNode->getOperand(1));
----------------
RKSimon wrote:
> It should be trivial for you to add non-uniform vector support in this patch, following the patterns I did for the other divsion-by-constant builders. 
I've got conflicting directions for this patch (add nonsplat here or submit it separately), so I didn't touch that part in this update.

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3799
+  APInt D0 = D.lshr(K);
+
+  // P = inv(D0, 2^W)
----------------
lebedev.ri wrote:
> Ok, thank you for the explanation regarding `D0 != 1`.
> I think here now we can just add an assert, to document it.
> ```
> APInt D0 = D.lshr(K);
> 
> assert(!D0.isOneValue() && "The fold is invalid for D0 of 1. Unreachable since we leave powers-of-two to be improved elsewhere.");
> ```
This turns out to actually be reachable: the logic that handles REM by powers of two is in `visitREM`, which is invoked after `visitSetCC` from where we would've come here. I kept this as an early return if that's OK.

================
Comment at: test/CodeGen/X86/jump_sign.ll:405
+; CHECK-NEXT:    andl %ecx, %eax
+; CHECK-NEXT:    movb $-85, %dl
+; CHECK-NEXT:    # kill: def $al killed $al killed $eax
----------------
This is the `test1` regression I mentioned in previous updates. I've made it explicit now, for the lack of a clean fix that I can see (without `computeKnownBits`).

================
Comment at: test/CodeGen/X86/urem-seteq.ll:4
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s --check-prefixes=CHECK,X64,NOBMI2
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi2 < %s | FileCheck %s --check-prefixes=CHECK,X64,BMI2
+
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > Actually, looking at the check-lines, i'm not sure we want to check the bmi2 version.
> > I'm not seeing anything there that would benefit from anything above baseline, i think.
> I've meant to drop this check-line, but apparently forgot.
> I'm not 100% sure if we want/don't want it.
Should I drop it? On a related note, I could add `AVX2` to the vector tests on `X86` if that's likely to be useful.

Repository:
  rL LLVM

https://reviews.llvm.org/D50222