[PATCH] D54095: [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`

Wed Nov 14 14:12:00 PST 2018

craig.topper added inline comments.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2874
+    if (ShiftAmt.getValueType() != MVT::i8) {
+      ShiftAmt = CurDAG->getNode(ISD::TRUNCATE, DL, MVT::i8, ShiftAmt);
+      insertDAGNode(*CurDAG, OrigShiftAmt, ShiftAmt);
----------------
lebedev.ri wrote:
> craig.topper wrote:
> > When does this ever happen? Type and op legalization should have canonicalized all shifts to use an i8 amount.
> Not all. See `if (NBits.getValueType() != NVT) {` check above.
> It catches (used to catch?) some cases in the pattern c/d.
> 
> I don't have a testcase for *this* branch, i'm simply being cautious based on that branch.
Since you know for sure the Shift Amount came from an SRL node, I think you can just get by with an assert.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2879
+    // Now, *zero*-extend the shift amount. The bits 8...15 *must* be zero!
+    ShiftAmt = CurDAG->getNode(ISD::ZERO_EXTEND, DL, MVT::i16, ShiftAmt);
+    insertDAGNode(*CurDAG, OrigShiftAmt, ShiftAmt);
----------------
lebedev.ri wrote:
> craig.topper wrote:
> > Can we just directly zero extend to NVT and avoid the insert_subreg?
> I answered that in https://reviews.llvm.org/D54095#1298483 :
> > On these tests it has the same effect as directly zero-extending to the proper register size, so not sure maybe i'm again over-engineering this?
> 
> I can't test right now but i'm basically thinking about `i16 %shamt`,
> which would (should) be truncated to `i8`, as you have said, but it was `i16`,
> so `ZERO_EXT` to i16 should be omittable? (and we do not care about other bits)
So my concern is that zero_extend to i16 is 1 byte longer than zero_extend to 32/64. It also has a partial register dependency to preserve bits 31:16. But we know that so the isel pattern for zero_extend to i16 emits a zero extend to i32 and an extract_subreg to avoid that. But your code also emits an insert_subreg. So a later pass has to merge and remove the extract_subreg+insert_subreg. So it feels like we should just emit the right thing here.

If the shift amount was i16 we are way too late in the proccess for anything to realize the upper bits are 0. You would have to have an entirely different path for that.

Repository:
  rL LLVM

https://reviews.llvm.org/D54095