[PATCH] D48768: [X86] When have BMI2, prefer shifts to clear low/high bits, rather than variable mask.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jun 30 10:33:08 PDT 2018


craig.topper added inline comments.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:4797
+  // There are only 32-bit and 64-bit forms for SHLX/SHRX.
+  return Subtarget.hasBMI2() && (VT == MVT::i32 || VT == MVT::i64);
+}
----------------
You may not want to do this for i64 on 32-bit targets. Would that improve the results in clear_highbits64_c0 for example?


================
Comment at: lib/Target/X86/X86InstrInfo.td:2566
+  // x << (64 - y) >> (64 - y)
+  def : Pat<(srl (shl GR64:$src, (i8 (trunc (sub 64, GR64:$lz)))),
+                 (i8 (trunc (sub 64, GR64:$lz)))),
----------------
The fact that you needed this pattern is actually a bug in the code that turns load+and into the bzhi pattern. Normally the trunc would have narrowed the subtract to 32 bits. But the trunc, shift, and subtract were created after legalize ops, and the shift amount was created as i64 when it should have been created directly as an i8. The re-legalize we run as part of the last DAG combine caught it and fixed it, but by then it was too late to narrow the subtract.

I suppose this pattern might be useful if the subtract had other users besides the truncate, which would prevent the normal combine from kicking in. But by that logic we would need to replicate every pattern that looks for truncated subtracts, and it wouldn't be limited to just the BZHI64 patterns: you could have 32-bit data but a shift amount that starts out as 64 bits.


Repository:
  rL LLVM

https://reviews.llvm.org/D48768

