[PATCH] D141653: [X86] Improve instruction ordering of constant `srl/shl` with `and` to get better and-masks
Phoebe Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jan 14 02:05:15 PST 2023
pengfei added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47438
-static SDValue combineShiftLeft(SDNode *N, SelectionDAG &DAG) {
+// Try and re-ordering an `and` and `srl/shl` if it result in a better constant
+// for the `and`. Note this only tries to optimize by re-ordering, other
----------------
results
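For context, the reordering the new comment describes relies on the identity `(X & C1) >> C2 == (X >> C2) & (C1 >> C2)` for logical shifts, so the combine can pick whichever operand order yields the cheaper-to-encode `and` mask. A minimal standalone sketch of that identity, using plain integers and hypothetical names rather than the actual DAG code:

    #include <cassert>
    #include <cstdint>

    // For logical shifts, masking before or after the shift is equivalent
    // as long as the mask is shifted the same way, so the combine is free
    // to pick whichever order gives the cheaper `and` constant.
    static uint64_t maskThenShift(uint64_t X, uint64_t C1, unsigned C2) {
      return (X & C1) >> C2;
    }
    static uint64_t shiftThenMask(uint64_t X, uint64_t C1, unsigned C2) {
      return (X >> C2) & (C1 >> C2);
    }

    int main() {
      for (uint64_t X : {0x0123456789abcdefULL, ~0ULL, 0ULL})
        assert(maskThenShift(X, 0xff00, 8) == shiftThenMask(X, 0xff00, 8));
    }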
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47438
-static SDValue combineShiftLeft(SDNode *N, SelectionDAG &DAG) {
+// Try and re-ordering an `and` and `srl/shl` if it result in a better constant
+// for the `and`. Note this only tries to optimize by re-ordering, other
----------------
pengfei wrote:
> results
One more space. You may run clang-format to help solve the format nits.
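For example, assuming the change is the most recent local commit, `git clang-format HEAD~1` reformats just the lines the patch touches.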
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47533
+ } else
+ NewAndMask = AndMask.lshr(ShiftCnt);
+
----------------
Use curly braces for consistency.
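That is, since the `if` side of this chain uses braces, the `else` should be braced too. A sketch of the expected form of the quoted hunk:

    } else {
      NewAndMask = AndMask.lshr(ShiftCnt);
    }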
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:47679
- // Try to improve a sequence of srl (and X, C1), C2 by inverting the order.
- // TODO: This is a generic DAG combine that became an x86-only combine to
- // avoid shortcomings in other folds such as bswap, bit-test ('bt'), and
- // and-not ('andn').
- if (N0.getOpcode() != ISD::AND || !N0.hasOneUse())
- return SDValue();
-
- auto *ShiftC = dyn_cast<ConstantSDNode>(N1);
- auto *AndC = dyn_cast<ConstantSDNode>(N0.getOperand(1));
- if (!ShiftC || !AndC)
- return SDValue();
-
- // If we can shrink the constant mask below 8-bits or 32-bits, then this
- // transform should reduce code size. It may also enable secondary transforms
- // from improved known-bits analysis or instruction selection.
- APInt MaskVal = AndC->getAPIntValue();
-
- // If this can be matched by a zero extend, don't optimize.
- if (MaskVal.isMask()) {
- unsigned TO = MaskVal.countTrailingOnes();
- if (TO >= 8 && isPowerOf2_32(TO))
- return SDValue();
- }
+ if(SDValue V = combineLogicalShiftWithAnd(N, DAG, DCI))
+ return V;
----------------
Add one space after `if`.
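As background on the hunk deleted above: the old guard skipped masks that instruction selection already gets for free as zero extensions (movzbl/movzwl, or the implicit zeroing of a 32-bit operation). A simplified sketch of that bail-out, with a hypothetical name and plain integers in place of the actual APInt code:

    #include <cstdint>

    // MaskVal.isMask() with countTrailingOnes() of 8/16/32 maps to
    // movzbl/movzwl/mov, and 64 is the all-ones mask that folds away
    // entirely, so reordering buys nothing in those cases.
    static bool matchableByZeroExtend(uint64_t Mask) {
      return Mask == 0xff || Mask == 0xffff || Mask == 0xffffffff ||
             Mask == ~0ULL;
    }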
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:49129
+ if(SDValue V = combineLogicalShiftWithAnd(N, DAG, DCI))
+ return V;
----------------
ditto.
================
Comment at: llvm/test/CodeGen/X86/bitreverse.ll:538
-; X64-NEXT: andb $8, %al
-; X64-NEXT: leal (%rdi,%rdi), %ecx
-; X64-NEXT: andb $4, %cl
----------------
craig.topper wrote:
> pengfei wrote:
> > IIRC, LEA is expensive, so this looks like a good deal.
> I thought LEA was only expensive if it uses 3 sources.
You are right. The SOM (software optimization manual) says only the 3-operand LEA is slow, but I found llvm-mca cannot reflect the difference.
So this is yet another regression.
old: https://godbolt.org/z/dPW93v8zT
new: https://godbolt.org/z/hrEovKW4f
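For reference, a minimal C example of the two LEA shapes under discussion; the codegen noted in the comments is what clang typically emits, not taken from this patch:

    // `a + b` fits a 2-source LEA (base + index), which is fast;
    // `a + b + 1` needs a 3-source LEA (base + index + displacement),
    // the form the optimization manuals flag as slow.
    int twoSource(int a, int b) { return a + b; }       // e.g. leal (%rdi,%rsi), %eax
    int threeSource(int a, int b) { return a + b + 1; } // e.g. leal 1(%rdi,%rsi), %eax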
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141653/new/
https://reviews.llvm.org/D141653