[PATCH] D138904: [AArch64] Transform shift+and to shift+shift to select more shifted register
Mingming Liu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 29 17:37:23 PST 2022
mingmingl added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:632
+
+ unsigned ShiftAmtC = ShiftAmtNode->getZExtValue();
+ SDValue RHS = N.getOperand(1);
----------------
nit: use `uint64_t` (the same type as the return value of `getZExtValue()`) to avoid implicit-truncation warnings.
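Something like the following one-line sketch (`ShiftAmtNode` comes from the patch context above):

```
// ConstantSDNode::getZExtValue() returns uint64_t; matching the type
// avoids an implicit-truncation warning.
uint64_t ShiftAmtC = ShiftAmtNode->getZExtValue();
```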
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:647-671
+ if (LHSOpcode == ISD::SHL) {
+ if (LowZBits <= ShiftAmtC || (BitWidth != LowZBits + MaskLen))
+ return false;
+
+ NewShiftC = LowZBits - ShiftAmtC;
+ NewShiftOp = VT == MVT::i64 ? AArch64::UBFMXri : AArch64::UBFMWri;
+ } else {
----------------
nitpick:
To ensure that UBFM/SBFM is used as a right shift (essentially [[ https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/UBFX--Unsigned-Bitfield-Extract--an-alias-of-UBFM-?lang=en | UBFX ]]) here, add `assert(BitWidth - 1 >= NewShiftC && "error message")`.
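A minimal sketch of the suggested assertion (reusing `BitWidth` and `NewShiftC` from the patch; the message text is only a placeholder):

```
// UBFM/SBFM shift immediates must stay within [0, BitWidth - 1] for the
// alias to remain a right shift (UBFX/SBFX).
assert(BitWidth - 1 >= NewShiftC && "expected a valid right-shift amount");
```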
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:654-655
+ } else {
+ if (LowZBits == 0)
+ return false;
+
----------------
Relatedly, with 1) N as `and (srl x, c), mask` and 2) LowZBits == 0, `N` should be selected as UBFX by [[ https://github.com/llvm/llvm-project/blob/6a91a5e82647f206a31706586d201ecc638e9365/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp#L2174 | isBitfieldExtractOp ]] //before// the tablegen patterns (where `AArch64DAGToDAGISel::SelectShiftedRegister` gets called) apply. Similarly, with `and (shl x, c), mask` and LowZBits == 0, `N` is a UBFIZ.
So this case shouldn't be seen in practice, but an early exit (rather than an assertion) looks okay.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:657-658
+
+ if (LHSOpcode == ISD::SRA && (BitWidth != LowZBits + MaskLen))
+ return false;
+
----------------
For correctness, a similar check is needed when LHSOpcode is ISD::SRL; for `srl`, though, the condition `BitWidth == LowZBits + MaskLen + ShiftAmtC` should be sufficient.
For example, https://gcc.godbolt.org/z/YvGG3Pov3 shows the IR at trunk; with the current patch, the optimized code below changes the result (i.e., it is incorrect):
```
lsr x8, x0, #3
add x0, x1, x8, lsl #1
ret
```
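A sketch of the suggested guard, reusing the names from the patch context (its placement next to the existing ISD::SRA check is an assumption):

```
// For srl, require LowZBits + MaskLen + ShiftAmtC to span the full width;
// otherwise set bits beyond the mask survive the rewritten shift pair.
if (LHSOpcode == ISD::SRL && BitWidth != LowZBits + MaskLen + ShiftAmtC)
  return false;
```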
================
Comment at: llvm/test/CodeGen/AArch64/shiftregister-from-and.ll:137-139
+; negative test: type is not i32 or i64
+
+define i16 @shiftedreg_from_and_negative_type(i16 %a, i16 %b) {
----------------
nitpick:
i16 will be legalized to i32 [1], and the patch does optimize `shiftedreg_from_and_negative_type` (trunk: https://gcc.godbolt.org/z/cnac5989e), so this doesn't seem to be a negative test. Consider using a vector type instead, as in https://gcc.godbolt.org/z/T4szTsEav.
[1] `llc -O3 -mtriple=aarch64 -debug -print-after-all test.ll` prints debug logs; https://gist.github.com/minglotus/763173f099bec76c720ec9ce7c181471 shows the legalization step.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138904/new/
https://reviews.llvm.org/D138904