[PATCH] D145583: [AArch64][SME] Fix an infinite loop in DAGCombine related to adding -force-streaming-compatible-sve flag.

Fri Mar 10 03:58:50 PST 2023

sdesmalen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24500-24502
+  if (Subtarget->forceStreamingCompatibleSVE() &&
+      (Opc == ISD::ZERO_EXTEND || Opc == ISD::SIGN_EXTEND))
+    return false;
----------------
The only reason this currently works for NEON is because `DAGCombiner::SimplifyVCastOp` for no good reason singles out `ISD::SPLAT_VECTOR`, which is only used for scalable vectors and SVE.
When I add `|| N0.getOpcode() == ISD::BUILD_VECTOR` to the condition in `SimplifyVCastOp`, various AArch64 tests fail for the same reason as the test you've added here.

I think the better solution would be to pass the whole `Node*` to preferScalarizeSplat, and then to look at its uses. If the opcode is a zero/sign extend and any of the uses is a mul, then we don't want to do this transformation, because that's when a umull/smull instruction can be used.

I looked into whether it was possible to make `LoweMUL` smarter and have it recognise the `dup(ext(x))` pattern so that we don't need to do the combine for `dup(ext(x)) -> ext(dup(x)` in the first place, but this seems rather non-trivial.

================
Comment at: llvm/test/CodeGen/AArch64/aarch64-force-streaming-compatible-sve.ll:4
+
+define void @jpeg_add_quant_table(i32 %0, <8 x i64> %1, ptr %2) {
+; CHECK-LABEL: jpeg_add_quant_table:
----------------
Just a general comment (and not a suggestion for this patch): It would be nice if we could use SVE's `[us]mull[bt]` instructions for this.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145583/new/

https://reviews.llvm.org/D145583