[PATCH] D110237: [AArch64][SVE] Add DAG combines to improve SVE fixed type FP_EXTEND lowering

Wed Sep 29 08:53:44 PDT 2021

bsmith added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15248
+
+  if (!VT.isFixedLengthVector())
+    return SDValue();
----------------
sdesmalen wrote:
> is there a reason this is specific to fixed-width vectors? It seems the same problem exists for:
> 
>   %op1 = load <vscale x 8 x half>, <vscale x 8 x half>* %a
>   %res = fpext <vscale x 8 x half> %op1 to <vscale x 8 x float>
>   store <vscale x 8 x float> %res, <vscale x 8 x float>* %b
> 
For the scalable case, the generated extract_subvector doesn't require a trip through memory; instead it is just an unpack lo/hi pair, which is much more reasonable.

(In fact, this optimization makes things worse for scalable vectors: we end up splitting the load and then combining the halves back together with a UZP1, so the extract_subvectors are still present, now alongside a split load + recombine.)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15269-15277
+    // Check if there are other uses. If so, do not combine as it will introduce
+    // an extra load.
+    for (SDNode::use_iterator UI = LD->use_begin(), UE = LD->use_end();
+         UI != UE; ++UI) {
+      if (UI.getUse().getResNo() == 1) // Ignore uses of the chain result.
+        continue;
+      if (*UI != N)
----------------
sdesmalen wrote:
> nit: is this loop maybe better expressed with llvm::any_of ?
I'm not sure we can do that in this case. `any_of` would automatically dereference the `use_iterator`, which calls `getUser()`, but we also need to be able to call `getUse()` on the iterator itself.
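
To illustrate the point outside of LLVM, here is a minimal self-contained sketch. `Node`, `Use`, `UseIterator`, `hasOtherValueUses` and `hasOtherUsers` are hypothetical stand-ins written for this example, not the real `SDNode`/`SDUse` classes; the only property they are meant to copy is that dereferencing the iterator yields the user, while the result number is reachable only through `getUse()` on the iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-ins; these are NOT the real LLVM classes.
struct Node { int Id; };

struct Use {
  Node *User;     // node that consumes the value
  unsigned ResNo; // which result is consumed (1 is the chain in this sketch)
};

// Assumed shape: operator* returns the user, getUse() returns the full record.
class UseIterator {
  const Use *Cur;

public:
  using iterator_category = std::input_iterator_tag;
  using value_type = Node *;
  using difference_type = std::ptrdiff_t;
  using pointer = Node *const *;
  using reference = Node *;

  explicit UseIterator(const Use *U) : Cur(U) {}
  Node *operator*() const { return Cur->User; }
  const Use &getUse() const { return *Cur; }
  UseIterator &operator++() { ++Cur; return *this; }
  UseIterator operator++(int) { UseIterator T = *this; ++Cur; return T; }
  bool operator==(const UseIterator &O) const { return Cur == O.Cur; }
  bool operator!=(const UseIterator &O) const { return Cur != O.Cur; }
};

// Explicit loop: the iterator is in scope, so chain uses can be skipped.
bool hasOtherValueUses(const std::vector<Use> &Uses, const Node *N) {
  assert(N && "expected a non-null user to compare against");
  UseIterator UI(Uses.data()), UE(Uses.data() + Uses.size());
  for (; UI != UE; ++UI) {
    if (UI.getUse().ResNo == 1) // Ignore uses of the chain result.
      continue;
    if (*UI != N)
      return true;
  }
  return false;
}

// any_of: the predicate only receives the dereferenced user, so it has no
// way to tell a chain use apart from a value use.
bool hasOtherUsers(const std::vector<Use> &Uses, const Node *N) {
  UseIterator UB(Uses.data()), UE(Uses.data() + Uses.size());
  return std::any_of(UB, UE, [N](Node *User) { return User != N; });
}
```

With one value use by `N` and one chain use by a different node, `hasOtherValueUses` returns false while `hasOtherUsers` returns true, which is the distinction the explicit loop is there to preserve.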

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110237/new/

https://reviews.llvm.org/D110237