[PATCH] D140069: [DAGCombiner] Scalarize vectorized loads that are splatted

Tue Dec 20 08:42:14 PST 2022

dmgreen added a comment.

Thanks. The remaining Arm and AArch64 test cases look OK to me.

================
Comment at: llvm/test/CodeGen/AArch64/arm64-vmul.ll:1227-1230
+; CHECK-NEXT:    add x8, x1, #4
+; CHECK-NEXT:    ldr d1, [x0]
+; CHECK-NEXT:    ld1r.2s { v0 }, [x8]
+; CHECK-NEXT:    fmulx.2s v0, v1, v0
----------------
luke wrote:
> @dmgreen This looks like a regression to me but I'm not familiar enough with aarch64 to really know for certain. I presume the cost of the additional add instruction outweighs any gains from a smaller load, is that correct?
> 
>  (Hope I'm not bombarding you with too many questions, let me know if there's someone else I can ask!) 
Yeah, it's the same as the other cases. I've heard that this can be a little worse than when the dup can be folded into the mul/fmulx/etc., but that's a separate issue from this patch. In practice many cases will already be a splat of a scalar value, so they will already run into the same issue.
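
For readers less familiar with the pattern, here is a rough C model of the two forms being compared. It is only a sketch inferred from the CHECK lines above: the function names are hypothetical, plain `*` stands in for fmulx, and lane 1 is assumed because the scalar load address is the base pointer plus 4 bytes.

```c
#include <assert.h>
#include <string.h>

/* Before: load the whole 2-lane vector from b, then splat lane 1 (the dup). */
static void mul_splat_vector_load(const float *a, const float *b, float out[2]) {
    float vb[2];
    memcpy(vb, b, sizeof vb);      /* models a full <2 x float> load of b */
    float splat[2] = {vb[1], vb[1]}; /* models splatting lane 1 */
    out[0] = a[0] * splat[0];
    out[1] = a[1] * splat[1];
}

/* After: load only the splatted element, at the cost of computing b + 4 bytes. */
static void mul_splat_scalar_load(const float *a, const float *b, float out[2]) {
    float s;
    memcpy(&s, (const char *)b + 4, sizeof s); /* models add #4 followed by ld1r */
    out[0] = a[0] * s;
    out[1] = a[1] * s;
}
```

Both functions produce the same result. The difference is that the second reads 4 bytes instead of 8 but needs a separate address computation, whereas the first could in principle feed a by-element multiply without an explicit dup.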

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140069/new/

https://reviews.llvm.org/D140069