[PATCH] D124612: [AArch64][LV] AArch64 does not prefer vectorized addressing
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 2 10:54:15 PDT 2022
dmgreen added a comment.
Yes - do you have benchmarking results for this patch? This option makes sense, but I'm not sure that what it does is always optimal. Something is going on with how it alters interleaving group costs, which doesn't look like it should be related to vector addresses. One such case was cleaned up (or maybe hidden) by D124786 <https://reviews.llvm.org/D124786>, but more problems might be present.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll:63
+; SVE-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[DATA:%.*]], <vscale x 2 x i64> [[TMP8]]
+; SVE-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64.nxv2p0f64(<vscale x 2 x double*> [[TMP9]], i32 8, <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i32 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x double> undef)
+; SVE-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[STRUCT_STU:%.*]], %struct.stu* [[PARAM:%.*]], i64 0, i32 0, i64 [[TMP4]]
----------------
sdesmalen wrote:
> For a scalable VF there will be no difference in practice, because it won't try to scalarise the addresses.
> If you want to test the difference between SVE and NEON, you'll need to force the VF using `-force-vector-width=2` for both RUN lines.
It might be OK in this case, but in general just _having_ the SVE architecture feature ideally shouldn't make fixed-length NEON vectorization worse. I guess that for something that needs a gather, we would always expect it to use VLA vectorization, and so to have the gather instruction available? In that case it sounds reasonable here to base it on the arch feature.
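For illustration, the forced-VF comparison suggested in the quoted comment might be set up with RUN lines roughly like the ones below. This is only a sketch and not the RUN lines of the actual test: the triple, the pass invocation and the `NEON` check prefix are assumptions, and only `-force-vector-width=2` and the `SVE` prefix come from this thread.

```llvm
; Hypothetical RUN lines: both force VF=2 and differ only in the target feature,
; so any difference in the checked output comes from having SVE available.
; RUN: opt < %s -passes=loop-vectorize -force-vector-width=2 -mtriple=aarch64-none-linux-gnu -mattr=+neon -S | FileCheck %s --check-prefix=NEON
; RUN: opt < %s -passes=loop-vectorize -force-vector-width=2 -mtriple=aarch64-none-linux-gnu -mattr=+sve -S | FileCheck %s --check-prefix=SVE
```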
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/interleaved-vs-scalar.ll:13
; CHECK: vector.body
-; CHECK: load <4 x i8>
+; CHECK: load i8
; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
----------------
It is hard to see why this is now correct. The vector body looks pretty empty?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124612/new/
https://reviews.llvm.org/D124612