[PATCH] D122436: Teach the AArch64 backend that vector reduction NEON instructions implicitly zero the high lanes of the result, meaning that we can eliminate explicit zeroing.
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 4 05:03:12 PDT 2022
dmgreen edited reviewers, added: dmgreen, efriedma; removed: greened.
dmgreen added a comment.
My worry with this is that the top lanes are not always defined to be zero by the DAG nodes. There is a comment in the header that says:
// Vector across-lanes addition
// Only the lower result lane is defined.
They can also be selected in a number of ways; patterns like ADDPv2i64p are defined to produce a scalar result, which is then inserted into an undef vector.
Maybe that's OK, but we are relying on shaky semantics. Whilst it is true that the ADDV/ADDP instructions clear the top bits (as do many other instructions that set s/d regs), it's not clear to me where that is ensured through the pipeline.
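To illustrate the distinction at the IR level, here is a minimal sketch. The function names are hypothetical and this is not taken from the patch. It only contrasts a reduction result placed in an undef vector with one placed in a zero vector.

```llvm
declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64>)

; Upper lane is undef: the backend may leave anything there, so
; "only the lower result lane is defined" is enough.
define <2 x i64> @reduce_into_undef(<2 x i64> %v) {
  %r = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %v)
  %res = insertelement <2 x i64> undef, i64 %r, i64 0
  ret <2 x i64> %res
}

; Upper lane must be zero: dropping the explicit zeroing is only
; valid if whatever produces %r is guaranteed to clear the high lanes.
define <2 x i64> @reduce_into_zero(<2 x i64> %v) {
  %r = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %v)
  %res = insertelement <2 x i64> zeroinitializer, i64 %r, i64 0
  ret <2 x i64> %res
}
```

The second function is the case the patch targets. The concern above is whether every path that selects the reduction preserves the zeroing that the second form requires.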
================
Comment at: llvm/test/CodeGen/AArch64/vecreduce-zeroing.ll:6
+
+define dso_local noundef <4 x i32> @umaxv(<4 x i32> noundef %0) local_unnamed_addr #0 {
+; CHECK-LABEL: umaxv:
----------------
We can usually remove dso_local and local_unnamed_addr #0 to clean up the tests a bit.
================
Comment at: llvm/test/CodeGen/AArch64/vecreduce-zeroing.ll:62
+
+attributes #0 = { mustprogress nofree nosync nounwind readnone willreturn uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="128" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+v8.2a" }
+attributes #1 = { nofree nosync nounwind readnone willreturn }
----------------
Can these be removed?
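A trimmed version of the first test might look roughly like the sketch below. The function body is an assumption, since only the signature is quoted above. The point is the reduced signature and the absence of attribute groups.

```llvm
; Hypothetical trimmed form of the umaxv test. The body is assumed;
; only the signature appears in the quoted diff.
define noundef <4 x i32> @umaxv(<4 x i32> noundef %0) {
  %2 = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %0)
  %3 = insertelement <4 x i32> zeroinitializer, i32 %2, i64 0
  ret <4 x i32> %3
}

declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32>)
```

If the test does depend on a particular CPU or feature set, that is usually expressed with -mcpu or -mattr on the llc RUN line rather than through attribute groups.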
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D122436/new/
https://reviews.llvm.org/D122436