[PATCH] D122436: Teach the AArch64 backend that vector reduction NEON instructions implicitly zero the high lanes of the result, meaning that we can eliminate explicit zeroing.

Mon Apr 4 05:03:12 PDT 2022

dmgreen edited reviewers, added: dmgreen, efriedma; removed: greened.
dmgreen added a comment.

My worry with this is that the top lanes are not always defined to be zero by the DAG nodes. There is a comment in the header that says:

  // Vector across-lanes addition
  // Only the lower result lane is defined.

And they can be selected in a number of ways. Things like ADDPv2i64p are defined to produce a scalar result which is inserted into an undef vector.

Maybe that's OK, but we are relying on shaky semantics. Whilst it is true that the ADDV/ADDP instructions clear the top bits (as do many other instructions that set s/d regs), it's not clear to me where that is ensured through the pipeline.
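
As a rough IR-level illustration of the distinction (a sketch only: the function names are hypothetical, and the use of llvm.vector.reduce.add here is an assumption, not something taken from the patch):

  declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

  ; Upper lanes are undef, so any value may end up in lanes 1-3.
  ; This matches "only the lower result lane is defined".
  define <4 x i32> @addv_undef_upper(<4 x i32> %v) {
    %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
    %res = insertelement <4 x i32> undef, i32 %r, i64 0
    ret <4 x i32> %res
  }

  ; Upper lanes must be zero. Dropping the explicit zeroing is only
  ; sound if whatever is selected for the reduction is guaranteed to
  ; clear lanes 1-3, not merely allowed to.
  define <4 x i32> @addv_zero_upper(<4 x i32> %v) {
    %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
    %res = insertelement <4 x i32> zeroinitializer, i32 %r, i64 0
    ret <4 x i32> %res
  }

The second function is the case where the optimisation would need the zeroing to be a guarantee of the node, rather than a property of the instruction that happens to be chosen.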

================
Comment at: llvm/test/CodeGen/AArch64/vecreduce-zeroing.ll:6
+
+define dso_local noundef <4 x i32> @umaxv(<4 x i32> noundef %0) local_unnamed_addr #0 {
+; CHECK-LABEL: umaxv:
----------------
We can usually remove dso_local and local_unnamed_addr #0 to clean up the tests a bit.
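
For example, the test could probably be reduced to something like the following. The define line and CHECK-LABEL come from the patch with dso_local and local_unnamed_addr #0 dropped; the function body is an assumed placeholder, not the body of the actual test.

  declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32>)

  define noundef <4 x i32> @umaxv(<4 x i32> noundef %0) {
  ; CHECK-LABEL: umaxv:
    ; Assumed body: reduce, then insert into an explicitly zeroed vector.
    %r = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %0)
    %v = insertelement <4 x i32> zeroinitializer, i32 %r, i64 0
    ret <4 x i32> %v
  }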

================
Comment at: llvm/test/CodeGen/AArch64/vecreduce-zeroing.ll:62
+
+attributes #0 = { mustprogress nofree nosync nounwind readnone willreturn uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="128" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+spe,+ssbs,+v8.2a" }
+attributes #1 = { nofree nosync nounwind readnone willreturn }
----------------
Can these be removed?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122436/new/

https://reviews.llvm.org/D122436