[PATCH] D82553: [ARM] Allow rounding intrinsics to be tail predicated

Thu Jun 25 09:07:39 PDT 2020

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/MVETailPredication.cpp:361
           case Intrinsic::uadd_sat:
+          case Intrinsic::trunc:
+          case Intrinsic::rint:
----------------
Um, you might be fixing my bug here, but can you make it so that the floating point instructions only tail predicate when Subtarget->hasMVEFloatOps() is true? Otherwise, in integer-only MVE we could start trying to tail predicate where it will end up expanding the instruction (which probably isn't a huge deal, but we should try and get it correct).
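
Roughly the shape I have in mind, as a standalone sketch rather than the actual pass code. IntrinsicID, MockSubtarget and canTailPredicate are made-up stand-ins for illustration (only the hasMVEFloatOps() name mirrors the real subtarget query), and the nearbyint case reflects my guess that it gets expanded:

```cpp
#include <cassert>

// Hypothetical stand-in for the intrinsic IDs mentioned in this review;
// this is not llvm::Intrinsic::ID.
enum class IntrinsicID { uadd_sat, trunc, rint, nearbyint };

// Hypothetical stand-in for the ARM subtarget. Only the query name
// hasMVEFloatOps() is taken from the real code.
struct MockSubtarget {
  bool HasMVEFloat;
  bool hasMVEFloatOps() const { return HasMVEFloat; }
};

// Returns true if a call to the given intrinsic may be tail predicated.
// Integer intrinsics are allowed unconditionally; floating point rounding
// intrinsics are allowed only when the subtarget has MVE float ops, since
// integer-only MVE would end up expanding them.
bool canTailPredicate(IntrinsicID ID, const MockSubtarget &ST) {
  switch (ID) {
  case IntrinsicID::uadd_sat:
    return true;
  case IntrinsicID::trunc:
  case IntrinsicID::rint:
    return ST.hasMVEFloatOps();
  default:
    // Assumption: anything not listed (e.g. nearbyint) is not predicated.
    return false;
  }
}
```

With this, canTailPredicate(IntrinsicID::rint, MockSubtarget{false}) is false, so integer-only MVE never tries to predicate the float rounding calls.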

================
Comment at: llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-intrinsic-round.ll:3
+; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -verify-machineinstrs -disable-mve-tail-predication=false -o - %s | FileCheck %s
+define arm_aapcs_vfpcc void @round(float* noalias nocapture readonly %pSrcA, float* noalias nocapture %pDst, i32 %n) #0 {
+; CHECK-LABEL: round:
----------------
Can you also add a test for nearbyint, to show that it _doesn't_ get tail predicated? (I think it gets expanded to multiple scalar instructions.)

================
Comment at: llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-intrinsic-round.ll:259
+
+attributes #0 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cortex-m55" "target-features"="+armv8.1-m.main,+dsp,+fp-armv8d16,+fp-armv8d16sp,+fp16,+fp64,+fullfp16,+hwdiv,+lob,+mve,+mve.fp,+ras,+strict-align,+thumb-mode,+vfp2,+vfp2sp,+vfp3d16,+vfp3d16sp,+vfp4d16,+vfp4d16sp,-aes,-bf16,-cdecp0,-cdecp1,-cdecp2,-cdecp3,-cdecp4,-cdecp5,-cdecp6,-cdecp7,-crc,-crypto,-dotprod,-fp16fml,-hwdiv-arm,-i8mm,-sb,-sha2" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #1 = { nosync nounwind readnone willreturn }
----------------
I think you can remove all this and the code below. If it complains about stuff then use -mattr=+mve.fp in the run line.

You will probably have to remove the !tbaa info too.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82553/new/

https://reviews.llvm.org/D82553