[PATCH] D156112: [AArch64][LoopVectorize] Improve tail-folding heuristic on neoverse-v1

Tue Jul 25 01:34:35 PDT 2023

david-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3789
+  // insufficient computation between comparisons can slow down the code.
+  return NumInsns >= SVETailFoldInsnThreshold * NumComparisons;
 }
----------------
At first glance this feels a little brutal - doubling (or even tripling!) the threshold when there are one or two extra compares in the loop. Also, I would have thought that after adding one or two more compares the threshold should really hit a plateau, because at that point the volume of compares is more likely to be the bottleneck once all the pipelines are full?

@igor.kirillov Which loops have you tested this on, and have you established the minimum thresholds required to prevent tail-folding? I just wonder if, instead of multiplying the threshold, you could do something like this:

  unsigned AdditionalInsns = NumComparisons > 1 ? 5 : 0;
  return NumInsns >= (SVETailFoldInsnThreshold + AdditionalInsns);

If you haven't done so already, then I think it's worth collecting some data on what thresholds are really needed.
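
To make the difference between the two rules concrete, here is a minimal standalone sketch. It is not the code in the patch: the base value of 15 and the helper names multiplicativeThreshold and additiveThreshold are assumptions for illustration only, and the extra 5 instructions is just the placeholder from the suggestion above.

```cpp
// Assumed base threshold, for illustration only. In the patch this is
// whatever SVETailFoldInsnThreshold is set to.
constexpr unsigned SVETailFoldInsnThreshold = 15;

// Hypothetical helper: the rule as written in the patch, where the
// threshold grows linearly with the number of compares in the loop.
constexpr unsigned multiplicativeThreshold(unsigned NumComparisons) {
  return SVETailFoldInsnThreshold * NumComparisons;
}

// Hypothetical helper: the suggested alternative, a flat bump once the
// loop has more than one compare, so the threshold plateaus.
constexpr unsigned additiveThreshold(unsigned NumComparisons) {
  return SVETailFoldInsnThreshold + (NumComparisons > 1 ? 5u : 0u);
}

// Tail-folding is allowed when the loop has at least `Threshold` instructions.
constexpr bool allowsTailFolding(unsigned NumInsns, unsigned Threshold) {
  return NumInsns >= Threshold;
}
```

Under these assumed numbers, a loop with two compares needs 30 instructions under the multiplicative rule but only 20 under the additive one, and with three compares the gap widens to 45 versus 20. A 25-instruction loop with two compares would therefore be tail-folded under the additive rule and not under the multiplicative one.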

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156112/new/

https://reviews.llvm.org/D156112