[PATCH] D156112: [AArch64][LoopVectorize] Improve tail-folding heuristic on neoverse-v1
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 25 01:34:35 PDT 2023
david-arm added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3789
+ // insufficient computation between comparisons can slow down the code.
+ return NumInsns >= SVETailFoldInsnThreshold * NumComparisons;
}
----------------
At first glance this feels a little brutal - doubling (or even tripling!) the threshold when there are one or two extra compares in the loop. Also, I would have thought that after adding one or two more compares the threshold should really hit a plateau, because at that point the pipelines are already full and the volume of compares is more likely to be the bottleneck?
@igor.kirillov What loops have you tested this on, and have you established the minimum thresholds required to prevent tail-folding? I just wonder if, instead of multiplying the threshold, you could do something like this:
unsigned AdditionalInsns = NumComparisons > 1 ? 5 : 0;
return NumInsns >= (SVETailFoldInsnThreshold + AdditionalInsns);
If you haven't done so already, then I think it's worth collecting some data on what thresholds are really needed.
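To make the difference between the two formulas concrete, here is a minimal standalone sketch that puts them side by side. The function names `preferTailFoldMultiplicative` and `preferTailFoldAdditive` are hypothetical and do not exist in AArch64TargetTransformInfo.cpp; the threshold is passed as a parameter rather than read from `SVETailFoldInsnThreshold`, and the constant 5 is just the placeholder from the suggestion above, not a measured value.

```cpp
// Sketch only: these free functions are hypothetical helpers for comparing
// the two heuristics discussed in the review. They are not LLVM code.

// Heuristic as written in the patch: the instruction count required for
// tail-folding scales linearly with the number of comparisons in the loop.
constexpr bool preferTailFoldMultiplicative(unsigned NumInsns,
                                            unsigned NumComparisons,
                                            unsigned Threshold) {
  return NumInsns >= Threshold * NumComparisons;
}

// Alternative from the review comment: a flat bump once the loop has more
// than one comparison, so the requirement plateaus instead of growing.
constexpr bool preferTailFoldAdditive(unsigned NumInsns,
                                      unsigned NumComparisons,
                                      unsigned Threshold) {
  // The value 5 is a placeholder; real data would be needed to choose it.
  unsigned AdditionalInsns = NumComparisons > 1 ? 5 : 0;
  return NumInsns >= Threshold + AdditionalInsns;
}
```

Assuming, purely for illustration, a threshold of 15: a loop with 25 instructions and 2 comparisons would be rejected by the multiplicative form (which needs 30) but accepted by the additive form (which needs 20). With 3 comparisons the multiplicative requirement rises to 45 while the additive one stays at 20.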
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D156112/new/
https://reviews.llvm.org/D156112