[PATCH] D138421: [AArch64][SVE] Enable Tail-Folding. WIP

Mon Nov 21 06:12:31 PST 2022

david-arm added a comment.

Hi @SjoerdMeijer, thanks for looking into this. I do actually already have a patch to enable this by default (https://reviews.llvm.org/D130618), where the default behaviour is tuned according to the CPU. I think this is what we want because the profile will change according to what CPU you're running on: some CPUs may handle reductions better than others. The decision in this patch may be incorrect for 128-bit vector implementations. I also ran SPEC2k17 on an SVE-enabled CPU and I remember seeing a small (2-3%) regression in parest or something like that, which is one of the reasons I didn't push the patch any further. I also think it's really important to run a much larger set of benchmarks besides SPEC2k17 and collect numbers to show the benefits, since there isn't much vectorisation actually going on in SPEC2k17.

One of the major problems with the current tail-folding implementation is that we make the decision before doing any cost analysis in the vectoriser, which isn't great because we may be forcing the vectoriser down different code paths than it would take if we didn't tail-fold. Ideally what we really want is to move to a model where the vectoriser has a two-dimensional matrix of costs covering each combination of VF and vectorisation style (e.g. tail-folding vs whole vector loops, etc.), and chooses the cheapest combination.
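
To make the cost-matrix idea concrete, here is a rough standalone sketch. It is not based on the vectoriser's real data structures: `Style`, `Candidate` and `pickCheapest` are hypothetical names invented for illustration, and the costs are assumed to have been computed elsewhere. The only point is that the style is chosen together with the VF, after the costs are known, rather than being fixed up front.

```cpp
#include <cassert>
#include <vector>

// Hypothetical vectorisation styles; these are NOT LLVM's actual enums.
enum class Style { WholeVectorLoop, TailFolded };

// One cell of the (VF, style) cost matrix. All names are illustrative.
struct Candidate {
  unsigned VF;      // vectorisation factor
  Style LoopStyle;  // how the loop would be vectorised
  double Cost;      // assumed precomputed estimate; lower is better
};

// Return the cheapest (VF, style) combination from a flattened matrix.
// On ties, the candidate that appears first is kept.
Candidate pickCheapest(const std::vector<Candidate> &Matrix) {
  assert(!Matrix.empty() && "need at least one candidate");
  Candidate Best = Matrix.front();
  for (const Candidate &C : Matrix)
    if (C.Cost < Best.Cost)
      Best = C;
  return Best;
}
```

With a matrix like this, tail-folding would only be picked for a given VF when its estimated cost actually beats the whole-vector-loop alternative. The real vectoriser would of course have to account for much more, such as legality and register pressure.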

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138421/new/

https://reviews.llvm.org/D138421