[PATCH] D87231: [AArch64] ExtractElement is free when combined with pairwise add
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 8 05:21:25 PDT 2020
spatel added a comment.
In D87231#2260558 <https://reviews.llvm.org/D87231#2260558>, @sanwou01 wrote:
> Thanks @spatel . You're right that we miss that pattern, but, so does x86 currently it seems (I don't read x86 very well so I might be wrong).
Horizontal math ops are a special case for x86 (not all targets support them, and even fewer prefer them for performance), so we need to select a specific CPU subtarget (btver2 here) to see whether that example is working:
$ clang -O1 faddp.c -S -o - -target x86_64 -mllvm -disable-vector-combine -march=btver2
vhaddps %xmm0, %xmm0, %xmm0
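For readers who don't read x86 well, here is a rough scalar model of what that instruction computes. It is only a sketch: `hadd_ps_model` and `add_low_pair` are hypothetical names made up for illustration, and `add_low_pair` is an assumed stand-in for the kind of source pattern under discussion, not the actual contents of faddp.c. The lane layout follows the SSE3 haddps definition for 4 x float.

```c
#include <stddef.h>

/* Scalar model of a 4 x float horizontal add (haddps lane layout):
 *   out = { a[0]+a[1], a[2]+a[3], b[0]+b[1], b[2]+b[3] }
 * Hypothetical helper for illustration only; not an LLVM or intrinsic API. */
static void hadd_ps_model(const float a[4], const float b[4], float out[4]) {
    /* Compute into a temporary so that out may alias a or b. */
    float r[4] = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    for (size_t i = 0; i < 4; ++i)
        out[i] = r[i];
}

/* Assumed source pattern: add the two low lanes of a vector. */
static float add_low_pair(const float v[4]) {
    return v[0] + v[1];
}
```

With both operands set to the same register, as in the output above, lane 0 of the result holds v[0] + v[1], so the scalar add of two extracted elements falls out of a single instruction.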
> I did find `scalarizeBinOpOfSplats` in `DAGCombiner` but that doesn't seem to work here, nor do any of the other patterns in `SimplifyVBinOp`.
The x86 horizontal transforms are specialized because the HW instructions themselves are weird - no sane target would ever create that functionality from scratch. :)
See "LowerToHorizontalOp" and "lowerAddSubToHorizontalOp" in X86ISelLowering.cpp.
That said, there may still be room to improve the cost models and/or their usage here, but I'm not sure exactly how to adjust them. For example, we might match this pattern as a 2-way pairwise reduction?
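To make the "2-way pairwise reduction" idea concrete, here is a minimal sketch of one reading of it: a pairwise reduction adds adjacent elements at each step, and the 2-element case degenerates to a single step, v[0] + v[1]. The helper names are hypothetical and this models only the arithmetic, not any cost-model interface in LLVM. It assumes the length is a power of two.

```c
#include <stddef.h>

/* One pairwise step, in place: v[i] = v[2*i] + v[2*i+1].
 * Returns the new (halved) length. Assumes n is even. */
static size_t pairwise_step(float *v, size_t n) {
    for (size_t i = 0; i < n / 2; ++i)
        v[i] = v[2 * i] + v[2 * i + 1];
    return n / 2;
}

/* Reduce a power-of-two-length array to one value by repeated pairwise
 * steps. If steps is non-NULL, it receives the number of steps taken. */
static float pairwise_reduce(float *v, size_t n, unsigned *steps) {
    unsigned s = 0;
    while (n > 1) {
        n = pairwise_step(v, n);
        ++s;
    }
    if (steps)
        *steps = s;
    return v[0];
}
```

Under this reading, a 2-element input takes exactly one step, which is the same add-of-two-extracts shape as the pattern in this patch.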
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D87231/new/
https://reviews.llvm.org/D87231