[PATCH] D131028: [AArch64] Fix cost model for FADD vector reduction
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 16 00:34:45 PDT 2022
dmgreen added a comment.
Out of the results I have seen, a number that improve a little have 4x manually unrolled float summations. They look OK, improving by 5-10%.
The example I shared was the most obviously worse, even if it is wrapped up in awkward SLP codegen. It is 20-40% worse depending on the CPU. There are a few other cases that get worse that have the 4x manual unrolling, including an f64 matrix multiply and something called iir_lattice. As far as I can see, all the examples that get worse have multiplies feeding into a reduction.
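For illustration, the pattern described (a 4x manually unrolled float summation with multiplies feeding the reduction) can be sketched roughly as below. This is a hypothetical dot-product kernel, not one of the actual regressing benchmarks; the SLP vectorizer can turn the unrolled body into a vector fmul followed by an fadd reduction, which is where this cost-model change applies.

```c
#include <stddef.h>

/* Hypothetical sketch of the regressing pattern: a 4x manually
 * unrolled float summation where a multiply feeds the reduction.
 * SLP vectorization can rewrite the four accumulators into a
 * vector fmul plus an fadd reduction. */
float dot_unrolled4(const float *a, const float *b, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Scalar tail for lengths not divisible by 4. */
    for (; i < n; ++i)
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```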
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131028/new/
https://reviews.llvm.org/D131028