[PATCH] D131028: [AArch64] Fix cost model for FADD vector reduction

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 16 00:34:45 PDT 2022


dmgreen added a comment.

Out of the results I have seen, there are a number that are a little better; those have 4x manually unrolled float summations. They look OK and seem to improve by 5-10%.

The example I shared was the most obviously worse, even if it is wrapped up in awkward SLP codegen. It is 20%-40% worse depending on the CPU. There are a few other cases that get worse that have the 4x manual unrolling, including an f64 matrix multiply and something called iir_lattice. As far as I can see, all the examples that get worse have multiplies feeding into a reduction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131028/new/

https://reviews.llvm.org/D131028
