[PATCH] D87231: [AArch64] ExtractElement is free when combined with pairwise add

Tue Sep 8 05:21:25 PDT 2020

spatel added a comment.

In D87231#2260558 <https://reviews.llvm.org/D87231#2260558>, @sanwou01 wrote:

> Thanks @spatel . You're right that we miss that pattern, but, so does x86 currently it seems (I don't read x86 very well so I might be wrong).

Horizontal math ops are a special case for x86 (not all targets support them, and even fewer prefer them for performance), so we need to pick a suitable CPU subtarget (btver2 here) to see whether that example works:

  $ clang -O1 faddp.c -S -o - -target x86_64 -mllvm -disable-vector-combine -march=btver2
    vhaddps	%xmm0, %xmm0, %xmm0
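
The contents of faddp.c are not shown in this thread. As a rough sketch only, assuming the test adds the two low lanes of a 4 x float vector, it could look like the following. The type name `v4f32` and the function name `add_low_pair` are hypothetical, and the vector type relies on the GCC/Clang `vector_size` extension.

```c
/* Hypothetical stand-in for faddp.c; the real file is not shown in the thread. */

/* 4 x float vector using the GCC/Clang vector extension (assumed type). */
typedef float v4f32 __attribute__((vector_size(16)));

/* Extract lanes 0 and 1 and add them: two extractelements feeding one fadd. */
float add_low_pair(v4f32 v) {
    return v[0] + v[1];
}
```

With a source like this, the command above shows whether the two extracts plus the scalar add get folded into a single horizontal add.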

> I did find `scalarizeBinOpOfSplats` in `DAGCombiner` but that doesn't seem to work here, nor do any of the other patterns in `SimplifyVBinOp`.

The x86 horizontal transforms are specialized because the HW instructions themselves are weird - no sane target would ever create that functionality from scratch. :) 
See "LowerToHorizontalOp" and "lowerAddSubToHorizontalOp" in X86ISelLowering.cpp.

That said, there may still be room to improve the cost models and/or their usage here, but I'm not sure exactly how to adjust them. For example, could we match this pattern as a 2-way pairwise reduction?
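
To make the "pairwise reduction" idea concrete, here is a scalar sketch (not LLVM code, and not how the cost model is implemented). Each level adds adjacent lanes 2*i and 2*i+1, so the 2-way case collapses to exactly `v[0] + v[1]`. The helper names and the 16-element cap are assumptions made for illustration.

```c
#include <stddef.h>

/* One pairwise level: out[i] = in[2*i] + in[2*i + 1].
 * Safe in place, since out[i] is written only after both inputs are read
 * and never overwrites an element a later iteration still needs. */
static void pairwise_add_level(const float *in, float *out, size_t n_in) {
    for (size_t i = 0; i < n_in / 2; ++i)
        out[i] = in[2 * i] + in[2 * i + 1];
}

/* Reduce n elements by repeated pairwise levels.
 * Assumes n is a power of two with 1 <= n <= 16 (illustrative limit);
 * returns 0 for sizes outside that range. */
float pairwise_reduce_add(const float *v, size_t n) {
    float buf[16];
    if (n == 0 || n > 16)
        return 0.0f;
    for (size_t i = 0; i < n; ++i)
        buf[i] = v[i];
    for (size_t width = n; width > 1; width /= 2)
        pairwise_add_level(buf, buf, width);
    return buf[0];
}
```

For n = 2 there is a single level, which is the extract-and-add pattern under discussion.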

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D87231