[all-commits] [llvm/llvm-project] 0f3230: [SLP] Better estimate cost of no-op extracts on ta...

Fri Apr 2 02:52:28 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 0f3230390b8becb59362963b8be630b3e32541b1
      https://github.com/llvm/llvm-project/commit/0f3230390b8becb59362963b8be630b3e32541b1
  Author: Florian Hahn <flo at fhahn.com>
  Date:   2021-04-02 (Fri, 02 Apr 2021)

  Changed paths:
    M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    M llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll
    M llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll
    M llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

  Log Message:
  -----------
  [SLP] Better estimate cost of no-op extracts on target vectors.

The motivation for this patch is to better estimate the cost of
extractelement instructions in cases where they are going to be free
because the source vector can be used directly.

A simple example is

    %v1.lane.0 = extractelement <2 x double> %v.1, i32 0
    %v1.lane.1 = extractelement <2 x double> %v.1, i32 1

    %a.lane.0 = fmul double %v1.lane.0, %x
    %a.lane.1 = fmul double %v1.lane.1, %y

Currently, we only consider the extracts free if there are no other
users.

In this particular case, on AArch64, which can fit <2 x double> in a
single vector register, the extracts should be free regardless of other
users: the source vector of the extracts is already held in a vector
register, so the vectorized code can use it directly.
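The cost decision described above can be sketched as a toy model. This is a simplified, hypothetical illustration, not the code in SLPVectorizer.cpp. The names `Extract`, `extractCostOld`, `extractCostNew` and `totalCost`, the 128-bit register width, the unit extract cost, and the requirement that each extracted lane matches its position in the vectorized bundle are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <vector>

// HYPOTHETICAL toy model; none of these names exist in LLVM.
// One extractelement whose scalar result feeds a lane of an SLP bundle.
struct Extract {
    unsigned sourceVectorBits;  // width of the source vector, e.g. 128 for <2 x double>
    unsigned lane;              // constant lane index being extracted
    unsigned bundleLane;        // lane of the vectorized bundle that uses the result
    bool hasOtherUsers;         // true if scalar code outside the tree also uses it
};

// Assumption: one vector register holds 128 bits (e.g. AArch64 NEON).
constexpr unsigned kVectorRegisterBits = 128;
// Assumption: placeholder cost of an extract that must really be emitted.
constexpr int kExtractCost = 1;

// Old behaviour described in the log: the extract is free only without other users.
int extractCostOld(const Extract& e) {
    return e.hasOtherUsers ? kExtractCost : 0;
}

// New behaviour, simplified: if the source vector fills exactly one vector
// register and the lane lines up with the bundle lane, the vectorized code can
// use the source register directly, so the extract is treated as free even if
// it has other users.
int extractCostNew(const Extract& e) {
    const bool fitsOneRegister = e.sourceVectorBits == kVectorRegisterBits;
    const bool laneMatches = e.lane == e.bundleLane;
    if (fitsOneRegister && laneMatches) {
        return 0;
    }
    return extractCostOld(e);
}

// Sums the extract cost over a bundle using the given cost rule.
int totalCost(const std::vector<Extract>& bundle, int (*cost)(const Extract&)) {
    int sum = 0;
    for (const Extract& e : bundle) {
        sum += cost(e);
    }
    return sum;
}
```

Modeling the two extracts from the example, each with an extra scalar user, as `{128, 0, 0, true}` and `{128, 1, 1, true}`, `totalCost` returns 2 under `extractCostOld` and 0 under `extractCostNew`. A source vector wider than one register keeps the old behaviour in this sketch.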

The SLP-vectorized version of noop_extracts_9_lanes is 30%-50% faster on
certain AArch64 CPUs.

This does not appear to affect any code in
SPEC2000/SPEC2006/MultiSource on either X86 or AArch64 with -O3 -flto.

This case originally regressed after D80773; if there is a better
alternative to explore, I'd be more than happy to do that.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99719