[all-commits] [llvm/llvm-project] 0aff17: [Analysis] Add simple cost model for strict (in-or...

Mon Jul 26 02:26:25 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 0aff1798b5721d5f95d16f465b99d357012bb8d1
      https://github.com/llvm/llvm-project/commit/0aff1798b5721d5f95d16f465b99d357012bb8d1
  Author: David Sherwood <david.sherwood at arm.com>
  Date:   2021-07-26 (Mon, 26 Jul 2021)

  Changed paths:
    M llvm/include/llvm/Analysis/TargetTransformInfo.h
    M llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
    M llvm/include/llvm/CodeGen/BasicTTIImpl.h
    M llvm/lib/Analysis/TargetTransformInfo.cpp
    M llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
    M llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
    M llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
    M llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
    M llvm/lib/Target/ARM/ARMTargetTransformInfo.h
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/lib/Target/X86/X86TargetTransformInfo.h
    M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    M llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
    A llvm/test/Analysis/CostModel/AArch64/reduce-fadd.ll
    M llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
    M llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll
    M llvm/test/Analysis/CostModel/X86/reduce-fadd.ll
    M llvm/test/Analysis/CostModel/X86/reduce-fmul.ll
    A llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
    A llvm/test/Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

  Log Message:
  -----------
  [Analysis] Add simple cost model for strict (in-order) reductions

I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432