[all-commits] [llvm/llvm-project] 86bb7d: [CostModel][X86] getScalarizationOverhead - handle...

Mon May 2 01:58:56 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 86bb7df6e6ea6cfbba3b13b82c0c48eb1c45d198
      https://github.com/llvm/llvm-project/commit/86bb7df6e6ea6cfbba3b13b82c0c48eb1c45d198
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2022-05-02 (Mon, 02 May 2022)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll
    M llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost-inseltpoison.ll
    M llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i1.ll

  Log Message:
  -----------
  [CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512)

We can quickly extract multiple elements of a bool vector using MOVMSK ops. Since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op.
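
As a rough sketch of that optimistic count (not the code in X86TargetTransformInfo.cpp): assuming one PMOVMSKB covers 16 bools with 128-bit SSE2 registers and 32 bools with 256-bit AVX2 registers, the extract cost is the number of ops needed to cover the vector. The helper names below are hypothetical.

```cpp
// Hypothetical helpers for illustration; these are not LLVM APIs.

// Assumption: a single (V)PMOVMSKB yields one bit per byte lane, so 16 bools
// from a 128-bit register, or 32 from a 256-bit register when AVX2 is present.
constexpr unsigned maxBoolsPerMovmsk(bool HasAVX2) {
  return HasAVX2 ? 32u : 16u;
}

// Optimistic extract cost for a vXi1 vector: the number of MOVMSK ops needed
// to cover NumElts bools (ceiling division).
constexpr unsigned movmskExtractCost(unsigned NumElts, bool HasAVX2) {
  return (NumElts + maxBoolsPerMovmsk(HasAVX2) - 1) / maxBoolsPerMovmsk(HasAVX2);
}
```

Under these assumptions a v16i1 extract costs one op with or without AVX2, while a v64i1 extract costs four ops on SSE2 and two on AVX2.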

The MOVMSK pattern isn't great for extract+insert round trips, as vXi1 type legalization can interfere with it a lot. So this relies on us continuing to call getScalarizationOverhead properly (tagging both Insert and Extract modes) for those round trip cases.

The AVX512 KMOV codegen for bool extraction is a bit of a mess, so for now I've not included it; the per-element cost is a lot more accurate for current codegen.
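
The two restrictions above (round trips and AVX512) can be pictured as a guard in front of the MOVMSK cost. This is only a sketch under assumptions: that the fast path applies to extract-only queries on 1-bit elements, and that everything else keeps the per-element cost. The predicate name and parameters are hypothetical, not taken from the patch.

```cpp
// Hypothetical predicate for illustration; this is not an LLVM API.
// Assumptions: the MOVMSK cost only applies to extract-only queries on
// 1-bit elements, and AVX512 targets keep the per-element cost.
constexpr bool useMovmskExtractCost(unsigned ScalarSizeInBits, bool Insert,
                                    bool Extract, bool HasAVX512) {
  return Extract && !Insert && ScalarSizeInBits == 1 && !HasAVX512;
}
```

With this guard, a round trip query (Insert and Extract both set) or any query on an AVX512 target would fall back to the per-element cost.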