[all-commits] [llvm/llvm-project] 70995a: [ScalarizeMaskedMemIntr] Optimize splat non-constant masks (#104537)
Krzysztof Drewniak via All-commits
all-commits at lists.llvm.org
Fri Aug 16 14:24:46 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 70995a1a3379ed3c21b1c5da6723f04166cb0ae6
https://github.com/llvm/llvm-project/commit/70995a1a3379ed3c21b1c5da6723f04166cb0ae6
Author: Krzysztof Drewniak <Krzysztof.Drewniak at amd.com>
Date: 2024-08-16 (Fri, 16 Aug 2024)
Changed paths:
M llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp
M llvm/test/CodeGen/X86/bfloat.ll
M llvm/test/CodeGen/X86/shuffle-half.ll
M llvm/test/Transforms/ScalarizeMaskedMemIntrin/X86/expand-masked-load.ll
M llvm/test/Transforms/ScalarizeMaskedMemIntrin/X86/expand-masked-store.ll
Log Message:
-----------
[ScalarizeMaskedMemIntr] Optimize splat non-constant masks (#104537)
In cases (like the ones added in the tests) where the condition of a
masked load or store is a splat but not a constant (that is, a masked
operation is being used to implement patterns like "load if the current
lane is in-bounds, otherwise return 0"), optimize the 'scalarized' code
to perform an aligned vector load/store if the splatted condition is
true. Additionally, while here, take a few steps to preserve aliasing
information and value names when nothing is scalarized.
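For illustration, here is a minimal sketch of the transform on a masked
load. The function name, element type, vector width, and alignment are
invented for this example and are not taken from the commit's tests.
Before, the mask is a splat of the non-constant i1 %cond:

  define <4 x i32> @splat_mask_load(ptr %p, i1 %cond) {
    ; splat %cond across all four mask lanes
    %m0 = insertelement <4 x i1> poison, i1 %cond, i64 0
    %mask = shufflevector <4 x i1> %m0, <4 x i1> poison, <4 x i32> zeroinitializer
    %v = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %p, i32 16, <4 x i1> %mask, <4 x i32> zeroinitializer)
    ret <4 x i32> %v
  }

Rather than emitting one branch and scalar load per lane, the pass can
now branch once on %cond and keep the access as a single aligned vector
load, falling back to the passthrough value when the condition is false:

  define <4 x i32> @splat_mask_load(ptr %p, i1 %cond) {
  entry:
    br i1 %cond, label %load, label %join
  load:
    ; one whole-vector load instead of per-lane scalar loads
    %v = load <4 x i32>, ptr %p, align 16
    br label %join
  join:
    %res = phi <4 x i32> [ %v, %load ], [ zeroinitializer, %entry ]
    ret <4 x i32> %res
  }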
As motivation, some frontends that emit LLVM IR will generate masked
loads/stores in cases that map to this kind of predicated operation
(where either the whole vector is loaded/stored or it isn't) in order to
take advantage of hardware primitives. However, on AMDGPU, which has no
masked load or store instructions, this pass would scalarize a load or
store that was intended to be - and can be - vectorized, while also
introducing expensive branches.
Fixes #104520
Pre-commit tests at #104527