[all-commits] [llvm/llvm-project] 25d976: [ScalarizeMaskedMemIntr] Don't use a scalar mask o...

Thu Aug 22 17:03:06 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 25d976b45cb5b3d222d3a9cd94caa8a54031bbb7
      https://github.com/llvm/llvm-project/commit/25d976b45cb5b3d222d3a9cd94caa8a54031bbb7
  Author: Krzysztof Drewniak <Krzysztof.Drewniak at amd.com>
  Date:   2024-08-22 (Thu, 22 Aug 2024)

  Changed paths:
    M llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp
    M llvm/test/Transforms/ScalarizeMaskedMemIntrin/AMDGPU/expamd-masked-load.ll
    M llvm/test/Transforms/ScalarizeMaskedMemIntrin/AMDGPU/expand-masked-gather.ll
    M llvm/test/Transforms/ScalarizeMaskedMemIntrin/AMDGPU/expand-masked-scatter.ll
    M llvm/test/Transforms/ScalarizeMaskedMemIntrin/AMDGPU/expand-masked-store.ll

  Log Message:
  -----------
  [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (#104842)

ScalarizeMaskedMemIntrin contains an optimization where the <N x i1> mask
is bitcast into an iN and then bit-tests with powers of two are used to
determine whether to load/store/... or not.

However, on machines with branch divergence (mainly GPUs), this is a
mis-optimization, since each i1 in the mask will be stored in a
condition register - that is, each of these "i1"s is likely to be a word
or two wide, making these bit operations counterproductive.

Therefore, amend this pass to skip the optimization on targets that it
pessimizes.
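
For illustration only, the two strategies can be modelled in plain C++ on a
4-lane masked load. This is a sketch of the idea, not the pass's code: the
names maskedLoadScalarMask, maskedLoadPerLane and maskedLoad, the fixed lane
count, and the hasBranchDivergence parameter are all hypothetical stand-ins,
and the sketch does not show how the pass actually queries the target.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical fixed vector width
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Scalar-mask strategy: pack the per-lane bits into one integer (modelling
// the bitcast of <4 x i1> to i4), then test each lane with a power of two.
Vec maskedLoadScalarMask(const int *ptr, const Mask &mask, const Vec &passthru) {
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    bits |= static_cast<std::uint32_t>(mask[i]) << i;

  Vec result = passthru;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (bits & (std::uint32_t{1} << i))  // bit-test with 1, 2, 4, 8
      result[i] = ptr[i];
  return result;
}

// Per-lane strategy: branch on each mask element directly, with no packing.
Vec maskedLoadPerLane(const int *ptr, const Mask &mask, const Vec &passthru) {
  Vec result = passthru;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      result[i] = ptr[i];
  return result;
}

// Hypothetical selector: skip the scalar-mask strategy when the target is
// assumed to have branch divergence.
Vec maskedLoad(const int *ptr, const Mask &mask, const Vec &passthru,
               bool hasBranchDivergence) {
  return hasBranchDivergence ? maskedLoadPerLane(ptr, mask, passthru)
                             : maskedLoadScalarMask(ptr, mask, passthru);
}
```

Both strategies produce the same result; they differ only in how the mask is
tested, which is where the cost difference described above comes from.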

Pre-commit tests #104645
