[all-commits] [llvm/llvm-project] c66620: [X86][Costmodel] getMaskedMemoryOpCost(): don't sc...

Mon May 24 10:10:18 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c666208f63802cd104db2689cd72eb7a86e64a06
      https://github.com/llvm/llvm-project/commit/c666208f63802cd104db2689cd72eb7a86e64a06
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-05-24 (Mon, 24 May 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll

  Log Message:
  -----------
  [X86][Costmodel] getMaskedMemoryOpCost(): don't scalarize non-power-of-two vectors with legal element type

This follows in steps of similar `getMemoryOpCost()` changes, D100099/D100684.

Intel SDM, `VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores`:
```
Faults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to
referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no
faults will be detected if the mask bits are all zero.
```
I.e., if mask is all-zeros, any address is fine.

Masked load/store's prime use-case is e.g. tail masking the loop remainder,
where for the last iteration, only first some few elements of a vector exist.

So much similarly, i don't see why must we scalarize non-power-of-two vectors,
iff the element type is something we can masked- store/load.
We simply need to legalize it, widen the mask, and be done with it.
And we even already count the cost of widening the mask.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D102990