[all-commits] [llvm/llvm-project] 318797: [NFC][X86][Codegen] Add codegen test coverage for ...

Tue Dec 13 10:21:51 PST 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 31879765d838c1797e9efd54deb225096ab89f03
      https://github.com/llvm/llvm-project/commit/31879765d838c1797e9efd54deb225096ab89f03
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2022-12-13 (Tue, 13 Dec 2022)

  Changed paths:
    A llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll

  Log Message:
  -----------
  [NFC][X86][Codegen] Add codegen test coverage for i1 mask replication (AVX512 only)

Apparently I didn't add it when adding cost model coverage?

  Commit: ff5fcda43093630fb6342730163195591c5728f9
      https://github.com/llvm/llvm-project/commit/ff5fcda43093630fb6342730163195591c5728f9
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2022-12-13 (Tue, 13 Dec 2022)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/extend.ll
    M llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
    M llvm/test/Analysis/CostModel/X86/trunc.ll

  Log Message:
  -----------
  [x86][Costmodel] AVX512VL: add missing costs for v8 i1<->i32 casts

This would come up as a regression in the follow-up Replication-of-i1 patch.

https://godbolt.org/z/fxr9Mzssr

  Commit: 64d46e141cfeaa1e6f4f68b8d526f1f712bba42b
      https://github.com/llvm/llvm-project/commit/64d46e141cfeaa1e6f4f68b8d526f1f712bba42b
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2022-12-13 (Tue, 13 Dec 2022)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i1-codesize.ll
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i1-latency.ll
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i1-sizelatency.ll
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i1.ll

  Log Message:
  -----------
  [NFC][Costmodel][X86] Replication shuffle: AVX512F can promote i1 to i32.

As the added codegen test coverage shows,
there isn't much difference between AVX512DQI and
baseline AVX512F codegen: DQI added `vpmovm2d`/`vpmovd2m`,
but with just the Foundation we can use `vpternlogd`/`vptestmd`
to do the same.
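
For readers unfamiliar with the pattern, here is a rough scalar model of what
"replicating an i1 mask by promoting it to i32" means. It is only a sketch:
the function names and the `std::vector` representation are made up for
illustration, and it does not reproduce the code the backend emits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: repeat every mask bit `Factor` times, so that
// [a, b] with Factor = 3 becomes [a, a, a, b, b, b].
std::vector<bool> replicateMask(const std::vector<bool> &Mask, unsigned Factor) {
  std::vector<bool> Out;
  Out.reserve(Mask.size() * Factor);
  for (bool Bit : Mask)
    for (unsigned I = 0; I < Factor; ++I)
      Out.push_back(Bit);
  return Out;
}

// Hypothetical helper: promote each i1 to an i32 lane, all-ones for true
// and zero for false (the mask-to-vector direction).
std::vector<int32_t> maskToLanes(const std::vector<bool> &Mask) {
  std::vector<int32_t> Lanes;
  Lanes.reserve(Mask.size());
  for (bool Bit : Mask)
    Lanes.push_back(Bit ? -1 : 0);
  return Lanes;
}

// Hypothetical helper: recover the mask by testing each lane for non-zero
// (the vector-to-mask direction).
std::vector<bool> lanesToMask(const std::vector<int32_t> &Lanes) {
  std::vector<bool> Mask;
  Mask.reserve(Lanes.size());
  for (int32_t Lane : Lanes)
    Mask.push_back(Lane != 0);
  return Mask;
}
```

In this model the round trip `lanesToMask(maskToLanes(M))` returns `M`
unchanged, which is why either pair of instructions can serve for the
promotion.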

  Commit: c499e7a8a74e85d35b135d1b4c249503603827aa
      https://github.com/llvm/llvm-project/commit/c499e7a8a74e85d35b135d1b4c249503603827aa
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2022-12-13 (Tue, 13 Dec 2022)

  Changed paths:
    M llvm/include/llvm/MC/MCInst.h

  Log Message:
  -----------
  [NFC][MC] `MCInst`: `Operands` small size optimization: store 10, not 8, inline `MCOperand`

This improves the torture test of
```
./bin/llvm-exegesis -mcpu=znver3 -mode=inverse_throughput --opcode-index=-1 --benchmarks-file=/dev/null --dump-object-to-disk=0 --measurements-print-progress --skip-measurements
```
from ~2m16s to ~2m07s, and has the following effect on memory:

```
heaptrack stats:
        allocations:            100828624 -> 77362343 (-23.2%)
        leaked allocations:     1128
        temporary allocations:  24911300  ->  1576308 (-93.7%) !!!

peak heap memory consumption:
        78.2MB after 02.121s    ->   76.4MB after 01.985s (-2.3%)
peak RSS (including heaptrack overhead):
        193.4MB                 ->   192.6MB              (-0.4%)
```

The reasoning is that having more Operands than the SSO is costly,
because we go to the global allocator, but having a larger SSO is fine,
even if it's not always needed, because MCInst is hopefully pool-allocated.
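
To illustrate that trade-off, here is a minimal sketch of a container with
inline storage that falls back to the heap. `InlineVec` and
`spillsWithOperands` are hypothetical names invented for this example; this
is not `llvm::SmallVector` or the `MCInst` code, only a model of the
inline-then-heap behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a small-size-optimized container.
template <typename T, std::size_t N> class InlineVec {
  T Inline[N];         // storage that lives inside the object itself
  std::vector<T> Heap; // used only once more than N elements are stored
  std::size_t Size = 0;
  bool Spilled = false;

public:
  void push_back(const T &V) {
    if (!Spilled && Size < N) {
      Inline[Size++] = V; // fits inline: no allocator call
      return;
    }
    if (!Spilled) {
      // First element past the inline capacity: move everything to the heap.
      Heap.assign(Inline, Inline + Size);
      Spilled = true;
    }
    Heap.push_back(V);
    ++Size;
  }

  const T &operator[](std::size_t I) const {
    return Spilled ? Heap[I] : Inline[I];
  }
  std::size_t size() const { return Size; }
  bool usesHeap() const { return Spilled; }
};

// Push `Count` operands and report whether inline storage was exceeded.
template <std::size_t N> bool spillsWithOperands(std::size_t Count) {
  InlineVec<int, N> Ops;
  for (std::size_t I = 0; I < Count; ++I)
    Ops.push_back(static_cast<int>(I));
  return Ops.usesHeap();
}
```

Under this model an instruction with 9 or 10 operands spills to the heap when
the inline capacity is 8 but stays inline when it is 10, at the price of a
larger object for every instruction.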

I'm not sure who the code owner of this component is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139882

Compare: https://github.com/llvm/llvm-project/compare/5004320590ae...c499e7a8a74e