[PATCH] D148980: [X86] Machine combine vnni instruction.

Fri Apr 21 21:39:11 PDT 2023

LuoYuanke created this revision.
Herald added subscribers: pengfei, hiraditya.
Herald added a project: All.
LuoYuanke requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

"vpmaddwd + vpaddd" can be combined into vpdpwssd, and the combined
instruction has lower latency than the two-instruction sequence. However,
when vpdpwssd is on a critical path, the combination reduces ILP. This
happens when vpdpwssd is in a loop: the vpmaddwd of different iterations
can execute in parallel, while vpdpwssd carries a data dependency from
one iteration to the next. Since the latency of vpaddd is less than that
of vpdpwssd, it is then profitable to split vpdpwssd into "vpmaddwd +
vpaddd".
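
The trade-off can be sketched with a toy model of the loop-carried
dependency chain. The latency values below are illustrative assumptions
only (they are not taken from this patch or from any scheduling model),
and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed, illustrative latencies in cycles; real values depend on the CPU. */
enum { LAT_VPMADDWD = 5, LAT_VPADDD = 1, LAT_VPDPWSSD = 5 };

/* Fused form: every iteration's vpdpwssd waits for the previous
   accumulator value, so the chain grows by the full vpdpwssd latency
   per iteration. */
static int fused_chain_cycles(int iterations) {
    assert(iterations >= 0);
    return iterations * LAT_VPDPWSSD;
}

/* Split form: vpmaddwd does not read the accumulator, so multiplies from
   different iterations can overlap. Only the first multiply's latency is
   exposed; after that the chain grows by one vpaddd per iteration. */
static int split_chain_cycles(int iterations) {
    assert(iterations >= 0);
    if (iterations == 0)
        return 0;
    return LAT_VPMADDWD + iterations * LAT_VPADDD;
}
```

Under these assumed latencies, 100 iterations give a chain of 500 cycles
for the fused form and 105 cycles for the split form. The model ignores
throughput limits and loads, so it only shows the direction of the
effect.
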
This patch uses the machine combiner framework to decide whether to use
the combined form. A typical example is shown below.

  #include <immintrin.h>

  __m256i foo(int cnt, __m256i c, __m256i b, __m256i *p) {
      for (int i = 0; i < cnt; ++i) {
          __m256i a = p[i];
          __m256i m = _mm256_madd_epi16 (b, a);
          c = _mm256_add_epi32(m, c);
      }

      return c;
  }
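
For reference, the equivalence that the combine relies on can be checked
with a portable scalar model, sketched below. It assumes the
non-saturating, wrapping semantics of vpmaddwd, vpaddd and vpdpwssd on
eight 32-bit lanes. All function names are hypothetical and are not part
of the patch.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 8 };  /* 32-bit lanes in a 256-bit vector */

/* One lane of vpmaddwd: sum of two adjacent signed 16x16-bit products.
   Unsigned arithmetic models the wrap-around modulo 2^32. */
static uint32_t madd_lane(int16_t a0, int16_t b0, int16_t a1, int16_t b1) {
    return (uint32_t)((int32_t)a0 * b0) + (uint32_t)((int32_t)a1 * b1);
}

/* Scalar model of vpmaddwd on 16 input words. */
static void madd_epi16_model(const int16_t *a, const int16_t *b, uint32_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = madd_lane(a[2 * i], b[2 * i], a[2 * i + 1], b[2 * i + 1]);
}

/* Scalar model of vpaddd. */
static void add_epi32_model(const uint32_t *x, const uint32_t *y, uint32_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = x[i] + y[i];
}

/* Scalar model of vpdpwssd: accumulate the pairwise products into c. */
static void dpwssd_model(const uint32_t *c, const int16_t *a, const int16_t *b,
                         uint32_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = c[i] + madd_lane(a[2 * i], b[2 * i], a[2 * i + 1], b[2 * i + 1]);
}

/* Returns 1 if "vpmaddwd + vpaddd" and vpdpwssd agree on these inputs. */
static int split_matches_fused(const uint32_t *c, const int16_t *a,
                               const int16_t *b) {
    uint32_t m[LANES], split[LANES], fused[LANES];
    madd_epi16_model(a, b, m);
    add_epi32_model(m, c, split);
    dpwssd_model(c, a, b, fused);
    return memcmp(split, fused, sizeof split) == 0;
}
```

Since both forms wrap modulo 2^32, they produce identical results in
this model, so the choice between them is purely a scheduling decision.
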

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D148980

Files:
  llvm/include/llvm/CodeGen/MachineCombinerPattern.h
  llvm/lib/CodeGen/MachineCombiner.cpp
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrInfo.h
  llvm/test/CodeGen/X86/avxvnni-combine.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D148980.516025.patch
Type: text/x-patch
Size: 12615 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230422/b22c1318/attachment.bin>