[PATCH] D148980: [X86] Machine combine vnni instruction.
LuoYuanke via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 21 21:39:11 PDT 2023
LuoYuanke created this revision.
Herald added subscribers: pengfei, hiraditya.
Herald added a project: All.
LuoYuanke requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
"vpmaddwd + vpaddd" can be combined into vpdpwssd, which reduces latency.
However, when vpdpwssd is on a critical path, the combination exposes less
ILP. This happens when vpdpwssd is in a loop: the vpmaddwd instructions of
different iterations do not depend on the accumulator and can execute in
parallel, while vpdpwssd depends on the result of the previous iteration.
Since the latency of vpaddd is less than that of vpdpwssd, it is profitable
to split vpdpwssd back into "vpmaddwd + vpaddd".
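A minimal scalar C sketch of why the combination (and the split) is legal: vpdpwssd adds the two signed 16-bit products of each word pair to a 32-bit accumulator lane with wrap-around and no saturation, which is exactly vpaddd applied to the vpmaddwd result. The model_* functions and split_matches_fused are hypothetical helpers written only for illustration; they model one 256-bit register (16 words in, 8 dwords out) and are not LLVM or intrinsic APIs.

```c
#include <stdint.h>

/* One 256-bit register: 16 signed words in, 8 signed dword lanes out. */
#define DWORDS 8

/* 32-bit add modulo 2^32, like the hardware (no saturation). The
   uint32_t -> int32_t conversion is implementation-defined in C but is a
   plain two's-complement wrap on GCC and Clang. */
static int32_t wrap_add(int32_t x, int32_t y) {
  return (int32_t)((uint32_t)x + (uint32_t)y);
}

/* vpmaddwd model: multiply adjacent signed word pairs and sum each pair. */
void model_vpmaddwd(const int16_t a[2 * DWORDS], const int16_t b[2 * DWORDS],
                    int32_t out[DWORDS]) {
  for (int j = 0; j < DWORDS; ++j) {
    int32_t p1 = (int32_t)a[2 * j] * b[2 * j]; /* |p1| <= 2^30, cannot overflow */
    int32_t p2 = (int32_t)a[2 * j + 1] * b[2 * j + 1];
    out[j] = wrap_add(p1, p2); /* only INT16_MIN^2 * 2 wraps, to INT32_MIN */
  }
}

/* vpaddd model: lane-wise 32-bit add. */
void model_vpaddd(const int32_t x[DWORDS], const int32_t y[DWORDS],
                  int32_t out[DWORDS]) {
  for (int j = 0; j < DWORDS; ++j) out[j] = wrap_add(x[j], y[j]);
}

/* vpdpwssd model: acc[j] += a[2j]*b[2j] + a[2j+1]*b[2j+1], modulo 2^32. */
void model_vpdpwssd(int32_t acc[DWORDS], const int16_t a[2 * DWORDS],
                    const int16_t b[2 * DWORDS]) {
  for (int j = 0; j < DWORDS; ++j) {
    int32_t p1 = (int32_t)a[2 * j] * b[2 * j];
    int32_t p2 = (int32_t)a[2 * j + 1] * b[2 * j + 1];
    acc[j] = wrap_add(wrap_add(acc[j], p1), p2);
  }
}

/* Returns 1 if vpaddd(vpmaddwd(a, b), c) equals vpdpwssd(c, a, b) in every lane. */
int split_matches_fused(const int16_t a[2 * DWORDS], const int16_t b[2 * DWORDS],
                        const int32_t c[DWORDS]) {
  int32_t m[DWORDS], split[DWORDS], fused[DWORDS];
  model_vpmaddwd(a, b, m);
  model_vpaddd(m, c, split);
  for (int j = 0; j < DWORDS; ++j) fused[j] = c[j];
  model_vpdpwssd(fused, a, b);
  for (int j = 0; j < DWORDS; ++j)
    if (split[j] != fused[j]) return 0;
  return 1;
}
```

split_matches_fused returns 1 for any inputs: both forms compute (p1 + p2 + c) mod 2^32 and differ only in the order of the additions, which is why the choice between them can be left to a latency-based cost model.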
This patch uses the machine combiner framework to decide whether to form
the "vpmaddwd + vpaddd" combination. A typical example is shown
below.
#include <immintrin.h>

__m256i foo(int cnt, __m256i c, __m256i b, __m256i *p) {
  for (int i = 0; i < cnt; ++i) {
    __m256i a = p[i];
    __m256i m = _mm256_madd_epi16(b, a); // vpmaddwd: independent of c
    c = _mm256_add_epi32(m, c);          // vpaddd: loop-carried on c
  }
  return c;
}
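To illustrate the ILP argument for the loop above, here is a toy critical-path model in C. The latencies are assumed round numbers chosen for illustration, not values taken from LLVM's X86 scheduling models, and the model assumes unlimited execution ports, so it ignores throughput and uop count, both of which a real cost model has to weigh. fused_finish, split_finish and prefer_split are hypothetical names, not part of the patch.

```c
/* Assumed latencies in cycles, for illustration only. */
enum { LAT_VPMADDWD = 5, LAT_VPADDD = 1, LAT_VPDPWSSD = 5 };

static int max_int(int x, int y) { return x > y ? x : y; }

/* Cycle at which c is final when each iteration does c = vpdpwssd(c, b, a).
   Every instruction reads the previous c, so the whole chain is serial. */
int fused_finish(int cnt) {
  int c_ready = 0;
  for (int i = 0; i < cnt; ++i)
    c_ready += LAT_VPDPWSSD;
  return c_ready;
}

/* Same loop split into m = vpmaddwd(b, a); c = vpaddd(m, c).
   vpmaddwd does not read c, so (with unlimited ports) every iteration's
   vpmaddwd can start at cycle 0; only vpaddd is on the loop-carried chain. */
int split_finish(int cnt) {
  int c_ready = 0;
  for (int i = 0; i < cnt; ++i) {
    int m_ready = LAT_VPMADDWD; /* independent of earlier iterations */
    c_ready = max_int(c_ready, m_ready) + LAT_VPADDD;
  }
  return c_ready;
}

/* Nonzero when splitting shortens the critical path under this toy model. */
int prefer_split(int cnt) { return split_finish(cnt) < fused_finish(cnt); }
```

With these assumed latencies a single iteration favours the fused vpdpwssd (5 cycles versus 6), while for cnt = 100 the split form finishes after 105 cycles against 500, because only the 1-cycle vpaddd remains on the loop-carried dependency.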
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D148980
Files:
llvm/include/llvm/CodeGen/MachineCombinerPattern.h
llvm/lib/CodeGen/MachineCombiner.cpp
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/lib/Target/X86/X86InstrInfo.h
llvm/test/CodeGen/X86/avxvnni-combine.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D148980.516025.patch
Type: text/x-patch
Size: 12615 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230422/b22c1318/attachment.bin>