[PATCH] D134477: [X86] Lower vector interleave into unpck and perm

Wed Oct 12 16:21:50 PDT 2022

zhuhan0 updated this revision to Diff 467293.
zhuhan0 added a comment.

Generalize to other 256-bit vector types; limit the change to AVX2 only.

I decided to put the changes in the same diff so that it's easier to review.
This change also showed wins on the other 256-bit vector types when measured
on an internal compression benchmark (basically a loop as shown in
https://godbolt.org/z/s17Kv1s9T): we saw 40.6% and 33.4% improvement on
v16i16 and v8i32, respectively. For v4i64, though, we saw only a 0.9%
improvement, since the current perm + blend codegen already seems very good,
so I don't think this change is worth it for the 64-bit types.
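On AVX2 the 256-bit unpack instructions work independently within each 128-bit lane. That is why "unpck" alone does not produce a full interleave and a "perm" is also needed. The scalar C model below is a sketch of one plausible unpck + perm sequence for v8i32. It assumes vpunpckldq/vpunpckhdq followed by a vperm2i128-style 128-bit lane permute with immediates 0x20 and 0x31. The type and helper names (v8i32, unpacklo, unpackhi, perm2x128, interleave_v8i32) are hypothetical, and the instructions the patch actually emits may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 256-bit vector: 8 x i32 in two 128-bit lanes of 4. */
typedef struct { int32_t e[8]; } v8i32;

/* Models vpunpckldq ymm: per 128-bit lane, interleave the low two elements. */
static v8i32 unpacklo(v8i32 a, v8i32 b) {
    v8i32 r;
    for (int lane = 0; lane < 2; ++lane) {
        int base = lane * 4;
        for (int i = 0; i < 2; ++i) {
            r.e[base + 2 * i] = a.e[base + i];
            r.e[base + 2 * i + 1] = b.e[base + i];
        }
    }
    return r;
}

/* Models vpunpckhdq ymm: per 128-bit lane, interleave the high two elements. */
static v8i32 unpackhi(v8i32 a, v8i32 b) {
    v8i32 r;
    for (int lane = 0; lane < 2; ++lane) {
        int base = lane * 4;
        for (int i = 0; i < 2; ++i) {
            r.e[base + 2 * i] = a.e[base + 2 + i];
            r.e[base + 2 * i + 1] = b.e[base + 2 + i];
        }
    }
    return r;
}

/* Models vperm2i128: imm bits [1:0] and [5:4] pick each destination lane from
   {0: a.lo, 1: a.hi, 2: b.lo, 3: b.hi}. Zeroing bits 3 and 7 are not modeled. */
static v8i32 perm2x128(v8i32 a, v8i32 b, unsigned imm) {
    assert((imm & 0x88u) == 0);
    v8i32 r;
    unsigned sel[2] = { imm & 3u, (imm >> 4) & 3u };
    for (int lane = 0; lane < 2; ++lane) {
        const v8i32 *src = (sel[lane] < 2) ? &a : &b;
        int src_base = (int)(sel[lane] & 1u) * 4;
        memcpy(&r.e[lane * 4], &src->e[src_base], 4 * sizeof(int32_t));
    }
    return r;
}

/* Interleave a and b into out[0..15] as a0 b0 a1 b1 ... a7 b7. */
static void interleave_v8i32(v8i32 a, v8i32 b, int32_t out[16]) {
    v8i32 lo = unpacklo(a, b);               /* a0 b0 a1 b1 | a4 b4 a5 b5 */
    v8i32 hi = unpackhi(a, b);               /* a2 b2 a3 b3 | a6 b6 a7 b7 */
    v8i32 first = perm2x128(lo, hi, 0x20);   /* lo.lane0 | hi.lane0 */
    v8i32 second = perm2x128(lo, hi, 0x31);  /* lo.lane1 | hi.lane1 */
    memcpy(out, first.e, sizeof first.e);
    memcpy(out + 8, second.e, sizeof second.e);
}
```

After the two unpacks, every (a_i, b_i) pair is already adjacent, but the pairs sit in the wrong 128-bit lane order. One cross-lane permute per output vector fixes that. The same shape carries over to v16i16 and v32i8 with the word and byte unpacks, which are also in-lane on AVX2.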

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134477/new/

https://reviews.llvm.org/D134477

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/slow-pmulld.ll
  llvm/test/CodeGen/X86/vector-interleave.ll
  llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-2.ll
  llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-2.ll
  llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-2.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D134477.467293.patch
Type: text/x-patch
Size: 43698 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221012/10b1c1d8/attachment.bin>