[PATCH] D111938: [TTI][X86] Add SSE2 sub-128bit vXi16/32 and v2i64 stride 2 interleaved load costs

Sat Oct 16 08:11:45 PDT 2021

RKSimon added inline comments.

================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:5224
-      {2, MVT::v2i16, 2},   // (load 4i16 and) deinterleave into 2 x 2i16
-      {2, MVT::v4i16, 2},   // (load 8i16 and) deinterleave into 2 x 4i16
       {2, MVT::v8i16, 6},   // (load 16i16 and) deinterleave into 2 x 8i16
----------------
lebedev.ri wrote:
> Looking at `llvm-project/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll`,
> VF4 codegen is really different between SSE2 and AVX2.
nice catch!

================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:5230
-      {2, MVT::v2i32, 2},   // (load 4i32 and) deinterleave into 2 x 2i32
       {2, MVT::v4i32, 2},   // (load 8i32 and) deinterleave into 2 x 4i32
       {2, MVT::v8i32, 4},   // (load 16i32 and) deinterleave into 2 x 8i32
----------------
lebedev.ri wrote:
> Looking at `llvm-project/llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll`,
> `@load_i32_stride2_vf4` also seems to match.
every little helps :)
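
For readers following the thread, here is a rough standalone sketch of what a row such as `{2, MVT::v4i32, 2}` describes: a stride (interleave factor), a per-result vector type, and a cost for loading the wide vector and splitting it into that many results. It is not the code in X86TargetTransformInfo.cpp. `InterleavedCostEntry`, `QuotedRows`, `lookupCost` and `deinterleave2` are hypothetical names, types are plain strings rather than `MVT` values, and only the three context rows quoted above (the ones without a leading `-`) are copied in.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for one table row: {stride, result type, cost}.
struct InterleavedCostEntry {
  unsigned Stride;
  std::string_view Type;
  unsigned Cost;
};

// Context rows copied from the hunks quoted in this thread (assumption:
// shown here as one flat list purely for illustration).
constexpr std::array<InterleavedCostEntry, 3> QuotedRows = {{
    {2, "v8i16", 6}, // (load 16i16 and) deinterleave into 2 x 8i16
    {2, "v4i32", 2}, // (load 8i32 and) deinterleave into 2 x 4i32
    {2, "v8i32", 4}, // (load 16i32 and) deinterleave into 2 x 8i32
}};

// Linear lookup by (stride, type); returns nullopt when no row matches,
// in which case a caller would have to fall back to some generic estimate.
std::optional<unsigned> lookupCost(unsigned Stride, std::string_view Type) {
  for (const InterleavedCostEntry &E : QuotedRows)
    if (E.Stride == Stride && E.Type == Type)
      return E.Cost;
  return std::nullopt;
}

// Scalar model of a stride-2 deinterleave: even-indexed elements go to the
// first result, odd-indexed elements to the second.
std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &Wide) {
  std::pair<std::vector<int>, std::vector<int>> Out;
  for (std::size_t I = 0; I + 1 < Wide.size(); I += 2) {
    Out.first.push_back(Wide[I]);
    Out.second.push_back(Wide[I + 1]);
  }
  return Out;
}
```

In this sketch `lookupCost(2, "v2i32")` finds nothing, because that row is one of the removed (`-`) lines and was deliberately not copied in.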

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111938/new/

https://reviews.llvm.org/D111938