[PATCH] D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic

Sat Oct 31 07:02:55 PDT 2020

RKSimon added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll:17
+; CHECK-NEXT:    store <4 x i32> [[TMP11]], <4 x i32>* [[TMP12]], align 4
 ; CHECK-NEXT:    ret void
 ;
----------------
anton-afanasyev wrote:
> RKSimon wrote:
> > anton-afanasyev wrote:
> > > anton-afanasyev wrote:
> > > > RKSimon wrote:
> > > > > This doesn't look great in the final codegen: https://gcc.godbolt.org/z/vE9Yoe
> > > > > 
> > > > > Which suggests either the costs aren't correct or we're not correctly including the cost of something - the buildvector of the pointers? Are we missing getelementptr vectorization?
> > > > Oops, thanks, it looks like I've missed the buildvector cost.
> > > Hmm, investigated this: no, I was wrong, the calculated cost is correct (the `insertelement`s for the gather instruction are compensated by the `insertelement`s for the buildvector). Still investigating this... It looks more like a codegen issue.
> > Hmm - does the X86 TTI handle the cost of buildvectors of pointers, or does it fall back to the generic implementation (which is almost certainly lower)?
> Although the codegen output for Skylake looks complicated, I believe it's still better optimized, since the `gather` is cheaper than four `load`s from memory, isn't it?
My concern is that -march=avx512 does nothing, so you're just getting raw SSE2 costs here, which is why I updated the tests at rG1eeae4310771d8a6896fe09effe88883998f34e8
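
For reference, a minimal sketch of the kind of scalar pattern under discussion: four non-contiguous loads feeding four contiguous stores, which SLP could in principle turn into one `llvm.masked.gather` plus one vector store. The function name `gather_load_sketch`, the indices and the added constants are assumptions for illustration only, not the actual contents of pr47629.ll.

```cpp
#include <cstdint>

// Hypothetical kernel (illustration only, not taken from pr47629.ll).
// The four loads hit non-adjacent elements of src, so they cannot be merged
// into one wide load; the four stores are adjacent and can become a single
// <4 x i32> store. A vectorizer that chooses a gather must also pay for
// building the vector of pointers/indices that feeds it.
void gather_load_sketch(int32_t *__restrict dst, const int32_t *__restrict src) {
    dst[0] = src[0] + 1;
    dst[1] = src[11] + 2;
    dst[2] = src[4] + 3;
    dst[3] = src[15] + 4;
}
```

Compiling something like this with clang at `-O2`, once with default flags and once with a real AVX-512 target such as `-march=skylake-avx512`, is one way to compare the scalar and gather-based codegen side by side.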

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90445/new/

https://reviews.llvm.org/D90445