[PATCH] D48936: [X86][SSE] Prefer BLEND(SHL(v, c1), SHL(v, c2)) over MUL(v, c3)

Sun Jul 8 10:20:40 PDT 2018

RKSimon added inline comments.

================
Comment at: test/CodeGen/X86/lower-vec-shift.ll:211-232
 define <8 x i16> @test9(<8 x i16> %a) {
 ; SSE-LABEL: test9:
 ; SSE:       # %bb.0:
-; SSE-NEXT:    movdqa %xmm0, %xmm1
-; SSE-NEXT:    psraw $3, %xmm1
 ; SSE-NEXT:    movdqa {{.*#+}} xmm2 = [65535,0,65535,65535,65535,0,0,0]
-; SSE-NEXT:    psraw $1, %xmm0
-; SSE-NEXT:    pand %xmm2, %xmm0
-; SSE-NEXT:    pandn %xmm1, %xmm2
-; SSE-NEXT:    por %xmm2, %xmm0
+; SSE-NEXT:    movdqa %xmm0, %xmm1
+; SSE-NEXT:    pand %xmm2, %xmm1
+; SSE-NEXT:    psraw $2, %xmm0
----------------
lebedev.ri wrote:
> The subject only talks about `mul`, but this is `div`.
> Is this intended to be changed by this patch?
> If so, there is no `lshr` test as far as I can tell.
> 
This is a side effect of only accepting the v8i16 two-shifts+blend pattern on pre-SSE41 targets (no PBLENDW) if the shuffle can be widened to v4i32. Without PBLENDW we have to perform the blend as a bitmask with OR(ANDN,AND), but for other shifts we'd end up doing that anyway. I suppose I could limit this to SHL cases only?
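
For readers following along, here is a rough scalar model of the equivalence being discussed. It is only a sketch, not the actual lowering code: the helper names are hypothetical, lanes are modelled as unsigned 16-bit integers, and the lane pattern in the usage note is borrowed from the mask in the test above purely for illustration (with `shl` rather than the `ashr` that test9 uses).

```python
MASK16 = 0xFFFF  # one v8i16 lane, modelled as an unsigned 16-bit value


def shl_lanes(v, amounts):
    """Reference result: shift each lane left by its own amount."""
    return [(x << c) & MASK16 for x, c in zip(v, amounts)]


def mul_lanes(v, amounts):
    """Same result via a multiply by per-lane powers of two (the MUL(v, c3) form)."""
    return [(x * (1 << c)) & MASK16 for x, c in zip(v, amounts)]


def uniform_shl(v, c):
    """Shift every lane left by the same immediate."""
    return [(x << c) & MASK16 for x in v]


def bitmask_blend(a, b, mask):
    """OR(AND(mask, a), ANDN(mask, b)): take a where the mask lane is all-ones, else b."""
    return [(m & x) | (~m & MASK16 & y) for x, y, m in zip(a, b, mask)]


def shl_via_two_shifts(v, amounts):
    """Hypothetical helper: BLEND(SHL(v, c1), SHL(v, c2)) for at most two distinct amounts."""
    distinct = sorted(set(amounts))
    if len(distinct) > 2:
        raise ValueError("more than two distinct shift amounts")
    c1, c2 = distinct[0], distinct[-1]
    # All-ones lanes select the c1 result, zero lanes select the c2 result.
    mask = [MASK16 if c == c1 else 0 for c in amounts]
    return bitmask_blend(uniform_shl(v, c1), uniform_shl(v, c2), mask)
```

With `amounts = [1, 3, 1, 1, 1, 3, 3, 3]`, the generated mask is `[65535, 0, 65535, 65535, 65535, 0, 0, 0]`, and all three forms agree lane by lane. The model says nothing about cost. It only shows why the blend form needs the extra AND/ANDN/OR when no PBLENDW is available.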

Repository:
  rL LLVM

https://reviews.llvm.org/D48936