[PATCH] D96206: [X86] Fold bitcast(logic(bitcast(X), Y)) --> logic'(X, bitcast(Y)) for int-int bitcasts

Sun Feb 7 02:21:36 PST 2021

RKSimon added inline comments.

================
Comment at: llvm/test/CodeGen/X86/sse2.ll:689
 ; X64-SSE:       # %bb.0:
-; X64-SSE-NEXT:    movq %xmm0, %rax
-; X64-SSE-NEXT:    movabsq $-4294967296, %rcx # imm = 0xFFFFFFFF00000000
-; X64-SSE-NEXT:    andq %rax, %rcx
-; X64-SSE-NEXT:    movq %rcx, %xmm1
-; X64-SSE-NEXT:    movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
+; X64-SSE-NEXT:    andps {{.*}}(%rip), %xmm0
 ; X64-SSE-NEXT:    retq
----------------
pengfei wrote:
> Wouldn't `pand` be better for an integer vector?
It would use `pand` if there were any integer-domain instructions around (e.g. if it fed a shift), but otherwise the default is to use the ps domain where possible.
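
To see why a single `andps` can stand in for the removed scalar sequence, here is a minimal Python sketch that models xmm0 as a 128-bit integer. The 128-bit mask is an assumption inferred from the removed instructions (the test line only shows `{{.*}}(%rip)`), and the function names are hypothetical.

```python
LOW64 = (1 << 64) - 1
MASK64 = 0xFFFFFFFF00000000  # immediate from the removed movabsq


def scalar_sequence(xmm0: int) -> int:
    """Model the removed movq/movabsq/andq/movq/movsd sequence."""
    rax = xmm0 & LOW64              # movq %xmm0, %rax (low 64 bits)
    rcx = rax & MASK64              # movabsq $mask, %rcx ; andq %rax, %rcx
    xmm1 = rcx                      # movq %rcx, %xmm1 (upper half zeroed)
    high = (xmm0 >> 64) << 64       # xmm0[1], the untouched upper 64 bits
    return high | (xmm1 & LOW64)    # movsd: xmm0 = xmm1[0], xmm0[1]


# Assumed constant-pool operand: keep the upper 64 bits as-is and clear
# the lowest 32 bits, i.e. <0, -1, -1, -1> when viewed as 4 x i32.
VECTOR_MASK = (LOW64 << 64) | MASK64


def vector_and(xmm0: int) -> int:
    """Model a single 128-bit bitwise AND (andps or pand, same bits)."""
    return xmm0 & VECTOR_MASK


samples = [
    0,
    (1 << 128) - 1,
    0x0123456789ABCDEF_FEDCBA9876543210,
    0xDEADBEEF_00000000_CAFEBABE_12345678,
]
all_match = all(scalar_sequence(x) == vector_and(x) for x in samples)
```

Because the operation is purely bitwise, `andps` and `pand` would produce identical bits here; the choice between them only affects the execution domain.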

================
Comment at: llvm/test/CodeGen/X86/vec_saddo.ll:816
 ; SSE-NEXT:    pcmpgtd %xmm2, %xmm4
-; SSE-NEXT:    pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
 ; SSE-NEXT:    pcmpeqd %xmm3, %xmm2
----------------
pengfei wrote:
> I haven't figured the math here out yet, but the removal of this doesn't look correct to me.
> The previous math is `xmm4[0] & xmm2[1] | xmm4[1]`, but we are changing it to `xmm4[1] & xmm2[1] | xmm4[1]`.
I think this is SimplifyDemandedElts (called from one of the shuffles below) now being able to remove the shuffle because it can see everything as <4 x i32>, but I haven't confirmed that yet.
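
A minimal Python sketch of the demanded-elements idea, assuming (hypothetically, not confirmed against the test) that a later shuffle reads only lanes 0 and 2 of the result. On those lanes the `[0,0,2,2]` mask is an identity, so dropping the `pshufd` would not change any demanded value. The helper names are made up for illustration and are not LLVM APIs.

```python
def shuffle(vec, mask):
    """Apply a pshufd-style lane permutation to a 4-element vector."""
    return [vec[i] for i in mask]


def identity_on_demanded(mask, demanded):
    """True if the shuffle leaves every demanded lane unchanged."""
    return all(mask[lane] == lane for lane in demanded)


PSHUFD_MASK = [0, 0, 2, 2]      # mask of the removed pshufd
xmm4 = [10, 11, 12, 13]         # arbitrary distinct lane values
shuffled = shuffle(xmm4, PSHUFD_MASK)

# Assumption: only the even lanes are used downstream.
removable_even = identity_on_demanded(PSHUFD_MASK, [0, 2])

# If every lane were demanded, the shuffle could not be dropped,
# because lanes 1 and 3 change (xmm4[0] vs xmm4[1], as noted above).
removable_all = identity_on_demanded(PSHUFD_MASK, [0, 1, 2, 3])
```

Under that assumption the lane-1 difference in the quoted math is real but harmless, since lane 1 is never read.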

================
Comment at: llvm/test/CodeGen/X86/vector-reduce-and.ll:560
 ; AVX1-NEXT:    vpermilps {{.*#+}} xmm1 = xmm0[1,1,1,1]
+; AVX1-NEXT:    vandps %xmm0, %xmm1, %xmm2
+; AVX1-NEXT:    vpsrld $16, %xmm2, %xmm2
----------------
pengfei wrote:
> Is this still a regression?
Yes, I'm going to have a look at this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96206/new/

https://reviews.llvm.org/D96206