[PATCH] D105390: [X86] Lower insertions into non-0'th 128-bit subvector as broadcast+blend (PR50971)

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jul 3 06:16:55 PDT 2021


RKSimon added a comment.

I think the premise is sound, but creating variable shuffle/blend masks isn't great - it's also uncovering a number of other poor codegen issues that need addressing.



================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:18966
+    // TODO: Is it worthwhile to cast integer to floating point and back
+    // and incur a domain crossing penalty?
+    if (IdxVal * EltSizeInBits >= 128 &&
----------------
Yes - it's very tricky to see the effect of a domain-crossing penalty on targets capable of broadcasts, so casts are fine
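
For reference, the broadcast+blend pattern under discussion can be sketched in LLVM IR (a hypothetical illustration, not code from the patch): an insertion into lane 14 of a <16 x float> becomes a scalar splat (lowerable to vbroadcastss on broadcast-capable targets) followed by a blend shuffle, matching the [0..13,30,15] mask seen in the avx512-insert-extract.ll test below.

```llvm
; Hypothetical sketch of lowering an insertelement into an upper
; 128-bit subvector (lane 14 of 16) as broadcast + blend.
define <16 x float> @insert_lane14(<16 x float> %v, float %s) {
  ; Splat the scalar into all lanes (vbroadcastss).
  %ins = insertelement <16 x float> poison, float %s, i32 0
  %splat = shufflevector <16 x float> %ins, <16 x float> poison,
                         <16 x i32> zeroinitializer
  ; Blend: lane 14 comes from the splat (operand index 30),
  ; every other lane comes from %v.
  %r = shufflevector <16 x float> %v, <16 x float> %splat,
       <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
                   i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 30, i32 15>
  ret <16 x float> %r
}
```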


================
Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:13
+; CHECK-NEXT:    vmovaps {{.*#+}} zmm0 = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,30,15]
+; CHECK-NEXT:    vpermi2ps %zmm1, %zmm2, %zmm0
 ; CHECK-NEXT:    retq
----------------
Is it really worth loading a variable shuffle mask?


================
Comment at: llvm/test/CodeGen/X86/insertelement-shuffle.ll:44
+; X64_AVX256-NEXT:    vmovq %xmm2, %rax
+; X64_AVX256-NEXT:    vmovq %rax, %xmm2
+; X64_AVX256-NEXT:    vpbroadcastq %xmm2, %ymm2
----------------
Any idea what's going on here?


================
Comment at: llvm/test/CodeGen/X86/masked_load.ll:5647
+; AVX2-NEXT:    vmovdqa {{.*#+}} ymm2 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
+; AVX2-NEXT:    vpblendvb %ymm2, %ymm1, %ymm0, %ymm1
 ; AVX2-NEXT:    testl $131072, %eax ## imm = 0x20000
----------------
This definitely looks like a regression


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105390/new/

https://reviews.llvm.org/D105390


