[PATCH] D63364: [x86] split 256-bit vector selects if operands are vector concats

Sat Jun 15 05:54:06 PDT 2019

RKSimon accepted this revision.
RKSimon added a comment.
This revision is now accepted and ready to land.

LGTM but there are a couple of cases that are bordering on regression that need investigating (llvm-mca comparisons, TODO comments, bug report, whatever).

@lebedev.ri The TTI costs try to include the extra costs of 256-bit integer vector ops for AVX1 but it's often tricky to completely account for them - because the costs work on an individual instruction level, many of the 'holistic' effects aren't considered at all. This is something that has made it difficult to make D46276 <https://reviews.llvm.org/D46276> actually useful - slightly better costs for individual instructions didn't help improve costs/codegen decisions for the entire sequence.

================
Comment at: llvm/test/CodeGen/X86/cast-vsel.ll:494
+; AVX1-NEXT:    vmovaps %xmm4, dj+4112(%rax)
+; AVX1-NEXT:    vmovaps %xmm5, dj+4096(%rax)
 ; AVX1-NEXT:    addq $32, %rax
----------------
This is annoying - even though many AVX1 targets have 128-bit ALUs, we were avoiding xmm insertion/extraction completely, which was the better option.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63364/new/

https://reviews.llvm.org/D63364