[PATCH] D63364: [x86] split 256-bit vector selects if operands are vector concats

Sat Jun 15 07:14:50 PDT 2019

spatel marked an inline comment as done.
spatel added a comment.

In D63364#1544585 <https://reviews.llvm.org/D63364#1544585>, @lebedev.ri wrote:

> Looks ok.
>  Is there some cost model here, or do we always want to do this (well, whenever we see a concatenation; we don't
>  seem to introduce it intentionally), in the hope that two smaller ops are always at least as good as one wider op?

I'm expecting the existing concatenation in the match to arise from AVX1 legalization. So in the worst case, we're removing those 2 concats but adding a concat of the condition operand and the blend results.
As Simon mentioned, this is bordering on a heuristic decision. The part that we really have no way to model is the frequency throttling that can occur with wider vector ops; using 128-bit (xmm) ops instead avoids that throttling.
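
To make the equivalence behind the split concrete, here is a minimal sketch in Python that models only the lane-wise semantics, not the actual DAG combine. All of the names (`concat`, `split`, `select`, `split_select`) are hypothetical helpers invented for this illustration, and vectors are modeled as plain lists. It assumes the true/false operands are already available as low and high halves, as they would be after AVX1 legalization.

```python
def concat(lo, hi):
    """Model a vector concat: low half followed by high half."""
    return list(lo) + list(hi)


def split(v):
    """Model extracting the low and high halves of a vector."""
    half = len(v) // 2
    return list(v[:half]), list(v[half:])


def select(cond, t, f):
    """Lane-wise select: pick t[i] where cond[i] is true, else f[i]."""
    assert len(cond) == len(t) == len(f)
    return [tv if c else fv for c, tv, fv in zip(cond, t, f)]


def split_select(cond, t_lo, t_hi, f_lo, f_hi):
    """Narrow form: two half-width selects, then one concat of the results.

    The condition is the only operand that has to be split here; the
    true/false operands are assumed to already exist as halves.
    """
    c_lo, c_hi = split(cond)
    return concat(select(c_lo, t_lo, f_lo), select(c_hi, t_hi, f_hi))


# Hypothetical 8-lane example standing in for a 256-bit vector of i32.
cond = [True, False, False, True, True, True, False, False]
t_lo, t_hi = [1, 2, 3, 4], [5, 6, 7, 8]
f_lo, f_hi = [-1, -2, -3, -4], [-5, -6, -7, -8]

wide = select(cond, concat(t_lo, t_hi), concat(f_lo, f_hi))
narrow = split_select(cond, t_lo, t_hi, f_lo, f_hi)
```

Both forms produce the same lanes. The wide form needs two concats of the operands, while the narrow form needs a split of the condition and one concat of the results, which is the worst-case trade described above.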

================
Comment at: llvm/test/CodeGen/X86/cast-vsel.ll:494
+; AVX1-NEXT:    vmovaps %xmm4, dj+4112(%rax)
+; AVX1-NEXT:    vmovaps %xmm5, dj+4096(%rax)
 ; AVX1-NEXT:    addq $32, %rax
----------------
RKSimon wrote:
> This is annoying - even though many AVX1 targets have 128-bit ALUs, we were avoiding xmm insertion/extraction completely, which was the better option.
Agreed - we could limit the transform based on the type of the select condition and/or whether the condition is extracted in addition to the true/false operands. I'll put a TODO here.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63364/new/

https://reviews.llvm.org/D63364