[PATCH] D14050: [X86][SSE] Shuffle blends with zero

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 28 08:33:49 PDT 2015


RKSimon added a comment.

Both isBuildVectorAllZeros and computeZeroableShuffleElements() treat undef lanes as zeroable, so we have a problem when the shuffle mask wants an actual zero input but the lane that we'd need to blend from is actually UNDEF:

  shufflevector <4 x float> %v, <4 x float> <float 0.000000e+00, float undef, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 2, i32 4>

To use BLENDPS we'd need to convert this to:

  shufflevector <4 x float> %v, <4 x float> <float undef, float 0.000000e+00, float undef, float 0.000000e+00>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>

But it's easier if we just set the whole input vector to zero (since we know it's zeroable anyhow).

I'll add an extra test for this example.
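
As a rough illustration of the rewrite described above (not the code in the patch), here is a minimal standalone sketch. The names Lane, Mask and rewriteAsBlendWithZero are hypothetical and do not exist in LLVM; the sketch assumes a 4-lane shuffle where mask indices 0-3 select from the first input, 4-7 from the second, and -1 means undef. If every lane of the second input is zero or undef, the second input is treated as all zeros and each mask index that referenced it is redirected to the same lane position, which is the form a blend needs.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of one lane of the second shuffle input.
enum class Lane { Zero, Undef, Other };

constexpr int N = 4;               // lanes per vector (assumed: <4 x float>)
using Mask = std::array<int, N>;   // -1 = undef, 0..N-1 = V1, N..2N-1 = V2

// Returns true and fills `out` if the shuffle can be expressed as a blend of
// V1 with an all-zeros vector. Assumption: V2 may be replaced by zeros only
// when every one of its lanes is zero or undef.
bool rewriteAsBlendWithZero(const Mask &mask, const std::array<Lane, N> &v2,
                            Mask &out) {
  for (Lane l : v2)
    if (l == Lane::Other)
      return false;                // V2 is not zeroable as a whole

  Mask result{};
  for (int i = 0; i < N; ++i) {
    int m = mask[i];
    assert(m < 2 * N && "mask index out of range");
    if (m < 0)
      result[i] = -1;              // undef stays undef
    else if (m >= N)
      result[i] = i + N;           // any V2 lane is zero now: use the in-place lane
    else if (m == i)
      result[i] = i;               // V1 element already in its own lane
    else
      return false;                // V1 element would have to move: not a blend
  }
  out = result;
  return true;
}
```

With the mask <0, 4, 2, 4> and a second input of <0.0, undef, undef, undef>, this sketch produces <0, 5, 2, 7>, matching the blend mask shown above.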

Now it could be that we have cases where a BUILD_VECTOR input with zero/nonzero constants could be matched up (possibly by creating a new BUILD_VECTOR with reordered constants suitable for blending). But that would be a much more involved change and I haven't seen any real-world code that would benefit from it yet, so I just focused on the zeroing, which I do have examples of.


Repository:
  rL LLVM

http://reviews.llvm.org/D14050




