[PATCH] D14050: [X86][SSE] Shuffle blends with zero

Demikhovsky, Elena via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 28 10:54:29 PDT 2015


Hi Simon,

Your code is fully correct. I just think that you are missing some opportunities.

I'll take your example and change one element:

shufflevector <4 x float> %v, <4 x float><float 0.000000e+00, float undef, float undef, float 0.2>, <4 x i32> <i32 0, i32 4, i32 2, i32 7>

It is equivalent to the blend:

shufflevector <4 x float> %v, <4 x float><float 0.000000e+00, float 0.0, float undef, float 0.2>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
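
To make the equivalence concrete, here is a minimal Python sketch that models shufflevector on plain lists. The names UNDEF, shufflevector and is_blend are illustrative stand-ins and not LLVM APIs; the only assumption is the usual IR rule that mask index i selects from the first operand when i < 4 and from the second operand otherwise.

```python
UNDEF = object()  # illustrative stand-in for an IR undef lane


def shufflevector(a, b, mask):
    """Model of the IR semantics: index i picks a[i] if i < len(a), else b[i - len(a)]."""
    both = list(a) + list(b)
    return [both[i] for i in mask]


def is_blend(mask):
    """A mask is a blend when every lane stays in place in one of the two inputs."""
    n = len(mask)
    return all(m % n == i for i, m in enumerate(mask))


v = ["v0", "v1", "v2", "v3"]  # symbolic lanes of %v

# The shuffle as written: lane 1 reads element 4, i.e. lane 0 of the constant vector.
original = shufflevector(v, [0.0, UNDEF, UNDEF, 0.2], [0, 4, 2, 7])

# The blend form: the zero is copied into lane 1 of the constant vector.
as_blend = shufflevector(v, [0.0, 0.0, UNDEF, 0.2], [0, 5, 2, 7])
```

Both calls select the same four lanes, but only the second mask keeps every lane in place.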

-  Elena


-----Original Message-----
From: Simon Pilgrim [mailto:llvm-dev at redking.me.uk] 
Sent: Wednesday, October 28, 2015 17:34
To: llvm-dev at redking.me.uk; spatel at rotateright.com; Andrea_DiBiagio at sn.scee.net; qcolombet at apple.com; Demikhovsky, Elena
Cc: llvm-commits at lists.llvm.org
Subject: Re: [PATCH] D14050: [X86][SSE] Shuffle blends with zero

RKSimon added a comment.

Both isBuildVectorAllZeros and computeZeroableShuffleElements() treat undef lanes as zeroable, so we have a problem when the shuffle mask wants an actual zero input but the lane that we'd need to blend from is actually UNDEF:

  shufflevector <4 x float> %v, <4 x float><float 0.000000e+00, float undef, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 2, i32 4>

To use BLENDPS, we'd need to convert this to:

  shufflevector <4 x float> %v, <4 x float> <float undef, float 0.000000e+00, float undef, float 0.000000e+00>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>

But it's easier if we just set the whole input vector to zero (since we know it's zeroable anyhow).

I'll add an extra test for this example.
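
A rough Python model of the idea, for illustration only: it is not the code in the patch, and zeroable_lanes and blend_with_zero are hypothetical names. It assumes the second input is a constant vector given as a list, treats undef and 0.0 lanes as zeroable, and replaces the whole second input with zeros when every output lane is either in place from the first input or zeroable.

```python
UNDEF = object()  # illustrative stand-in for an IR undef lane


def zeroable_lanes(mask, v2):
    """Per output lane: True if it reads a zero or undef lane of the second input."""
    n = len(mask)
    out = []
    for m in mask:
        if m < n:
            out.append(False)  # read from the first input, value unknown
        else:
            lane = v2[m - n]
            out.append(lane is UNDEF or lane == 0.0)
    return out


def blend_with_zero(mask, v2):
    """Return (all-zero second input, blend mask), or None if this model gives up."""
    n = len(mask)
    zeroable = zeroable_lanes(mask, v2)
    new_mask = []
    for i, m in enumerate(mask):
        if m == i:
            new_mask.append(i)      # lane already in place in the first input
        elif zeroable[i]:
            new_mask.append(i + n)  # take the zero from the same lane of the zero vector
        else:
            return None             # lane needs something other than zero
    return [0.0] * n, new_mask


zero_v2, blend_mask = blend_with_zero([0, 4, 2, 4], [0.0, UNDEF, UNDEF, UNDEF])
```

For the example above this yields the mask <0, 5, 2, 7> against an all-zero vector, so no lane of the blend ever reads from an undef position.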

Now it could be that there are cases where a BUILD_VECTOR input has zero/nonzero constants that could be matched up (possibly by creating a new BUILD_VECTOR with reordered constants suitable for blending). That would be a much more involved change, though, and I haven't seen any real-world code that would benefit from it yet, so I just focussed on the zeroing, which I do have examples of.
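
For reference, the reordering idea can be sketched in a few lines of Python. This is a hypothetical model, not existing LLVM code: reorder_constants_for_blend is an invented name, and it assumes the second input is a fully constant vector whose lanes can be moved freely.

```python
UNDEF = object()  # illustrative stand-in for an IR undef lane


def reorder_constants_for_blend(mask, v2):
    """Build a new constant vector so that the shuffle becomes an in-place blend.

    Returns (new_v2, new_mask), or None if a lane of the first input would
    have to move, in which case no blend can express the shuffle.
    """
    n = len(mask)
    new_v2 = [UNDEF] * n
    new_mask = []
    for i, m in enumerate(mask):
        if m == i:
            new_mask.append(i)     # first-input lane already in place
        elif m >= n:
            new_v2[i] = v2[m - n]  # move the constant into lane i
            new_mask.append(i + n)
        else:
            return None            # first-input lane out of place
    return new_v2, new_mask


new_v2, new_mask = reorder_constants_for_blend(
    [0, 4, 2, 7], [0.0, UNDEF, UNDEF, 0.2]
)
```

Lanes of the new constant vector that the blend never reads stay undef, which is why the result can differ from a hand-written blend in those positions while still being equivalent.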


Repository:
  rL LLVM

http://reviews.llvm.org/D14050




