<div dir="ltr">Option (2) directly matches the capabilities of the shufflevector instruction in the LLVM IR. I have attached a patch that will allow -1 to become undef in the IR.<div><br></div><div>So</div><div><br></div><div>

__builtin_shufflevector(x, y, 0, 4, -1, 5);<br></div><div><br></div><div>becomes</div><div><br></div><div>shufflevector &lt;4 x float&gt; %x, &lt;4 x float&gt; %y, &lt;4 x i32&gt; &lt;i32 0, i32 4, i32 undef, i32 5&gt;</div>
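<div><br></div><div>To make the index mapping concrete, here is a minimal scalar sketch of the semantics described above. It is only a model, not the attached patch or Clang's implementation: the names shuffle_model, SHUFFLE_UNDEF and the defined[] flag array are hypothetical and exist only for illustration. Indices in [0, n) pick from the first input, indices in [n, 2n) pick from the second, and -1 marks the lane as undef.</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of a two-input shuffle whose mask may
   contain -1 ("undef"). Illustration only, not Clang's code. */
enum { SHUFFLE_UNDEF = -1 };

/* a and b each hold n elements; mask holds m entries.
   Entry k in [0, n)   selects a[k].
   Entry k in [n, 2n)  selects b[k - n].
   Entry -1            leaves out[i] untouched and clears defined[i].
   Returns false if any mask entry is outside these ranges. */
static bool shuffle_model(const float *a, const float *b, size_t n,
                          const int *mask, size_t m,
                          float *out, bool *defined)
{
    for (size_t i = 0; i < m; ++i) {
        int k = mask[i];
        if (k == SHUFFLE_UNDEF) {
            defined[i] = false;          /* lane value is unspecified */
        } else if (k >= 0 && (size_t)k < n) {
            out[i] = a[k];               /* lane comes from first input */
            defined[i] = true;
        } else if (k >= 0 && (size_t)k < 2 * n) {
            out[i] = b[(size_t)k - n];   /* lane comes from second input */
            defined[i] = true;
        } else {
            return false;                /* invalid index */
        }
    }
    return true;
}
```

<div>With two four-element inputs x and y and the mask {0, 4, -1, 5}, this model fills lanes 0, 1 and 3 from x[0], y[0] and y[1] and flags lane 2 as undefined, which mirrors the undef element in the IR mask. Only two inputs are ever read, so no third "undef vector" operand is needed.</div>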

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 2, 2013 at 6:15 PM, Katya Romanova <span dir="ltr"><<a href="mailto:katya_romanova@playstation.sony.com" target="_blank">katya_romanova@playstation.sony.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

Craig Topper &lt;craig.topper@...&gt; writes:<br>

<br>

><br>

><br>

> OK, so -1 isn't valid for indices, and I have even more questions about<br>

__builtin_shufflevector the more I look at it. See my message in cfe-dev.<br>

><br>

><br>

> On Thu, Jul 18, 2013 at 6:12 PM, Chandler Carruth<br>

<<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:<br>

><br>

> On Thu, Jul 18, 2013 at 6:11 PM, Craig Topper<br>

<<a href="mailto:craig.topper@gmail.com">craig.topper@gmail.com</a>> wrote:<br>

><br>


> Would __builtin_shufflevector(__a, __a, 0, 1, -1, -1)  work?<br>

><br>


> Personally, I would prefer a defined way to produce an undef input in<br>

general... but if folks are worried about exposing such an interface, then<br>

sure, we could just allow the shuffle builtin itself to designate an "undef"<br>

input with goofy indices.<br>

><br>


> On Thu, Jul 18, 2013 at 5:42 PM, Chandler Carruth<br>

<<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:<br>

><br>


> On Thu, Jul 18, 2013 at 5:32 PM, Katya Romanova<br>

<<a href="mailto:Katya_Romanova@playstation.sony.com">Katya_Romanova@playstation.sony.com</a>> wrote:<br>

> -  __m128d __zero = _mm_setzero_pd();<br>

> -  return __builtin_shufflevector(__a, __zero, 0, 1, 2, 2);<br>

> +  return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);<br>

><br>

><br>

> I think this is the wrong approach.<br>

><br>

> Rather than switching these to use an x86-specific builtin, instead it<br>

would be better to provide some generic form to produce an undef input to a<br>

shufflevector. That is a generally useful and completely target-independent<br>

concept.<br>

><br>


> _______________________________________________<br>

> cfe-commits mailing list<br>

> cfe-commits@cs.uiuc.edu<br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br>

><br>

><br>

> -- ~Craig<br>


><br>

<br>

<br>

<br>

I agree with Chandler that it's better to use a shuffle with an undef input<br>

(which is target-independent), even though we generate code for AVX<br>

intrinsics. The reason I initially ended up using an x86-specific builtin is<br>

that I couldn't find a generic way to create an "undef" input for a<br>

shuffle.<br>

<br>

I tried the following, but I didn't like it, because the compiler gives a<br>

warning when compiling avxintrin.h:<br>

<br>

static __inline __m256d __attribute__((__always_inline__, __nodebug__))<br>

_mm256_castpd128_pd256(__m128d in)<br>

{<br>

  __m128d undef;<br>

  return __builtin_shufflevector(in, undef, 0, 1, 2, 2);<br>

}<br>

<br>

I tried this as well and I didn't like it either:<br>

<br>

static __inline __m256d __attribute__((__always_inline__, __nodebug__))<br>

_mm256_castpd128_pd256(__m128d in)<br>

{<br>

  __v2df __in = (__v2df) in;<br>

  __v4df ret;<br>

  ret[0] = __in[0];<br>

  ret[1] = __in[1];<br>

  return (__m256d)ret;<br>

}<br>

<br>

So, I ended up introducing an x86_64 builtin and lowering it later to a<br>

shuffle with undef (not a target-independent solution):<br>

<br>

static __inline __m256d __attribute__((__always_inline__, __nodebug__))<br>

 _mm256_castpd128_pd256(__m128d __a)  {<br>

  return (__m256d)__builtin_ia32_pd256_pd((__v2df)__a);<br>

}<br>

<br>

<br>

I've read Craig's proposal about using the shuffle builtin with negative indices<br>

(-1) to indicate a shuffle with undef. This solution looks good. However, a "-1"<br>

shuffle index is presently considered invalid. We need to discuss extending the<br>

shuffle syntax/semantics and then implement this extension before I can<br>

use a shuffle with negative indices for the AVX typecast builtins. It looks like<br>

it will take some time...<br>

<br>

I was wondering if it's possible to check in my current fix, which uses<br>

x86_64 builtins (instead of a shuffle) for the AVX typecast intrinsics, for now.<br>

When shuffle learns to understand negative indices, I could easily replace<br>

my changes with something like this:<br>

<br>

__builtin_shufflevector(__a, __a, 0, 1, -1, -1)<br>

<br>

If this interim solution doesn’t sound inappropriate, we should start a<br>

discussion about extending the shuffle builtin to understand<br>

negative indices.<br>

<br>

Here are several ideas:<br>

<br>

We could use the "unary" form of __builtin_shufflevector when negative indices<br>

are used.<br>

The "binary" form could be used with negative indices as well, but semantic<br>

analysis should ensure that the first and the second parameters are actually<br>

the same vector. Here is the reason for this limitation:<br>

<br>

If negative indices specify "undef" and the binary form of<br>

__builtin_shufflevector is used with different first and second parameters,<br>

e.g. __builtin_shufflevector(a, b, 0, 1, 7, -1),<br>

then, in fact, we will be shuffling 3 vectors (a, b and undef). I don’t<br>

think that it’s a good idea to extend __builtin_shufflevector semantics to do<br>

that.<br>

<br>

<br>

Which solution is preferred?<br>

(1) Support negative indices for the unary form of __builtin_shufflevector only.<br>

(2) Support negative indices for the binary form of __builtin_shufflevector only<br>

and ensure that the first and the second parameters are the same vector.<br>

(3) Support both (1) and (2).<br>

(4) Another possible solution (though very different from those proposed above)<br>

that allows using "undef" in shuffles would be to add a target-independent<br>

builtin (e.g. __builtin_undef(vector a)), which creates an “undef” vector<br>

with the same type and the same number of elements as its vector argument.<br>

With this "undef" builtin, I could implement the AVX typecast builtins like this:<br>

<br>

static __inline __m256d __attribute__((__always_inline__, __nodebug__))<br>

_mm256_castpd128_pd256(__m128d in)<br>

{<br>

  __m128d undef = __builtin_undef(in);<br>

  return __builtin_shufflevector(in, undef, 0, 1, 2, 2);<br>

}<br>

<br>

Thoughts?<br>

<br>

<br>

Thank you!<br>

Katya.<br>

<br>

<br>

<br>


</blockquote></div><br><br clear="all"><div><br></div>-- <br>~Craig

</div>