<div dir="ltr">I'll get to the rest of it, but for this Kit's question about O2: It's preferred to test such things in the back end.</div><br><div class="gmail_quote"><div dir="ltr">On Mon, Apr 4, 2016 at 10:51 AM Kit Barton <<a href="mailto:kbarton@ca.ibm.com">kbarton@ca.ibm.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">kbarton requested changes to this revision.<br>
This revision now requires changes to proceed.<br>
<br>
================<br>
Comment at: lib/Headers/altivec.h:7726-7736<br>
@@ -7725,7 +7725,13 @@<br>
vec_splat(vector signed int __a, unsigned const int __b) {<br>
- unsigned char b0 = (__b & 0x03) * 4;<br>
- unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;<br>
- return vec_perm(__a, __a,<br>
- (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,<br>
- b2, b3, b0, b1, b2, b3));<br>
+ const unsigned __elem = __b & 0x3;<br>
+ switch(__elem) {<br>
+ case 0:<br>
+ return __builtin_shufflevector(__a, __a, 0, 0, 0, 0);<br>
+ case 1:<br>
+ return __builtin_shufflevector(__a, __a, 1, 1, 1, 1);<br>
+ case 2:<br>
+ return __builtin_shufflevector(__a, __a, 2, 2, 2, 2);<br>
+ case 3:<br>
+ return __builtin_shufflevector(__a, __a, 3, 3, 3, 3);<br>
+ }<br>
}<br>
----------------<br>
nemanjai wrote:<br>
> nemanjai wrote:<br>
> > amehsan wrote:<br>
> > > amehsan wrote:<br>
> > > > Why do we need a switch case here (and in the ones below) instead of just returning __builtin_shufflevector(__a, __a, __elem, __elem, __elem, __elem)?<br>
> > > Apparently phabricator has a problem with underlines....<br>
> > The arguments to __builtin_shufflevector must be compile-time constants. These vec_splat functions will typically be called with compile-time constants anyway so at anything other than -O0, the unnecessary branches will be eliminated. And if they're called with non-constant arguments for the element index, the sequence is probably still better than building a mask vector and using vperm.<br>
> ><br>
> > Basically, at -O2 this form when the argument is variable will have a shift, compare, branch and xxspltw. The previous form has 16 stores (of each byte), a vector load (which includes a swap on LE) and a vperm.<br>
> Yeah, when entering code in phabricator comments, you can do one of two things:<br>
> 1. Put them in a code block:<br>
> ```<br>
> __builtin_shufflevector(__a, __a, 3, 3, 3, 3);<br>
> ```<br>
> 2. Use the monospaced tag around the text: `__builtin_shufflevector(__a, __a, 3, 3, 3, 3);`<br>
Could we add a test case (or extend an existing test case) to ensure that these extra branches are cleaned up at -O2 and above?<br>
The branches will hinder instruction scheduling (among other things) so we should ensure they are cleaned up.<br>
I agree the extra branches at -O0 are not a concern.<br>
<br>
<br>
Repository:<br>
rL LLVM<br>
<br>
<a href="http://reviews.llvm.org/D18593" rel="noreferrer" target="_blank">http://reviews.llvm.org/D18593</a><br>
<br>
<br>
<br>
</blockquote></div>