[PATCH] D18593: [PowerPC] Front end improvements for vec_splat

Nemanja Ivanovic via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 1 09:57:08 PDT 2016


nemanjai added inline comments.

================
Comment at: lib/Headers/altivec.h:7726-7736
@@ -7725,7 +7725,13 @@
 vec_splat(vector signed int __a, unsigned const int __b) {
-  unsigned char b0 = (__b & 0x03) * 4;
-  unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;
-  return vec_perm(__a, __a,
-                  (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,
-                                         b2, b3, b0, b1, b2, b3));
+  const unsigned __elem = __b & 0x3;
+  switch(__elem) {
+  case 0:
+    return __builtin_shufflevector(__a, __a, 0, 0, 0, 0);
+  case 1:
+    return __builtin_shufflevector(__a, __a, 1, 1, 1, 1);
+  case 2:
+    return __builtin_shufflevector(__a, __a, 2, 2, 2, 2);
+  case 3:
+    return __builtin_shufflevector(__a, __a, 3, 3, 3, 3);
+  }
 }
----------------
amehsan wrote:
> amehsan wrote:
> > Why do we need a switch case here (and in the ones below) instead of just returning __builtin_shufflevector(__a, __a, __elem, __elem, __elem, __elem)? 
> Apparently phabricator has a problem with underlines....
The arguments to __builtin_shufflevector must be compile-time constants. These vec_splat functions will typically be called with compile-time constants anyway so at anything other than -O0, the unnecessary branches will be eliminated. And if they're called with non-constant arguments for the element index, the sequence is probably still better than building a mask vector and using vperm.

Basically, at -O2 this form when the argument is variable will have a shift, compare, branch and xxspltw. The previous form has 16 stores (of each byte), a vector load (which includes a swap on LE) and a vperm.

================
Comment at: lib/Headers/altivec.h:7726-7736
@@ -7725,7 +7725,13 @@
 vec_splat(vector signed int __a, unsigned const int __b) {
-  unsigned char b0 = (__b & 0x03) * 4;
-  unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;
-  return vec_perm(__a, __a,
-                  (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,
-                                         b2, b3, b0, b1, b2, b3));
+  const unsigned __elem = __b & 0x3;
+  switch(__elem) {
+  case 0:
+    return __builtin_shufflevector(__a, __a, 0, 0, 0, 0);
+  case 1:
+    return __builtin_shufflevector(__a, __a, 1, 1, 1, 1);
+  case 2:
+    return __builtin_shufflevector(__a, __a, 2, 2, 2, 2);
+  case 3:
+    return __builtin_shufflevector(__a, __a, 3, 3, 3, 3);
+  }
 }
----------------
nemanjai wrote:
> amehsan wrote:
> > amehsan wrote:
> > > Why do we need a switch case here (and in the ones below) instead of just returning __builtin_shufflevector(__a, __a, __elem, __elem, __elem, __elem)? 
> > Apparently phabricator has a problem with underlines....
> The arguments to __builtin_shufflevector must be compile-time constants. These vec_splat functions will typically be called with compile-time constants anyway so at anything other than -O0, the unnecessary branches will be eliminated. And if they're called with non-constant arguments for the element index, the sequence is probably still better than building a mask vector and using vperm.
> 
> Basically, at -O2 this form when the argument is variable will have a shift, compare, branch and xxspltw. The previous form has 16 stores (of each byte), a vector load (which includes a swap on LE) and a vperm.
Yeah, when entering code in phabricator comments, you can do one of two things:
1. Put them in a code block: 
```
__builtin_shufflevector(__a, __a, 3, 3, 3, 3);
```
2. Use the monospaced tag around the text: `__builtin_shufflevector(__a, __a, 3, 3, 3, 3);`


Repository:
  rL LLVM

http://reviews.llvm.org/D18593





More information about the llvm-commits mailing list