[PATCH] D18593: [PowerPC] Front end improvements for vec_splat
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 4 16:19:12 PDT 2016
hfinkel added inline comments.
================
Comment at: lib/Headers/altivec.h:7726-7736
@@ -7725,7 +7725,13 @@
vec_splat(vector signed int __a, unsigned const int __b) {
- unsigned char b0 = (__b & 0x03) * 4;
- unsigned char b1 = b0 + 1, b2 = b0 + 2, b3 = b0 + 3;
- return vec_perm(__a, __a,
- (vector unsigned char)(b0, b1, b2, b3, b0, b1, b2, b3, b0, b1,
- b2, b3, b0, b1, b2, b3));
+ const unsigned __elem = __b & 0x3;
+ switch(__elem) {
+ case 0:
+ return __builtin_shufflevector(__a, __a, 0, 0, 0, 0);
+ case 1:
+ return __builtin_shufflevector(__a, __a, 1, 1, 1, 1);
+ case 2:
+ return __builtin_shufflevector(__a, __a, 2, 2, 2, 2);
+ case 3:
+ return __builtin_shufflevector(__a, __a, 3, 3, 3, 3);
+ }
}
----------------
echristo wrote:
> kbarton wrote:
> > nemanjai wrote:
> > > nemanjai wrote:
> > > > amehsan wrote:
> > > > > amehsan wrote:
> > > > > > Why do we need a switch case here (and in the ones below) instead of just returning __builtin_shufflevector(__a, __a, __elem, __elem, __elem, __elem)?
> > > > > Apparently phabricator has a problem with underlines....
> > > > The arguments to __builtin_shufflevector must be compile-time constants. These vec_splat functions will typically be called with compile-time constants anyway so at anything other than -O0, the unnecessary branches will be eliminated. And if they're called with non-constant arguments for the element index, the sequence is probably still better than building a mask vector and using vperm.
> > > >
> > > > Basically, at -O2 this form when the argument is variable will have a shift, compare, branch and xxspltw. The previous form has 16 stores (of each byte), a vector load (which includes a swap on LE) and a vperm.
> > > Yeah, when entering code in phabricator comments, you can do one of two things:
> > > 1. Put them in a code block:
> > > ```
> > > __builtin_shufflevector(__a, __a, 3, 3, 3, 3);
> > > ```
> > > 2. Use the monospaced tag around the text: `__builtin_shufflevector(__a, __a, 3, 3, 3, 3);`
> > Could we add a test case (or extend an existing test case) to ensure that these extra branches are cleaned up at -O2 and above?
> > The branches will hinder instruction scheduling (among other things) so we should ensure they are cleaned up.
> > I agree the extra branches at -O0 are not a concern.
> Another common workaround for these is to make the definitions macros and then you can use the mask as part of the macro arguments.
I agree; a macro can be used here.
Another option is to use __builtin_constant_p to choose between the switch statement (in the case where we know it gets optimized away), and a target-specific intrinsic for the cases we can't generically represent.
Repository:
rL LLVM
http://reviews.llvm.org/D18593
More information about the llvm-commits
mailing list