[PATCH] D34160: [Power9] Exploit vinserth instruction

Mon Jun 26 08:07:06 PDT 2017

gyiu added a comment.

- I'll open a separate item to address Nemenja's comments as I will not get a chance to do another enchancement.

> I don't really see why. Assume that you have something like this:
> 
> vector unsigned short test(vector unsigned short a) {
> 
>   a[5] = a[2];
> 
> }
> 
> I don't see why we can't codegen something like this for it:
> 
> vsldoi 3, 2, 2, 4
> vinserth 2, 3, 4
> 
> Forgive me if I didn't work out the immediates exactly correctly, but the point is the [lack of] need for the XXLOR. Of course, this does use an extra register, but so does the alternative (vperm).

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:7910
+      // to our expected order
+      if (MaskOneElt == VINSERTHSrcElem &&
+          (Mask & MaskOtherElts) == (TargetOrder & MaskOtherElts)) {
----------------
nemanjai wrote:
> gyiu wrote:
> > nemanjai wrote:
> > > gyiu wrote:
> > > > nemanjai wrote:
> > > > > So we don't want to shift if we're within the same register? Is there a specific reason for this?
> > > > I believe we have to add a xxlor to another VR if we want to shift the vector since we can't shift if both operands of the vector shuffle are the same vector.  Adding another two cycles to VECSHL+VECINSERT seems diminish its value versus load+vperm.
> > > I don't really see why. Assume that you have something like this:
> > > 
> > > ```
> > > vector unsigned short test(vector unsigned short a) {
> > >   a[5] = a[2];
> > > }
> > > ```
> > > I don't see why we can't codegen something like this for it:
> > > ```
> > > vsldoi 3, 2, 2, 4
> > > vinserth 2, 3, 4
> > > ```
> > > Forgive me if I didn't work out the immediates exactly correctly, but the point is the [lack of] need for the XXLOR. Of course, this does use an extra register, but so does the alternative (vperm).
> > Hmmm... Yep, you're right.  I guess I can simplify my code even further now.  I think this also means I have to fix up the code for the original xxinsertw lowering in a separate patch.
> Yes, as @echristo mentioned, you should do all the renaming of things in a separate patch that doesn't really require a review. You're just renaming stuff.
Actually, I'm not quite sure what you mean here.  The original code for xxinsertw has the limitation of only being able to insert element 3 if both input vectors to the vector_shuffle are the same.  I'll need to change that in a separate patch.  I'm not sure where the 'renaming of things' comes into play?

Repository:
  rL LLVM

https://reviews.llvm.org/D34160