[PATCH] D34160: [Power9] Exploit vinserth instruction

Wed Jun 21 06:18:35 PDT 2017

nemanjai added a comment.

In https://reviews.llvm.org/D34160#781301, @inouehrs wrote:

> In https://reviews.llvm.org/D34160#781261, @gyiu wrote:
>
> > In https://reviews.llvm.org/D34160#779972, @inouehrs wrote:
> >
> > > This patch (potentially) increases the number of vector instructions (permutation -> shift + insert). Is my understanding correct?
> >
> >
> > Yep.  Though I think with a vperm you still need to load the mask into a vector register first, whereas with vshift + vinsert we're saving on the load.
>
>
> I feel we should not increase the number of vector instructions within a loop (i.e. a common case for vector code) if we can load the mask into a vector register before the loop.
>  When no additional shift is needed, it is nice to do this optimization in a loop, since it frees up one vector register.

Although I definitely agree that we should take steps to ensure we don't introduce further instructions in loops, I'm not sure that avoiding a 2-instruction sequence for a shuffle is necessarily the right thing to do. That argument assumes we can hoist the constant pool load out of the loop. If register pressure prevents this, we will have a load in the loop. Furthermore, if the loop is large enough and has other memory operations, it is conceivable that the constant pool load could miss in the cache on every iteration. It is also conceivable that such large loops will be the very ones for which register pressure prevents hoisting the load. Finally, if GPR register pressure is also high, we might not even be able to hoist the address calculations out of the loop, which would make the `vperm` sequence 3-4 instructions.
I think that at ISEL time, we should favour shorter instruction sequences that don't involve loads. And if we can show that multi-instruction permute sequences in loops appear often enough in real code, we might want a general loop pass that replaces them with a mask load outside the loop and a single `vperm` inside it.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.h:1066
+    /// essentially any shuffle of v8i16 vectors that just inserts one element
+    /// from one vector into the other. This function will also set a couple of
+    /// output parameters for how much the source vector needs to be shifted and
----------------
This is probably a remnant of a previous implementation. Please rewrite the comment.
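
For anyone following along, here is a minimal standalone sketch of the kind of shuffle the comment describes: a v8i16 mask in which exactly one lane takes an element from the other vector and every other lane is unchanged. This is not the code in the patch. The names `SingleInsert` and `matchSingleHalfwordInsert` are hypothetical, the mask is a plain array rather than a `ShuffleVectorSDNode`, undef lanes are not handled, and the shift amount computation is left out. It assumes C++17 for `std::optional`.

```cpp
#include <array>
#include <optional>

// Hypothetical result type (not from the patch): describes a one-element insert.
struct SingleInsert {
  int SrcElt;        // element index (0-7) taken from the donor vector
  int DstElt;        // element index (0-7) overwritten in the base vector
  bool BaseIsFirst;  // true if the mostly-preserved vector is operand 0
};

// Mask follows the usual two-operand shuffle convention:
// values 0-7 select from operand 0, values 8-15 select from operand 1.
// Returns a description of the insert if exactly one lane differs from the
// identity on one operand and that lane comes from the other operand.
std::optional<SingleInsert>
matchSingleHalfwordInsert(const std::array<int, 8> &Mask) {
  for (int Pass = 0; Pass < 2; ++Pass) {
    const bool BaseIsFirst = (Pass == 0);
    const int BaseOffset = BaseIsFirst ? 0 : 8;
    const int DonorOffset = BaseIsFirst ? 8 : 0;

    std::optional<SingleInsert> Found;
    bool Ok = true;
    for (int I = 0; I < 8 && Ok; ++I) {
      const int M = Mask[I];
      if (M == BaseOffset + I)
        continue;  // lane is unchanged from the base vector
      // A changed lane must be the only one and must come from the donor.
      if (Found || M < DonorOffset || M >= DonorOffset + 8)
        Ok = false;
      else
        Found = SingleInsert{M - DonorOffset, I, BaseIsFirst};
    }
    if (Ok && Found)
      return Found;
  }
  return std::nullopt;  // identity masks and multi-lane changes do not match
}
```

A mask such as `{0, 1, 2, 11, 4, 5, 6, 7}` matches with the first operand as the base, while the identity mask and masks that change two or more lanes do not.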

Repository:
  rL LLVM

https://reviews.llvm.org/D34160