[PATCH] D34032: [Power9] Exploit vector extract with variable index

Mon Jun 12 15:03:21 PDT 2017

nemanjai added a comment.

In https://reviews.llvm.org/D34032#778271, @syzaara wrote:

> In https://reviews.llvm.org/D34032#776682, @nemanjai wrote:
>
> > I suspect that the total latency of an `LI, VEXTU[BH][LR]X` sequence for extracting constant elements is probably lower than that of the current setup when a shift of the vector element is required. We should probably use these new instructions for such extractions as well.
> >  When it comes to word extractions, I don't think it makes a difference, but halfword and byte ones are probably better off using the new instructions.
> >  I'm fine with that being a separate patch, but we shouldn't forget it.
>
>
> The LI was already being added when using an immediate value for the index. I added a new testcase to cover this case.

That makes sense. However, if you were to add a pattern for each possible constant element index (as the current patterns do), the `LI` could load the final offset directly, so it would not need to be shifted/multiplied. I think we should probably do that.
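
To illustrate the point about constant indices: assuming the index operand of these instructions is a byte offset (which is what the shift by 1 in the halfword pattern suggests), the value each per-index pattern would materialize with `LI` is simply the element index times the element size. A minimal Python sketch, where `li_immediates` is a hypothetical helper name and not something in the patch:

```python
def li_immediates(elem_bytes, vector_bytes=16):
    """Byte offsets that per-index patterns could load directly with LI.

    Assumption: the extract instruction indexes the vector by byte offset,
    so constant element i of width elem_bytes lives at offset i * elem_bytes.
    """
    return [i * elem_bytes for i in range(vector_bytes // elem_bytes)]


# Halfword elements (v8i16): one immediate per element, already scaled.
print(li_immediates(2))
# Byte elements (v16i8): the offset equals the element index.
print(li_immediates(1))
```

With one pattern per index, the scaling happens when the pattern is written rather than at run time.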

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:1909
+  def : Pat<(i64 (anyext (i32 (vector_extract v8i16:$S, i64:$Idx)))),
+            (VEXTUHRX (RLWINM8 $Idx, 1, 0, 30), $S)>;
+  def : Pat<(i64 (zext (i32 (vector_extract v4i32:$S, i64:$Idx)))),
----------------
I assumed this would be an `RLDICR`, but this works just the same. In either case, I think the high-order bits should be cleared explicitly, so the `RLWINM` should really have 1, 28, 30 as its immediates (since `VEXTUHRX` takes its index from bits 60-63 of the register).
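
A small Python model of the 32-bit rotate-and-mask makes the difference between the two mask choices concrete. It is only a sketch: `rlwinm` here is a hypothetical helper that models the non-wrapping case (MB <= ME) on the low word, with bit 0 as the most significant bit of that word, so mask bits 28-30 correspond to bits 60-62 of the 64-bit register.

```python
def rlwinm(rs, sh, mb, me):
    """Model of rotate-left-word-immediate-then-AND-with-mask.

    Sketch only: handles the non-wrapping mask case (mb <= me) and
    returns the low 32 bits. Bit 0 is the MSB of the 32-bit word.
    """
    assert 0 <= sh <= 31 and 0 <= mb <= me <= 31
    low = rs & 0xFFFFFFFF
    rot = ((low << sh) | (low >> (32 - sh))) & 0xFFFFFFFF
    # Ones from bit mb through bit me, inclusive.
    mask = ((1 << (32 - mb)) - 1) & ~((1 << (31 - me)) - 1)
    return rot & mask


# An in-range halfword index gives the same byte offset with either mask.
assert rlwinm(3, 1, 0, 30) == rlwinm(3, 1, 28, 30) == 6

# An index with stray high bits is only reduced to bits 60-62 by the 28, 30 mask.
assert rlwinm(0x13, 1, 0, 30) == 0x26
assert rlwinm(0x13, 1, 28, 30) == 6
```

With 0, 30 the result keeps every bit of the shifted index except the lowest one; with 28, 30 only the three bits that select a halfword survive.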

https://reviews.llvm.org/D34032