[PATCH] D12032: Vector element extraction without stack operations on Power 8

Tue Aug 18 18:11:03 PDT 2015

wschmidt added inline comments.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:1309
@@ +1308,3 @@
+  // LE variable byte
+  dag LE_VBYTE_PERM_VEC = (LVSL ZERO8, (ANDC8 (LI8 8), $Idx));
+  dag LE_VBYTE_PERMUTE = (VPERM $S, $S, LE_VBYTE_PERM_VEC);
----------------
wschmidt wrote:
> nemanjai wrote:
> > wschmidt wrote:
> > > This looks wrong.  If $Idx = 19, (ANDC8 (LI8 8), $Idx) will AND 0...001000 with 1...101100, giving 0...001000 = 8.  Basically this expression will always produce either 0 or 8, right?  That is surely not what you want.  This same issue seems repeated several places below, so I've stopped trying to figure these expressions out for now.
> > > 
> > > Have you done any execution testing to check that the code you're generating works?  I'm skeptical that it can.
> > I may be missing something here, but always getting a 0 or 8 is precisely what I was after. Basically, if the element is on the left half of the VSR, I don't need to shift it. If it is on the right side of the VSR, I do need to shift it and I need to shift it by EXACTLY 8 bytes. This applies on both LE and BE except that the element numbers that are on the left vs. right half of the VSR are exact opposites.
> > For LE, the bytes 0-7 need to be shifted before the MFVSR (so the expression you mentioned returns 8).
> > 
> > Subsequent to this shift, what is then on the left side of the VSR gets moved out to a GPR.
> > Now that I have the right half of the VSR in the GPR, I need to shift to the right by the correct number of bytes (0-7). So element zero, needs not shift at this point, whereas elements 1-7 need to be shifted by the equal number of bytes. Similarly, element 8 needs not shift, whereas elements 9-15 need to be shifted by $Idx & 7 bytes. Of course, the shift amount I need to specify to SRD is in bits, hence the reason for the left shift of 3.
> > 
> > Here's the complete shift sequence for LE (bytes):
> > 
> > ```
> > Elem    LShift in VSR    RShift in GPR (bytes)
> > 0       8                0
> > 1       8                1
> > 2       8                2
> > 3       8                3
> > 4       8                4
> > 5       8                5
> > 6       8                6
> > 7       8                7
> > 8       0                0
> > 9       0                1
> > 10      0                2
> > 11      0                3
> > 12      0                4
> > 13      0                5
> > 14      0                6
> > 15      0                7
> > ```
> > I have worked out similar sequences for the halfwords and for both on BE.
> > Also, I have an execution test case that prints each element of each size vector both as constant element values and as variables. In my testing, the test case was compiled with clang and with gcc and the results compared with diff. I have run this on P8 systems, both LE and BE.
> OK.  Again, this needs to be better documented.  Your commentary for the LShift portion is:
> "What we do is set up the index by masking off bits we don't need and shifting accordingly."  You're not actually masking off any bits for this case, which is what confused me.  Based on that commentary, this didn't appear to fulfill your intent.  I note that your subsequent explanation to me doesn't have any mention of masking off bits either.
> 
> So please document more clearly what is going on with these patterns.  Too much commentary is never a crime...an expression like
> 
>   dag BE_VBYTE_SHIFT = (EXTRACT_SUBREG (RLDICR (ANDC8 (LI8 7), $Idx), 3, 60),
>                                        sub_32);
> 
> without its own line of commentary is kind of a travesty.
> 
> If you don't mind, I would like to wait for a more heavily documented version before fully reviewing the variable extracts.
(I guess that converting to 0 or 8 is "masking off bits," but that really isn't very descriptive of what you're doing.  You're just isolating the containing doubleword.)

Repository:
  rL LLVM

http://reviews.llvm.org/D12032