[PATCH] D12032: Vector element extraction without stack operations on Power 8

Nemanja Ivanovic via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 18 16:32:36 PDT 2015


nemanjai added inline comments.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:1284
@@ +1283,3 @@
+  dag LE_BYTE_1 = (i32 (EXTRACT_SUBREG (RLDICL LE_DWORD_0, 56, 8), sub_32));
+  dag LE_BYTE_2 = (i32 (EXTRACT_SUBREG (RLDICL LE_DWORD_0, 48, 16), sub_32));
+  dag LE_BYTE_3 = (i32 (EXTRACT_SUBREG (RLDICL LE_DWORD_0, 40, 24), sub_32));
----------------
nemanjai wrote:
> wschmidt wrote:
> > wschmidt wrote:
> > > This code concerns me, starting with LE_BYTE_2.  I see from your test cases that it happens to work, but it looks fragile to me.
> > > 
> > > If you have bytes 7 6 5 4 3 2 1 0 and apply RLDICL 48, 16, you will get
> > > X X X X X X 3 2 where the Xs have been cleared to zero.  The correct answer is X X X X X X X 2.
> > > 
> > > For RLDICL 40, 24, you will get X X X X X 5 4 3, which is incorrect for both byte and halfword extraction.
> > > 
> > > The pattern you need is (RLDICL LE_DWORD_0 48, 8), followed by 40,8, etc.  Then you need separate patterns for LE_HWORD that uses 16 instead of 8.
> > > 
> > > Somehow in your tests you are ending up with extsb instructions that make this work, which I really don't understand based on the code you're specifying here (which just creates an i32, not an i8).  I am concerned that those are an artifact that could disappear.  Can you explain how those come to be generated?
> > Ah, I see why.  Your tests are returning an i8, which forces the conversion from i32 to i8 that generates the extsb.  Without that, the incorrect code generation would be exposed.
> I think this isn't clearly specified in the language reference, but as far as I can tell, the remainder of the wider register when a value is extracted from a vector is undefined and hence need not be cleared. Namely, when an i8 or i16 is extracted from a vector, it must be extended (sign or zero) since we have no such small registers. Also, I don't think this is necessarily restricted to vector extraction. However we happen to get an i8 or i16 into a register, the SDAG will have a sign_extend or a zero_extend.
> Even code like this:
> 
> ```
> signed char f(signed char c) {
>   return c;
> }
> ```
> 
> Will end up with the following SDAG nodes:
> 
> ```
> 0x1003ec39830: i8,ch = load 0x1003ec39708, 0x1003ec39390, 0x1003ec395e0<LD1[%c.addr]> [ORD=4]
> 0x1003ec3a8c0: i32 = sign_extend 0x1003ec39830 [ORD=5]
> ```
> 
> Similarly, with code like this:
> 
> ```
> signed char f(vector signed char c) {
>   return c[3];
> }
> ```
> 
> We end up with a very similar pair of SDAG nodes:
> 
> ```
> 0x10009e69b58: i8 = extract_vector_elt 0x10009e697e0, 0x10009e69a30 [ORD=5]
> 0x10009e69c80: i32 = sign_extend 0x10009e69b58 [ORD=6]
> ```
> And the sign_extend then becomes an extsb. Of course, this does not apply with fast-isel, but then again, neither does any of this patch since it is all based on the SDAG.
> I used signed char here just as an illustration since you mentioned the extsb instruction, but the equivalent zero_extend applies with the unsigned variants (in both cases).
> 
> So to summarize, I can add the clear instruction to each case, but I'm quite certain that it will be a reduntant instruction. It will also make the table gen logic more complex because I'd need a clear for the unsigned cases and a sign-extend for the signed cases.
This code:

```
int f(vector signed char c, int i) {
  return c[3];
}
```
Has the same extsb, which is then followed in this case by an extsw which is redundant, but understandable given the integral promotion.


Repository:
  rL LLVM

http://reviews.llvm.org/D12032





More information about the llvm-commits mailing list