[PATCH] D12032: Vector element extraction without stack operations on Power 8
Bill Schmidt via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 18 13:28:51 PDT 2015
wschmidt added inline comments.
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:1284
@@ +1283,3 @@
+ dag LE_BYTE_1 = (i32 (EXTRACT_SUBREG (RLDICL LE_DWORD_0, 56, 8), sub_32));
+ dag LE_BYTE_2 = (i32 (EXTRACT_SUBREG (RLDICL LE_DWORD_0, 48, 16), sub_32));
+ dag LE_BYTE_3 = (i32 (EXTRACT_SUBREG (RLDICL LE_DWORD_0, 40, 24), sub_32));
----------------
wschmidt wrote:
> This code concerns me, starting with LE_BYTE_2. I see from your test cases that it happens to work, but it looks fragile to me.
>
> If you have bytes 7 6 5 4 3 2 1 0 and apply RLDICL 48, 16, you will get
> X X X X X X 3 2 where the Xs have been cleared to zero. The correct answer is X X X X X X X 2.
>
> For RLDICL 40, 24, you will get X X X X X 5 4 3, which is incorrect for both byte and halfword extraction.
>
> The pattern you need is (RLDICL LE_DWORD_0, 48, 8), followed by 40, 8, etc. Then you need separate patterns for LE_HWORD that use 16 instead of 8.
>
> Somehow in your tests you are ending up with extsb instructions that make this work, which I really don't understand based on the code you're specifying here (which just creates an i32, not an i8). I am concerned that those are an artifact that could disappear. Can you explain how those come to be generated?
Ah, I see why. Your tests are returning an i8, which forces the conversion from i32 to i8 that generates the extsb. Without that, the incorrect code generation would be exposed.
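
The masking effect described here can be checked with a small Python model. This is a sketch, not LLVM or patch code: the helper names `rldicl`, `sub_32`, and `extsb` are hypothetical stand-ins for the instructions and the subregister extract, and the test value is made up. It assumes the Power ISA definition of rldicl (rotate left by SH, then clear the MB most-significant bits). Under that reading the bytes that survive differ from the figures quoted above, since an MB of 16 leaves the low 48 bits intact, but the concern is the same: unwanted bytes remain in the i32, and only the later narrowing to i8 hides them.

```python
MASK64 = (1 << 64) - 1

def rldicl(value, sh, mb):
    """Model of rldicl: rotate the 64-bit value left by sh bits,
    then clear the mb most-significant bits (0 <= sh, mb <= 63)."""
    value &= MASK64
    rotated = ((value << sh) | (value >> (64 - sh))) & MASK64
    return rotated & (MASK64 >> mb)

def sub_32(value):
    """Model of EXTRACT_SUBREG ..., sub_32: keep the low 32 bits."""
    return value & 0xFFFFFFFF

def extsb(value):
    """Model of extsb: sign-extend the low byte (result as a Python int)."""
    low = value & 0xFF
    return low - 0x100 if low & 0x80 else low

# Hypothetical test value: byte n of the doubleword holds 0xBn.
dword = 0xB7B6B5B4B3B2B1B0

# LE_BYTE_2 as written in the patch: RLDICL 48, 16 followed by sub_32.
le_byte_2 = sub_32(rldicl(dword, 48, 16))
print(hex(le_byte_2))         # 0xb5b4b3b2: bytes 5, 4, 3 are still present
print(hex(le_byte_2 & 0xFF))  # 0xb2: an i8 result sees only byte 2
print(extsb(le_byte_2))       # -78: sign-extended 0xb2, as after extsb

# Assumption for illustration only: in this model, clearing 56 (or 48) bits
# isolates the byte (or halfword) without relying on a later truncation.
print(hex(rldicl(dword, 48, 56)))  # 0xb2
print(hex(rldicl(dword, 48, 48)))  # 0xb3b2
```

A test that returns i32 rather than i8 would observe the first value directly, which is the case the existing tests do not cover.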
Repository:
rL LLVM
http://reviews.llvm.org/D12032