[PATCH] D9804: Optimize scattered vector insert/extract pattern
Lawrence Hu
lawrence at codeaurora.org
Tue Jul 7 16:28:57 PDT 2015
Hi, Nadav:
Very sorry to get back to you so late.
I investigated the existing code further. Consider the following example:
%1 = load i32, i32* %arrayidx1
%conv1 = zext i32 %1 to i64
%2 = load i32, i32* %arrayidx2
%conv2 = zext i32 %2 to i64
%x0 = insertelement <2 x i64> undef, i64 %conv1, i32 0
%x1 = insertelement <2 x i64> %x0, i64 %conv2, i32 1
ret <2 x i64> %x1
The existing logic generates the following IR (I had to bypass the cost function to get this). It is not efficient, which is probably why the cost function doesn't allow it:
%1 = load i32, i32* %arrayidx1
%2 = load i32, i32* %arrayidx2
%3 = insertelement <2 x i32> undef, i32 %1, i32 0
%4 = insertelement <2 x i32> %3, i32 %2, i32 1
%5 = zext <2 x i32> %4 to <2 x i64>
%6 = extractelement <2 x i64> %5, i32 0
%x0 = insertelement <2 x i64> undef, i64 %6, i32 0
%7 = extractelement <2 x i64> %5, i32 1
%x1 = insertelement <2 x i64> %x0, i64 %7, i32 1
ret <2 x i64> %x1
However, the following IR is much more efficient:
%1 = load i32, i32* %arrayidx1
%2 = load i32, i32* %arrayidx2
%3 = insertelement <2 x i32> undef, i32 %1, i32 0
%4 = insertelement <2 x i32> %3, i32 %2, i32 1
%5 = zext <2 x i32> %4 to <2 x i64>
That's what our patches do.
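To make the comparison concrete, here is a minimal Python sketch (not LLVM code) that models the three IR sequences above on plain integers. The function names, the list-based vector model, and the use of None for undef lanes are assumptions made purely for illustration. It only shows that all three forms produce the same two lanes, so the extractelement/insertelement pairs in the second form do no useful work.

```python
MASK32 = 0xFFFFFFFF  # low 32 bits, models an i32 value


def zext32_to_64(x):
    """Model `zext i32 to i64`: reinterpret the low 32 bits as unsigned."""
    return x & MASK32


def scalar_form(a, b):
    """Original input: zext each scalar, then insert into <2 x i64>."""
    conv1 = zext32_to_64(a)
    conv2 = zext32_to_64(b)
    x0 = [conv1, None]      # insertelement into undef, lane 0
    x1 = [x0[0], conv2]     # insertelement, lane 1
    return x1


def vector_form(a, b):
    """Preferred output: build <2 x i32>, then a single vector zext."""
    v3 = [a, None]          # insertelement into undef, lane 0
    v4 = [v3[0], b]         # insertelement, lane 1
    return [zext32_to_64(lane) for lane in v4]


def roundtrip_form(a, b):
    """Existing output: vector zext followed by extract/insert pairs."""
    v5 = vector_form(a, b)
    e0 = v5[0]              # extractelement, lane 0
    x0 = [e0, None]         # insertelement into undef, lane 0
    e1 = v5[1]              # extractelement, lane 1
    x1 = [x0[0], e1]        # insertelement, lane 1
    return x1


if __name__ == "__main__":
    for a, b in [(1, 2), (0, MASK32), (-1, 7)]:
        assert scalar_form(a, b) == vector_form(a, b) == roundtrip_form(a, b)
    print("all three forms agree")
```

In this model, roundtrip_form just rebuilds the vector it already has, which is the redundancy the second IR listing shows.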
Because our code targets this particular pattern and generates much more efficient code, I think keeping it is a reasonable choice.
What do you think?
Regards
Lawrence Hu
Repository:
rL LLVM
http://reviews.llvm.org/D9804