[PATCH] D9804: Optimize scattered vector insert/extract pattern

Lawrence Hu lawrence at codeaurora.org
Tue Jul 7 16:28:57 PDT 2015


Hi, Nadav:

Very sorry to get back to you so late.

I investigated the existing code further. Consider the following example:

  %1 = load i32, i32* %arrayidx1
  %conv1 = zext i32 %1 to i64
  %2 = load i32, i32* %arrayidx2
  %conv2 = zext i32 %2 to i64
  %x0 = insertelement <2 x i64> undef, i64 %conv1, i32 0
  %x1 = insertelement <2 x i64> %x0, i64 %conv2, i32 1
  ret <2 x i64> %x1

The existing logic generates the following IR (I had to bypass the cost function to get this). It is not efficient, which is probably why the cost function doesn't allow it:

  %1 = load i32, i32* %arrayidx1
  %2 = load i32, i32* %arrayidx2
  %3 = insertelement <2 x i32> undef, i32 %1, i32 0
  %4 = insertelement <2 x i32> %3, i32 %2, i32 1
  %5 = zext <2 x i32> %4 to <2 x i64>
  %6 = extractelement <2 x i64> %5, i32 0
  %x0 = insertelement <2 x i64> undef, i64 %6, i32 0
  %7 = extractelement <2 x i64> %5, i32 1
  %x1 = insertelement <2 x i64> %x0, i64 %7, i32 1
  ret <2 x i64> %x1

However, the following IR is much more efficient:

  %1 = load i32, i32* %arrayidx1
  %2 = load i32, i32* %arrayidx2
  %3 = insertelement <2 x i32> undef, i32 %1, i32 0
  %4 = insertelement <2 x i32> %3, i32 %2, i32 1
  %5 = zext <2 x i32> %4 to <2 x i64>

That's what our patches do.
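
For reference, here is a minimal self-contained sketch of that target form, wrapped in a function so it can be fed to the usual tools. The function name @widen_pair and the argument names %p and %q are hypothetical and chosen only for illustration; they are not taken from the patch. It uses the same typed-pointer syntax as the snippets in this mail.

```llvm
; Illustrative sketch only: @widen_pair, %p and %q are hypothetical names.
; Two scalar i32 loads are packed into a <2 x i32> vector, and a single
; vector zext widens both lanes at once, instead of two scalar zexts
; followed by inserts into a <2 x i64>.
define <2 x i64> @widen_pair(i32* %p, i32* %q) {
entry:
  %a = load i32, i32* %p
  %b = load i32, i32* %q
  %v0 = insertelement <2 x i32> undef, i32 %a, i32 0
  %v1 = insertelement <2 x i32> %v0, i32 %b, i32 1
  %w = zext <2 x i32> %v1 to <2 x i64>
  ret <2 x i64> %w
}
```

Compared with the original scalar form, this has one zext instead of two and no extractelement/insertelement round trip through <2 x i64>.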

Because our code targets this particular pattern and generates much more efficient code, I think keeping it is a reasonable choice.

What do you think?

Regards

Lawrence Hu


Repository:
  rL LLVM

http://reviews.llvm.org/D9804






