[PATCH] D9804: Optimize scattered vector insert/extract pattern

Tue Jul 7 16:28:57 PDT 2015

Hi, Nadav:

Very sorry to get back to you so late.

I investigated the existing code further. Take the following example:

  %1 = load i32, i32* %arrayidx1
  %conv1 = zext i32 %1 to i64
  %2 = load i32, i32* %arrayidx2
  %conv2 = zext i32 %2 to i64
  %x0 = insertelement <2 x i64> undef, i64 %conv1, i32 0
  %x1 = insertelement <2 x i64> %x0, i64 %conv2, i32 1
  ret <2 x i64> %x1

The existing logic generates the following IR (I had to bypass the cost function to get this). It is not efficient, which is probably why the cost function doesn't allow it:

  %1 = load i32, i32* %arrayidx1
  %2 = load i32, i32* %arrayidx2
  %3 = insertelement <2 x i32> undef, i32 %1, i32 0
  %4 = insertelement <2 x i32> %3, i32 %2, i32 1
  %5 = zext <2 x i32> %4 to <2 x i64>
  %6 = extractelement <2 x i64> %5, i32 0
  %x0 = insertelement <2 x i64> undef, i64 %6, i32 0
  %7 = extractelement <2 x i64> %5, i32 1
  %x1 = insertelement <2 x i64> %x0, i64 %7, i32 1
  ret <2 x i64> %x1

However, the following IR is much more efficient:

  %1 = load i32, i32* %arrayidx1
  %2 = load i32, i32* %arrayidx2
  %3 = insertelement <2 x i32> undef, i32 %1, i32 0
  %4 = insertelement <2 x i32> %3, i32 %2, i32 1
  %5 = zext <2 x i32> %4 to <2 x i64>

That's what our patches do.
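
To illustrate why the two forms should produce the same value, here is a minimal sketch in Python rather than LLVM code. It assumes plain lists can stand in for the <2 x i32> and <2 x i64> vectors, and every function name in it is hypothetical, not something from the patch:

```python
# Illustrative model only: plain Python lists stand in for LLVM vectors.
# All function names here are hypothetical and are not part of the patch.

MASK32 = 0xFFFFFFFF


def zext_i32_to_i64(value):
    """Model zext i32 -> i64: keep the low 32 bits, upper bits become zero."""
    return value & MASK32


def scalar_zext_then_insert(a, b):
    """Original form: zext each loaded scalar, then build the <2 x i64>."""
    vec = [None, None]            # stands in for <2 x i64> undef
    vec[0] = zext_i32_to_i64(a)   # insertelement at index 0
    vec[1] = zext_i32_to_i64(b)   # insertelement at index 1
    return vec


def insert_then_vector_zext(a, b):
    """Rewritten form: build the <2 x i32> first, then one vector zext."""
    narrow = [None, None]         # stands in for <2 x i32> undef
    narrow[0] = a                 # insertelement at index 0
    narrow[1] = b                 # insertelement at index 1
    return [zext_i32_to_i64(lane) for lane in narrow]
```

Both functions apply the same per-lane zero extension, so they agree lane by lane. The rewritten form simply replaces two scalar extensions with a single vector one.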

Because our code targets this particular pattern and generates much more efficient code, I think keeping it is a reasonable choice.

What do you think?

Regards

Lawrence Hu

Repository:
  rL LLVM

http://reviews.llvm.org/D9804