[LLVMdev] Improving SLPVectorizer for Julia

Mon Mar 17 14:38:04 PDT 2014

I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code.  Being fairly new to LLVM, I could use some advice, particularly from those familiar with the internals of SLPVectorizer.

The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 .  Here is an example of the kind of LLVM code I wish to vectorize.

-------------------------------------------------------------

define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
top:
  %2 = extractelement <4 x float> %0, i32 0
  %3 = extractelement <4 x float> %1, i32 0
  %4 = fadd float %2, %3
  %5 = insertelement <4 x float> undef, float %4, i32 0
  %6 = extractelement <4 x float> %0, i32 1
  %7 = extractelement <4 x float> %1, i32 1
  %8 = fadd float %6, %7
  %9 = insertelement <4 x float> %5, float %8, i32 1
  %10 = extractelement <4 x float> %0, i32 2
  %11 = extractelement <4 x float> %1, i32 2
  %12 = fadd float %10, %11
  %13 = insertelement <4 x float> %9, float %12, i32 2
  %14 = extractelement <4 x float> %0, i32 3
  %15 = extractelement <4 x float> %1, i32 3
  %16 = fadd float %14, %15
  %17 = insertelement <4 x float> %13, float %16, i32 3
  ret <4 x float> %17
}

-------------------------------------------------------------

I want the fadd instructions to be vectorized.  I've been able to implement most of what I need (see attached patch), but with a fatal flaw: the uses of the vectorized result are not moved to follow its definition.  Here is the current (and quite illegal) result:

-------------------------------------------------------------

top:
  %2 = extractelement <4 x float> %8, i32 0
  %3 = insertelement <4 x float> undef, float %2, i32 0
  %4 = extractelement <4 x float> %8, i32 1
  %5 = insertelement <4 x float> %3, float %4, i32 1
  %6 = extractelement <4 x float> %8, i32 2
  %7 = insertelement <4 x float> %5, float %6, i32 2
  %8 = fadd <4 x float> %0, %1
  %9 = extractelement <4 x float> %8, i32 3
  %10 = insertelement <4 x float> %7, float %9, i32 3
  ret <4 x float> %10

-------------------------------------------------------------

Instructions %3, %5 and %7 need to be moved to after instruction %8.  I'm wondering what is a good way to do this.  The relevant place in SLPVectorizer.cpp is around here:

-------------------------------------------------------------

    if (Cost < -SLPCostThreshold) {
      DEBUG(dbgs() << "SLP: Vectorizing list at cost:" << Cost << ".\n");
      R.vectorizeTree();
      // Move to the next bundle.
      i += VF - 1;
      Changed = true;
    }

-------------------------------------------------------------

Should I try to move the instructions before or after the call to R.vectorizeTree()?  Or maybe do it even later after all bundles have been vectorized?  Are there LLVM utilities for doing this kind of fixup within a basic block?
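
To make the kind of fixup I have in mind concrete, here is a minimal sketch written outside LLVM.  It models a basic block as a list of named instructions and reorders them so that every definition precedes its uses.  ToyInst, defsPrecedeUses, scheduleDefsBeforeUses and illegalExampleBlock are hypothetical names made up for this sketch; none of them are LLVM APIs.  It assumes straight-line code with no phi nodes and only side-effect-free instructions, which holds for the extractelement/insertelement/fadd sequence above but not in general.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for an instruction: the value it defines and the values it
// uses.  Hypothetical model for illustration, not llvm::Instruction.
struct ToyInst {
  std::string name;
  std::vector<std::string> operands;
};

// True if every operand that is defined inside the block is defined before
// the instruction that uses it.  Operands not defined in the block (function
// arguments, constants) are ignored.
bool defsPrecedeUses(const std::vector<ToyInst> &block) {
  std::unordered_set<std::string> definedInBlock;
  for (const ToyInst &inst : block) definedInBlock.insert(inst.name);

  std::unordered_set<std::string> seen;
  for (const ToyInst &inst : block) {
    for (const std::string &op : inst.operands)
      if (definedInBlock.count(op) && !seen.count(op)) return false;
    seen.insert(inst.name);
  }
  return true;
}

// Walk the block in its original order; before emitting an instruction,
// first emit any in-block operands that have not been emitted yet.  This
// hoists a late definition above its first use, which has the same effect
// as sinking the uses below the definition.
std::vector<ToyInst> scheduleDefsBeforeUses(const std::vector<ToyInst> &block) {
  std::unordered_map<std::string, std::size_t> indexOf;
  for (std::size_t i = 0; i < block.size(); ++i) indexOf[block[i].name] = i;

  std::vector<bool> visited(block.size(), false);
  std::vector<ToyInst> out;
  out.reserve(block.size());

  std::function<void(std::size_t)> emit = [&](std::size_t i) {
    if (visited[i]) return;  // already scheduled (also guards against cycles)
    visited[i] = true;
    for (const std::string &op : block[i].operands) {
      auto it = indexOf.find(op);
      if (it != indexOf.end()) emit(it->second);
    }
    out.push_back(block[i]);
  };
  for (std::size_t i = 0; i < block.size(); ++i) emit(i);
  return out;
}

// The illegal block from above, minus the ret.  %0, %1 and undef are not
// defined in the block, so they never constrain the order.
std::vector<ToyInst> illegalExampleBlock() {
  return {
      {"%2", {"%8"}},          {"%3", {"undef", "%2"}},
      {"%4", {"%8"}},          {"%5", {"%3", "%4"}},
      {"%6", {"%8"}},          {"%7", {"%5", "%6"}},
      {"%8", {"%0", "%1"}},    {"%9", {"%8"}},
      {"%10", {"%7", "%9"}},
  };
}
```

Running scheduleDefsBeforeUses on illegalExampleBlock() places %8 first and leaves the remaining instructions in their original relative order.  In LLVM proper I assume the same ordering could be applied with Instruction::moveBefore, but that mapping is an assumption and has not been tried against the patch.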

Any pointers/advice appreciated.

- Arch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SLPVectorizer.cpp.patch
Type: application/octet-stream
Size: 4777 bytes
Desc: SLPVectorizer.cpp.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140317/c4612128/attachment.obj>