[LLVMdev] Improving SLPVectorizer for Julia

Arnold Schwaighofer aschwaighofer at apple.com
Mon Mar 17 16:54:35 PDT 2014


Hi Arch,

Thanks for looking at this.

The reason the SLPVectorizer bails out on many cases that seem vectorizable is scheduling: it needs to produce a legal schedule. It does this by making sure that it can move all vectorized instructions to the last instruction in a bundle. (Alternatively, you could build a DAG, make sure that you don’t create cycles, and then produce a topological sort, but this was not done out of compile-time concerns.)
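
To make the alternative in parentheses concrete, here is a minimal sketch in plain C++. It is a toy model and uses no LLVM APIs: ToyInst, scheduleWithBundle and the index-based operand lists are names made up for illustration, and this is not what the SLPVectorizer does today. It fuses all bundle members into a single node, builds the def-to-use graph between nodes, and runs Kahn's algorithm; if not every node can be scheduled, fusing the bundle would have created a cycle.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Toy model, not LLVM: instruction I lists the indices of the instructions
// whose results it uses.
struct ToyInst {
  std::vector<int> Operands;
};

// Hypothetical helper: fuses all instructions in Bundle into one node and
// tries to topologically sort the result. Returns false if fusing the bundle
// creates a cycle; otherwise fills Order with node ids in a legal order.
bool scheduleWithBundle(const std::vector<ToyInst> &Insts,
                        const std::set<int> &Bundle,
                        std::vector<int> &Order) {
  const int N = static_cast<int>(Insts.size());

  // Map every instruction to a node; all bundle members share one node.
  std::vector<int> Node(N);
  int BundleNode = -1, NumNodes = 0;
  for (int I = 0; I < N; ++I) {
    if (Bundle.count(I)) {
      if (BundleNode < 0)
        BundleNode = NumNodes++;
      Node[I] = BundleNode;
    } else {
      Node[I] = NumNodes++;
    }
  }

  // Collect def->use edges between nodes, dropping edges inside the bundle.
  std::set<std::pair<int, int>> Edges;
  for (int I = 0; I < N; ++I)
    for (int Op : Insts[I].Operands) {
      assert(Op >= 0 && Op < N && "operand index out of range");
      if (Node[Op] != Node[I])
        Edges.insert({Node[Op], Node[I]});
    }

  std::vector<std::vector<int>> Succs(NumNodes);
  std::vector<int> InDegree(NumNodes, 0);
  for (const auto &E : Edges) {
    Succs[E.first].push_back(E.second);
    ++InDegree[E.second];
  }

  // Kahn's algorithm: repeatedly emit nodes whose operands are all emitted.
  std::queue<int> Ready;
  for (int V = 0; V < NumNodes; ++V)
    if (InDegree[V] == 0)
      Ready.push(V);

  Order.clear();
  while (!Ready.empty()) {
    int V = Ready.front();
    Ready.pop();
    Order.push_back(V);
    for (int S : Succs[V])
      if (--InDegree[S] == 0)
        Ready.push(S);
  }

  // Nodes left unscheduled are exactly the ones that sit on a cycle.
  return static_cast<int>(Order.size()) == NumNodes;
}
```

In terms of the example below: if the four fadds form the bundle and %foo uses %8 and also feeds %12, then the fused node both feeds %foo and depends on it, so the function returns false. If %foo only uses %8, a legal order exists.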


If I understand your patch correctly, you are disabling the above-mentioned check if the vectorizer starts at an insertelement instruction? What about other users? You still need to detect that you can schedule them correctly.


define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
top:
  %2 = extractelement <4 x float> %0, i32 0
  %3 = extractelement <4 x float> %1, i32 0
  %4 = fadd float %2, %3
  %5 = insertelement <4 x float> undef, float %4, i32 0
  %6 = extractelement <4 x float> %0, i32 1
  %7 = extractelement <4 x float> %1, i32 1
  %8 = fadd float %6, %7
 
  ; Suppose %foo is some operation that uses %8. It potentially feeds %12, but
  ; even if it does not, all of its users now need to be moved below %16, and
  ; we need to check all of their users recursively …
  %foo = <operation using %8>

  %9 = insertelement <4 x float> %5, float %8, i32 1
  %10 = extractelement <4 x float> %0, i32 2
  %11 = extractelement <4 x float> %1, i32 2
  %12 = fadd float %10, %11
  %13 = insertelement <4 x float> %9, float %12, i32 2
  %14 = extractelement <4 x float> %0, i32 3
  %15 = extractelement <4 x float> %1, i32 3
  %16 = fadd float %14, %15
  %17 = insertelement <4 x float> %13, float %16, i32 3
  ret <4 x float> %17
}

For your case of insertelements that start a vector tree, you could get away with keeping a set of the “insertelement” instructions from which the tryToVectorizeList call below started.

if (InsertElementInst *IE = dyn_cast<InsertElementInst>(it)) {
      SmallVector<Value *, 8> Ops;
      if (!findBuildVector(IE, Ops))
        continue;
      // Add the insertelements to InsertVectorRoot. You would need to make sure
      // that all ‘other’ uses of those insertelements are below the last insert.
      if (tryToVectorizeList(Ops, R))

Instead of checking “buildsVector”, you could check this set.

      if (RdxOps && RdxOps->count(UI))
         continue;
 
+      // This user is part of building a vector
+      if (buildsVector) // use something like: if (InsertVectorRoot.count(UI)) instead.
+        continue;
+

And this set would also contain the instructions that need to be moved.
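
As a rough illustration of the condition mentioned in the comment above (all ‘other’ uses of the insertelements must be below the last insert), here is a small sketch in plain C++. It is a toy model, not LLVM code: PositionedInst and otherUsesAreBelowLastInsert are hypothetical names, an instruction is identified by its position in the basic block, and InsertVectorRoot is modelled as a set of such positions.

```cpp
#include <set>
#include <vector>

// Toy model, not LLVM: an instruction's index is its position in the block,
// and Users lists the positions of the instructions that use its result.
struct PositionedInst {
  std::vector<int> Users;
};

// Hypothetical check: every user of a root insertelement that is not itself
// one of the root insertelements must come after the last root insert.
bool otherUsesAreBelowLastInsert(const std::vector<PositionedInst> &Block,
                                 const std::set<int> &InsertVectorRoot) {
  if (InsertVectorRoot.empty())
    return true;

  // std::set is ordered, so the largest position is the last insert.
  const int LastInsert = *InsertVectorRoot.rbegin();

  for (int Insert : InsertVectorRoot)
    for (int User : Block[Insert].Users)
      if (!InsertVectorRoot.count(User) && User < LastInsert)
        return false; // An outside user sits above the last insert.
  return true;
}
```

For the function above, the root set would hold the positions of %5, %9, %13 and %17. An extra user of %9 placed before %17 would make the check fail, while a user placed after %17 would pass.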

Alternatively, we could teach the SLP vectorizer how to ‘vectorize’ insertelements and start the vectorization tree with the insertelements instead of their operands. Then it would naturally work (because in-tree users are considered safe).

Best,
Arnold

On Mar 17, 2014, at 2:38 PM, Robison, Arch <arch.robison at intel.com> wrote:

> define <4 x float> @julia_foo111(<4 x float>, <4 x float>) {
> top:
>   %2 = extractelement <4 x float> %0, i32 0
>   %3 = extractelement <4 x float> %1, i32 0
>   %4 = fadd float %2, %3
>   %5 = insertelement <4 x float> undef, float %4, i32 0
>   %6 = extractelement <4 x float> %0, i32 1
>   %7 = extractelement <4 x float> %1, i32 1
>   %8 = fadd float %6, %7
>   %9 = insertelement <4 x float> %5, float %8, i32 1
>   %10 = extractelement <4 x float> %0, i32 2
>   %11 = extractelement <4 x float> %1, i32 2
>   %12 = fadd float %10, %11
>   %13 = insertelement <4 x float> %9, float %12, i32 2
>   %14 = extractelement <4 x float> %0, i32 3
>   %15 = extractelement <4 x float> %1, i32 3
>   %16 = fadd float %14, %15
>   %17 = insertelement <4 x float> %13, float %16, i32 3
>   ret <4 x float> %17
> }




