[PATCH] [SLPVectorizer] Reorder operands of shufflevector if it can result in a vectorized code.

Fri Jan 9 00:15:37 PST 2015

Hi Michael,
Thanks for the inputs. Please find my comments inline.
Thanks
Karthik Bhat


REPOSITORY
  rL LLVM

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:442-444
@@ -441,2 +441,5 @@
 
+  void reorderAltShuffleOperands(ArrayRef<Value *> VL,
+                                 SmallVectorImpl<Value *> &Left,
+                                 SmallVectorImpl<Value *> &Right);
   /// \brief Perform LICM and CSE on the newly generated gather sequences.
----------------
mzolotukhin wrote:
> I don't think `reorderInputsAccordingToOpcode` currently handles it. I.e., it can accidentally handle it in some cases, but it doesn't always do so. For example, the following code doesn't get vectorized:
> 
> ```
> define void @foo() #0 {
>   %1 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 0), align 4
>   %2 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 0), align 4
>   %3 = add nsw i32 %1, %2
>   store i32 %3, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 0), align 4
>   %4 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 1), align 4
>   %5 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 1), align 4  
> 
>   ; Please note that %4 and %5 are swapped in the following line:
>   %6 = add nsw i32 %5, %4
> 
>   store i32 %6, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 1), align 4
>   %7 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 2), align 4
>   %8 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 2), align 4
>   %9 = add nsw i32 %7, %8
>   store i32 %9, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 2), align 4
>   %10 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 3), align 4
>   %11 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 3), align 4
>   %12 = add nsw i32 %10, %11
>   store i32 %12, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 3), align 4
>   ret void
> }
> ```
> It might make sense to handle such cases explicitly, like you do for altShuffles.
Hi Michael,
Thanks for the inputs. I believe the above code doesn't get vectorized because, on a 64-bit machine, the GVN pass combines two 32-bit loads into a single 64-bit load, and as a result the pattern match in SLP fails. You can refer to D6654 for the same. If we run the GVN pass after SLPVectorizer, the above code gets vectorized.
The SLP vectorizer not being able to vectorize widened loads is a separate issue, and I plan to work on it shortly.

I will also try to find an example where we need to handle this case in reorderInputsAccordingToOpcode.

Actually, I tried adding this pattern matching at the end of reorderInputsAccordingToOpcode, but it results in a few regressions in "operandorder.ll" by reordering "good" source order. I'm trying to debug this separately as well.
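
To make the pattern under discussion concrete, here is a toy, standalone sketch of per-lane reordering for commutative operations. It is not the SLPVectorizer code: `Operand`, `isConsecutive`, `reorderCommutativeLanes` and `allConsecutive` are hypothetical names introduced only for illustration, and real operands would be `llvm::Value *` loads rather than (array, index) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a load operand: which array it reads and at what index.
// This is NOT an llvm::Value; it only models "load from Base[Index]".
struct Operand {
  std::string Base;
  int Index;
};

// True if B reads the element directly after A in the same array.
static bool isConsecutive(const Operand &A, const Operand &B) {
  return A.Base == B.Base && B.Index == A.Index + 1;
}

// Left[I] and Right[I] are the two operands of a commutative op in lane I.
// A lane is swapped only when it is not already consecutive with the
// previous lane AND swapping makes both columns consecutive.
static void reorderCommutativeLanes(std::vector<Operand> &Left,
                                    std::vector<Operand> &Right) {
  assert(Left.size() == Right.size() && "expected one operand pair per lane");
  for (std::size_t I = 1; I < Left.size(); ++I) {
    bool AlreadyGood = isConsecutive(Left[I - 1], Left[I]) &&
                       isConsecutive(Right[I - 1], Right[I]);
    bool SwapHelps = isConsecutive(Left[I - 1], Right[I]) &&
                     isConsecutive(Right[I - 1], Left[I]);
    if (!AlreadyGood && SwapHelps)
      std::swap(Left[I], Right[I]);
  }
}

// True if every operand in the column follows its predecessor in memory.
static bool allConsecutive(const std::vector<Operand> &Ops) {
  for (std::size_t I = 1; I < Ops.size(); ++I)
    if (!isConsecutive(Ops[I - 1], Ops[I]))
      return false;
  return true;
}
```

Feeding it the operand order from the quoted example (lane 1 has `b[1]` on the left and `a[1]` on the right) swaps only that lane, so the left column reads `a[0..3]` and the right column reads `b[0..3]`. The `AlreadyGood` guard is there to leave a "good" source order untouched, which is the kind of regression seen in "operandorder.ll"; whether a check of this shape is sufficient in the real pass is an open question.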

http://reviews.llvm.org/D6677
