[PATCH] [SLPVectorizer] Reorder operands of shufflevector if it can result in a vectorized code.
Michael Zolotukhin
mzolotukhin at apple.com
Fri Jan 9 09:51:53 PST 2015
Hi Karthik,
While widening of loads is a real problem, the example I wrote in my previous comment doesn't need any GVN invocation at all - you can run the SLP vectorizer (with `-basicaa`) on it and see that SLP fails to vectorize it. To make it even clearer, we can use `double` instead of `i32`:
1. **Vectorized**:
Original code:
double a[1000], b[1000], c[1000];
void foo()
{
  c[0] = a[0] + b[0];
  c[1] = a[1] + b[1];
}
IR:
define void @foo() #0 {
  %1 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 0), align 4
  %2 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 0), align 4
  %3 = fadd double %1, %2
  store double %3, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 0), align 4
  %4 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 1), align 4
  %5 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 1), align 4
  %6 = fadd double %4, %5
  store double %6, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 1), align 4
  ret void
}
IR after SLP:
define void @foo() #0 {
  %1 = load <2 x double>* bitcast ([1000 x double]* @a to <2 x double>*), align 16, !tbaa !2
  %2 = load <2 x double>* bitcast ([1000 x double]* @b to <2 x double>*), align 16, !tbaa !2
  %3 = fadd <2 x double> %1, %2
  store <2 x double> %3, <2 x double>* bitcast ([1000 x double]* @c to <2 x double>*), align 16, !tbaa !2
  ret void
}
2. **Not vectorized**:
Original code:
double a[1000], b[1000], c[1000];
void foo()
{
  c[0] = a[0] + b[0];
  c[1] = b[1] + a[1]; // a[1] and b[1] are swapped
}
IR:
define void @foo() #0 {
  %1 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 0), align 4
  %2 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 0), align 4
  %3 = fadd double %1, %2
  store double %3, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 0), align 4
  %4 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 1), align 4
  %5 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 1), align 4
  %6 = fadd double %5, %4 ; %4 and %5 are swapped
  store double %6, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 1), align 4
  ret void
}
IR after SLP:
define void @foo() #0 {
  %1 = load double* getelementptr inbounds ([1000 x double]* @a, i64 0, i64 0), align 16, !tbaa !2
  %2 = load double* getelementptr inbounds ([1000 x double]* @b, i64 0, i64 0), align 16, !tbaa !2
  %3 = load double* getelementptr inbounds ([1000 x double]* @b, i64 0, i64 1), align 8, !tbaa !2
  %4 = load double* getelementptr inbounds ([1000 x double]* @a, i64 0, i64 1), align 8, !tbaa !2
  %5 = insertelement <2 x double> undef, double %1, i32 0
  %6 = insertelement <2 x double> %5, double %3, i32 1
  %7 = insertelement <2 x double> undef, double %2, i32 0
  %8 = insertelement <2 x double> %7, double %4, i32 1
  %9 = fadd <2 x double> %6, %8
  store <2 x double> %9, <2 x double>* bitcast ([1000 x double]* @c to <2 x double>*), align 16, !tbaa !2
  ret void
}
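So instead of the wide loads we get four scalar loads plus an insertelement chain. Since fadd is commutative, swapping its operands in the second lane doesn't change the result, so if SLP recognized the commuted operands and reordered them, case 2 could in principle produce the same code as case 1 - something like:
define void @foo() #0 {
  %1 = load <2 x double>* bitcast ([1000 x double]* @a to <2 x double>*), align 16, !tbaa !2
  %2 = load <2 x double>* bitcast ([1000 x double]* @b to <2 x double>*), align 16, !tbaa !2
  %3 = fadd <2 x double> %1, %2
  store <2 x double> %3, <2 x double>* bitcast ([1000 x double]* @c to <2 x double>*), align 16, !tbaa !2
  ret void
}
(This is just the output from case 1 repeated for comparison, not actual compiler output for case 2.)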
REPOSITORY
rL LLVM
http://reviews.llvm.org/D6677