[PATCH] D29449: [SLP] Generalization of vectorization of CmpInst operands, NFC.
Michael Kuperstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 6 14:49:54 PST 2017
mkuper added a comment.
Sorry, I'm failing to communicate the example I have in mind.
Here it is, concretely:
declare void @bar(i1)
define void @foo(i32* %A, i32 %k, i32 %n) {
  %idx0 = getelementptr inbounds i32, i32* %A, i64 0
  %idx4 = getelementptr inbounds i32, i32* %A, i64 4
  %load0 = load i32, i32* %idx0, align 8
  %load4 = load i32, i32* %idx4, align 8
  %mul0 = mul i32 %load0, %k
  %mul4 = mul i32 %load4, %k
  %res = add i32 %mul0, %mul4
  %cmp = icmp eq i32 %res, %n
  call void @bar(i1 %cmp)
  ret void
}
With the current code, we get:
$ bin/opt -slp-vectorizer < ~/llvm/temp/cmpslp.ll -S -o - -slp-threshold=-10
declare void @bar(i1)
define void @foo(i32* %A, i32 %k, i32 %n) {
  %idx0 = getelementptr inbounds i32, i32* %A, i64 0
  %idx4 = getelementptr inbounds i32, i32* %A, i64 4
  %load0 = load i32, i32* %idx0, align 8
  %load4 = load i32, i32* %idx4, align 8
  %1 = insertelement <2 x i32> undef, i32 %k, i32 0
  %2 = insertelement <2 x i32> %1, i32 %k, i32 1
  %3 = insertelement <2 x i32> undef, i32 %load0, i32 0
  %4 = insertelement <2 x i32> %3, i32 %load4, i32 1
  %5 = mul <2 x i32> %2, %4
  %6 = extractelement <2 x i32> %5, i32 0
  %7 = extractelement <2 x i32> %5, i32 1
  %res = add i32 %6, %7
  %cmp = icmp eq i32 %res, %n
  call void @bar(i1 %cmp)
  ret void
}
The new code will not be able to vectorize this.
I agree with you that (a) what we do now is generally pretty bad, and (b) we handle this case more or less by accident.
But this patch is not NFC, and it has the potential to regress cases like this one.
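For anyone who wants to convince themselves that the two listings above compute the same thing, here is a rough Python model of both. This is only a sketch, not LLVM code: the function names `foo_scalar` and `foo_vectorized` are made up for illustration, `%A` is modeled as a plain list, and i32 values are treated as bit patterns that wrap modulo 2**32.

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32


def foo_scalar(a, k, n):
    """Model of the original @foo: two scalar muls, an add, an icmp eq."""
    mul0 = (a[0] * k) & MASK32
    mul4 = (a[4] * k) & MASK32
    res = (mul0 + mul4) & MASK32
    return res == (n & MASK32)


def foo_vectorized(a, k, n):
    """Model of the SLP output: one <2 x i32> mul, then two extracts."""
    splat_k = [k, k]        # %1, %2: %k inserted into both lanes
    loads = [a[0], a[4]]    # %3, %4: %load0 and %load4 gathered into a vector
    prod = [(x * y) & MASK32 for x, y in zip(splat_k, loads)]  # %5
    res = (prod[0] + prod[1]) & MASK32  # %6, %7 extracted, then the scalar add
    return res == (n & MASK32)
```

The model says nothing about cost. It only shows that the two muls feeding the add are what gets packed into a single vector mul, which is the part the comparison's operand chain currently reaches.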
https://reviews.llvm.org/D29449