[PATCH] D29449: [SLP] Generalization of vectorization of CmpInst operands, NFC.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 7 01:00:24 PST 2017


ABataev added a comment.

In https://reviews.llvm.org/D29449#668395, @mkuper wrote:

> Sorry, I'm failing to communicate the example I have in mind.
>  Here it is, concretely:
>
>   declare void @bar(i1)
>  
>   define void @foo(i32* %A, i32 %k, i32 %n) {
>     %idx0 = getelementptr inbounds i32, i32* %A, i64 0
>     %idx4 = getelementptr inbounds i32, i32* %A, i64 4
>     %load0 = load i32, i32* %idx0, align 8
>     %load4 = load i32, i32* %idx4, align 8
>     %mul0 = mul i32 %load0, %k
>     %mul4 = mul i32 %load4, %k
>     %res = add i32 %mul0, %mul4
>     %cmp = icmp eq i32 %res, %n
>     call void @bar(i1 %cmp)
>     ret void
>   }
>  
>
>
> With the current code, we get:
>
>   $ bin/opt -slp-vectorizer < ~/llvm/temp/cmpslp.ll -S -o - -slp-threshold=-10
>
>   declare void @bar(i1)
>  
>   define void @foo(i32* %A, i32 %k, i32 %n) {
>     %idx0 = getelementptr inbounds i32, i32* %A, i64 0
>     %idx4 = getelementptr inbounds i32, i32* %A, i64 4
>     %load0 = load i32, i32* %idx0, align 8
>     %load4 = load i32, i32* %idx4, align 8
>     %1 = insertelement <2 x i32> undef, i32 %k, i32 0
>     %2 = insertelement <2 x i32> %1, i32 %k, i32 1
>     %3 = insertelement <2 x i32> undef, i32 %load0, i32 0
>     %4 = insertelement <2 x i32> %3, i32 %load4, i32 1
>     %5 = mul <2 x i32> %2, %4
>     %6 = extractelement <2 x i32> %5, i32 0
>     %7 = extractelement <2 x i32> %5, i32 1
>     %res = add i32 %6, %7
>     %cmp = icmp eq i32 %res, %n
>     call void @bar(i1 %cmp)
>     ret void
>   }
>
>
> The new code will not be able to vectorize this.
>
> I agree with you that (a) what we do now is generally pretty bad, and (b) we handle this case more or less by accident.
>  But this patch is not NFC, and has the potential to regress cases like this one.


Michael, I understand this.
What should I do then? Should I prepare a patch that vectorizes CallInst args first, and then prepare an NFC patch for CmpInst? Or do you have something different in mind?


https://reviews.llvm.org/D29449




