[PATCH] D29449: [SLP] Generalization of vectorization of CmpInst operands, NFC.

Mon Feb 6 14:49:54 PST 2017

mkuper added a comment.

Sorry, I'm failing to communicate the example I have in mind.
Here it is, concretely:

  declare void @bar(i1)

  define void @foo(i32* %A, i32 %k, i32 %n) {
    %idx0 = getelementptr inbounds i32, i32* %A, i64 0
    %idx4 = getelementptr inbounds i32, i32* %A, i64 4
    %load0 = load i32, i32* %idx0, align 8
    %load4 = load i32, i32* %idx4, align 8
    %mul0 = mul i32 %load0, %k
    %mul4 = mul i32 %load4, %k
    %res = add i32 %mul0, %mul4
    %cmp = icmp eq i32 %res, %n
    call void @bar(i1 %cmp)
    ret void
  }

With the current code, we get:
$ bin/opt -slp-vectorizer < ~/llvm/temp/cmpslp.ll -S -o - -slp-threshold=-10

  declare void @bar(i1)

  define void @foo(i32* %A, i32 %k, i32 %n) {
    %idx0 = getelementptr inbounds i32, i32* %A, i64 0
    %idx4 = getelementptr inbounds i32, i32* %A, i64 4
    %load0 = load i32, i32* %idx0, align 8
    %load4 = load i32, i32* %idx4, align 8
    %1 = insertelement <2 x i32> undef, i32 %k, i32 0
    %2 = insertelement <2 x i32> %1, i32 %k, i32 1
    %3 = insertelement <2 x i32> undef, i32 %load0, i32 0
    %4 = insertelement <2 x i32> %3, i32 %load4, i32 1
    %5 = mul <2 x i32> %2, %4
    %6 = extractelement <2 x i32> %5, i32 0
    %7 = extractelement <2 x i32> %5, i32 1
    %res = add i32 %6, %7
    %cmp = icmp eq i32 %res, %n
    call void @bar(i1 %cmp)
    ret void
  }
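To spell out what the two listings compute, here is a rough C rendering of both shapes. This is only a sketch: the function names foo_scalar and foo_vectorized_shape are made up for illustration, the call to @bar is replaced by returning the comparison result, and unsigned 32-bit arithmetic stands in for the wrapping i32 mul/add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar form, mirroring the input IR: A[0]*k + A[4]*k == n. */
static bool foo_scalar(const uint32_t *A, uint32_t k, uint32_t n) {
    uint32_t mul0 = A[0] * k;
    uint32_t mul4 = A[4] * k;
    return (mul0 + mul4) == n;
}

/* Shape of the SLP output: build two 2-lane "vectors" from scalars,
   do one lane-wise multiply, then extract both lanes for the scalar add. */
static bool foo_vectorized_shape(const uint32_t *A, uint32_t k, uint32_t n) {
    uint32_t splat_k[2] = { k, k };        /* %1, %2: k in both lanes */
    uint32_t loads[2]   = { A[0], A[4] };  /* %3, %4: the two loads   */
    uint32_t prod[2];
    for (int lane = 0; lane < 2; ++lane)   /* %5: <2 x i32> mul       */
        prod[lane] = splat_k[lane] * loads[lane];
    return (prod[0] + prod[1]) == n;       /* %6, %7, %res, %cmp      */
}
```

Both functions return the same result for any input; only the pair of multiplies is combined, while the add and the compare stay scalar.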

The new code will not be able to vectorize this.

I agree with you that (a) what we do now is generally pretty bad, and (b) we handle this case more or less by accident.
But this patch is not NFC, and it has the potential to regress cases like this one.

https://reviews.llvm.org/D29449