[LLVMdev] Efficient Pattern matching in Instruction Combine

Fri Aug 8 13:34:09 PDT 2014

Hi Duncan, David, Sean.

Thanks for your reply.

> It'd be interesting if you could find a design that also treated these
> the same:
>
>   (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C
>   (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
>   (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C
>
> I.e., `^` is also associative.

Agree with Duncan on including associative operation too.

> Can we handle this by just having a canonical ordering? Or is that too
difficult to maintain through various instcombines?

Yes, its the easiest way to do that. If i am not wrong, what Sean is
suggesting is that if we get

something like

(B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C

and we have written pass for pattern

(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C

we could just swap 'B' and 'A' before matching the pattern i.e. we do
canonical re-ordering
(Sean, Correct me if i am wrong on my understanding of what you meant by
canonical ordering).

I have seen in the "instcombine" module that many times std::swap was used
for the same.

But it becomes too much repetitive and somewhat irritating to keep doing
this for every pattern.
Shouldn't we do this in more generic way where we do not have to worry
about swapping operands?
(How to do it - i don't know for now. I looked at the matchers
implementation for modification/improvement,
but didn't find any clue on it).

> It seems to me that we could rejigger Reassociate to reduce the number of
possibilities that InstCombine could expect.

>
Agree with David that infact it should be reassociate pass which should
handle this.

But following is something interesting and rather worrying things i have
found :
(I have omitted unimportant code and highlighted important one in following
example)
*e.x. *

((~A & B) | A)  -->  (A | B) ; Code is implemented for this already

*C code :*

suyog at suyog-Inspiron-N5010:~$ cat 1.c
#include<stdio.h>
int cal(int a, int b) {
*return ((~a & b) | a);*
}

int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}

LLVM IR at O0 :

suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O0 -emit-llvm 1.c

; Function Attrs: nounwind

*define i32 @cal(i32 %a, i32 %b) #0 {  %1 = xor i32 %a, -1  %2 = and i32
%1, %b  %3 = or i32 %2, %a  ret i32 %3*
}

; Function Attrs: nounwind
define i32 @main() #0 {
  ..
  ..
  %4 = call i32 @cal(i32 %2, i32 %3)
  ret i32 %4
}

*LLVM IR at instcombine :*

suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -instcombine 1.ll

*; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 {  %1 = or
i32 %a, %b  ret i32 %1}*
..
..
..

which is OK as per expected transform.

*LLVM IR at reassociate and instcombine :*suyog at suyog-Inspiron-N5010:~$
Open/rbuild/bin/opt -S -reassociate -instcombine 2.ll

*; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 {  %1 = xor
i32 %a, -1  %2 = and i32 %b, %1  %3 = or i32 %2, %a  ret i32 %3}*

..
..

Reassociate converted (~a&b) to (b&~a) and instruction combine failed to
optimize.

As evident, sometimes, reassociate pass actually harms the optimization
opportunity !!

Some more interesting findings on how can this affect the final machine
code:

*Test case :*
1.c :

#include<stdio.h>
int cal(int a, int b) {
*return ((~a & b) | a);*
}

int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}

*X86 .s file with clang at O2 for above program :*

suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c

main:                                   # @main
# BB#0:
    subl    $28, %esp
    leal    20(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.L.str, (%esp)
    calll    __isoc99_scanf
    movl    20(%esp), %eax
*    orl    24(%esp), %eax*
    addl    $28, %esp
    retl

As seen, optimization happened at IR level itself reflected in .s file.

*GCC output for the same:*

suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c

main:
.LFB23:
    .cfi_startproc
    pushl    %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    leal    28(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    __isoc99_scanf
    movl    24(%esp), %eax
*    orl    28(%esp), %eax*
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc

GCC also did the optimization.

Now we just *slightly flip* the test case :

*1.c Test case:*#include<stdio.h>
int cal(int a, int b) {
*return ((b & ~a) | a);*
}

int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}

*X86 .s file with clang at O2 for above program :*suyog at suyog-Inspiron-N5010:~$
Open/rbuild/bin/clang -S -O2 1.c

main:                                   # @main
# BB#0:
    subl    $28, %esp
    leal    20(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.L.str, (%esp)
    calll    __isoc99_scanf
    movl    24(%esp), %ecx
    movl    %ecx, %eax

*notl    %eax    andl    20(%esp), %eax    orl    %ecx, %eax*
    addl    $28, %esp
    retl

We lost the Optimization opportunity here !!

*GCC output for the same:*

suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c

main:
.LFB23:
    .cfi_startproc
    pushl    %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    leal    28(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    __isoc99_scanf
    movl    24(%esp), %eax
*    orl    28(%esp), %eax*
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc

GCC still optimized it !!

Clearly evident that llvm is losing because of naive approach of pattern
matching.

How can we improve this so that LLVM also recognizes
commutative/associative
pattern  efficiently with single 'matchers' call? I am not having any idea
so far.
I will think more on it. Any help would be really helpful.

Any suggestions/comments/rectification are most awaited :)

-- 
With regards,
Suyog Sarda
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140809/418a1055/attachment.html>