[LLVMdev] Efficient Pattern matching in Instruction Combine
suyog sarda
sardask01 at gmail.com
Fri Aug 8 13:34:09 PDT 2014
Hi Duncan, David, Sean.
Thanks for your reply.
> It'd be interesting if you could find a design that also treated these
> the same:
>
> (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C
> (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
> (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C
>
> I.e., `^` is also associative.
Agree with Duncan on including associative operation too.
> Can we handle this by just having a canonical ordering? Or is that too
difficult to maintain through various instcombines?
Yes, its the easiest way to do that. If i am not wrong, what Sean is
suggesting is that if we get
something like
(B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
and we have written pass for pattern
(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
we could just swap 'B' and 'A' before matching the pattern i.e. we do
canonical re-ordering
(Sean, Correct me if i am wrong on my understanding of what you meant by
canonical ordering).
I have seen in the "instcombine" module that many times std::swap was used
for the same.
But it becomes too much repetitive and somewhat irritating to keep doing
this for every pattern.
Shouldn't we do this in more generic way where we do not have to worry
about swapping operands?
(How to do it - i don't know for now. I looked at the matchers
implementation for modification/improvement,
but didn't find any clue on it).
> It seems to me that we could rejigger Reassociate to reduce the number of
possibilities that InstCombine could expect.
>
Agree with David that infact it should be reassociate pass which should
handle this.
But following is something interesting and rather worrying things i have
found :
(I have omitted unimportant code and highlighted important one in following
example)
*e.x. *
((~A & B) | A) --> (A | B) ; Code is implemented for this already
*C code :*
suyog at suyog-Inspiron-N5010:~$ cat 1.c
#include<stdio.h>
int cal(int a, int b) {
*return ((~a & b) | a);*
}
int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}
LLVM IR at O0 :
suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O0 -emit-llvm 1.c
; Function Attrs: nounwind
*define i32 @cal(i32 %a, i32 %b) #0 { %1 = xor i32 %a, -1 %2 = and i32
%1, %b %3 = or i32 %2, %a ret i32 %3*
}
; Function Attrs: nounwind
define i32 @main() #0 {
..
..
%4 = call i32 @cal(i32 %2, i32 %3)
ret i32 %4
}
*LLVM IR at instcombine :*
suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -instcombine 1.ll
*; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 { %1 = or
i32 %a, %b ret i32 %1}*
..
..
..
which is OK as per expected transform.
*LLVM IR at reassociate and instcombine :*suyog at suyog-Inspiron-N5010:~$
Open/rbuild/bin/opt -S -reassociate -instcombine 2.ll
*; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 { %1 = xor
i32 %a, -1 %2 = and i32 %b, %1 %3 = or i32 %2, %a ret i32 %3}*
..
..
Reassociate converted (~a&b) to (b&~a) and instruction combine failed to
optimize.
As evident, sometimes, reassociate pass actually harms the optimization
opportunity !!
Some more interesting findings on how can this affect the final machine
code:
*Test case :*
1.c :
#include<stdio.h>
int cal(int a, int b) {
*return ((~a & b) | a);*
}
int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}
*X86 .s file with clang at O2 for above program :*
suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c
main: # @main
# BB#0:
subl $28, %esp
leal 20(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.L.str, (%esp)
calll __isoc99_scanf
movl 20(%esp), %eax
* orl 24(%esp), %eax*
addl $28, %esp
retl
As seen, optimization happened at IR level itself reflected in .s file.
*GCC output for the same:*
suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c
main:
.LFB23:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
leal 28(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call __isoc99_scanf
movl 24(%esp), %eax
* orl 28(%esp), %eax*
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
GCC also did the optimization.
Now we just *slightly flip* the test case :
*1.c Test case:*#include<stdio.h>
int cal(int a, int b) {
*return ((b & ~a) | a);*
}
int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}
*X86 .s file with clang at O2 for above program :*suyog at suyog-Inspiron-N5010:~$
Open/rbuild/bin/clang -S -O2 1.c
main: # @main
# BB#0:
subl $28, %esp
leal 20(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.L.str, (%esp)
calll __isoc99_scanf
movl 24(%esp), %ecx
movl %ecx, %eax
*notl %eax andl 20(%esp), %eax orl %ecx, %eax*
addl $28, %esp
retl
We lost the Optimization opportunity here !!
*GCC output for the same:*
suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c
main:
.LFB23:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
leal 28(%esp), %eax
movl %eax, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call __isoc99_scanf
movl 24(%esp), %eax
* orl 28(%esp), %eax*
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
GCC still optimized it !!
Clearly evident that llvm is losing because of naive approach of pattern
matching.
How can we improve this so that LLVM also recognizes
commutative/associative
pattern efficiently with single 'matchers' call? I am not having any idea
so far.
I will think more on it. Any help would be really helpful.
Any suggestions/comments/rectification are most awaited :)
--
With regards,
Suyog Sarda
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140809/418a1055/attachment.html>
More information about the llvm-dev
mailing list