[PATCH] D35014: [X86] PR32755: Improvement in CodeGen instruction selection for LEAs.
Jatin Bhateja via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 26 08:33:41 PDT 2017
jbhateja added inline comments.
================
Comment at: test/CodeGen/X86/lea-opt-csebb.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+
----------------
lsaba wrote:
> RKSimon wrote:
> > jbhateja wrote:
> > > lsaba wrote:
> > > > Can you please add a test case that covers the scale > 1 cases?
> > >
> > > If you look at this commit, it has two parts:
> > > 1/ pattern matching based on the addressing mode (which is currently limited).
> > > 2/ factoring of LEAs, which is generic.
> > >
> > > Checking in incremental changes should be fine, I guess.
> > >
> > > The generic pattern will need to be brought out of the addressing-mode-based selection, as I described in the following link:
> > > https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500
> > >
> > > Please comment in the thread.
> > >
> > Can you please commit this test file to trunk with the current codegen and update the patch to show the diff?
> I am not sure I understand what you mean by "the generic pattern will need to be brought out of the addressing mode". As far as I understand, for the following C code:
>
> int foo(int a, int b) {
>   int x = a + 2*b + 4;
>   int y = a + 4*b + 4;
>   int c = x*y;
>   return c;
> }
>
> the currently generated IR:
> define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {
> entry:
>   %mul = shl i32 %b, 1
>   %add = add i32 %a, 4
>   %add1 = add i32 %add, %mul
>   %mul2 = shl i32 %b, 2
>   %add4 = add i32 %add, %mul2
>   %mul5 = mul nsw i32 %add1, %add4
>   ret i32 %mul5
> }
>
>
> the currently generated asm:
>
>   leal 4(%rdi,%rsi,2), %ecx
>   leal 4(%rdi,%rsi,4), %eax
>   imull %ecx, %eax
>   retq
>
> this will be refactored by the optimization in this commit (not a future commit) to:
>   leal 4(%rdi,%rsi,2), %ecx
>   leal (%rcx,%rsi,2), %eax
>   imull %ecx, %eax
>   retq
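>
> (To see the refactoring: y = a + 4*b + 4 = (a + 2*b + 4) + 2*b = x + 2*b, so the second LEA just adds 2*b to the first LEA's result, already held in %ecx.)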
>
>
> Please correct me if I'm wrong.
>
>
Hi Lama,
By generic pattern handling I meant the folding of LEAs into complex LEAs, which is currently restrictive. (An LEA computes base + index*scale + disp; by a "complex" LEA I mean one whose base or index is itself the result of another LEA.)
Consider the following case:
%struct.SA = type { i32, i32, i32, i32, i32 }

define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
entry:
  %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
  %0 = load i32, i32* %h0, align 8
  %h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
  %h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
  %1 = load i32, i32* %h4, align 8
  %add = add i32 %0, 1
  %add4 = add i32 %add, %1
  %add5 = add i32 %add4, %1
  store i32 %add5, i32* %h3, align 4
  %add10 = add i32 %add5, %1
  %add29 = add i32 %add10, %1
  store i32 %add29, i32* %h4, align 8
  ret void
}
ASM:

foo:                          # @foo
  .cfi_startproc
# BB#0:                       # %entry
  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal (%rax,%rcx,2), %edx
  leal 1(%rax,%rcx,2), %eax
  movl %eax, 12(%rdi)
  leal 1(%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)
It could be further optimized to the following:
  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal 1(%rax,%rcx,2), %edx
  movl %edx, 12(%rdi)
  leal (%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)
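To see the equivalence: with t being the value loaded from offset 16 (%1 in the IR), the first store writes add5 = %0 + 1 + 2*t and the second writes add29 = add5 + 2*t. In the optimized sequence %edx already holds add5, so the second LEA only needs to add 2*t to it, eliminating one LEA compared to the original sequence.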
Folding is currently done as part of the addressing-mode matcher; I feel that efficient folding can only be done as a separate MI pass, which is what I explained in the proposal (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115182.html).
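For illustration, here is a minimal sketch of the shape such a standalone MI pass could take. This is not the actual patch; X86FactorLEAs and tryToFactor are hypothetical names, and the rewriting logic itself is elided:

// Sketch only: skeleton of a separate MI pass for LEA factoring.
// X86FactorLEAs and tryToFactor are hypothetical names.
#include "X86.h"
#include "X86InstrInfo.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

namespace {
struct X86FactorLEAs : public MachineFunctionPass {
  static char ID;
  X86FactorLEAs() : MachineFunctionPass(ID) {}

  StringRef getPassName() const override { return "X86 LEA Factoring"; }

  // True for the LEA opcodes the pass would consider.
  static bool isLEA(const MachineInstr &MI) {
    unsigned Opc = MI.getOpcode();
    return Opc == X86::LEA16r || Opc == X86::LEA32r ||
           Opc == X86::LEA64r || Opc == X86::LEA64_32r;
  }

  // Hypothetical helper: find an earlier LEA computing a subexpression
  // of MI's address and rewrite MI in terms of its result. The actual
  // dominance and profitability checks are elided in this sketch.
  bool tryToFactor(MachineInstr &MI) { (void)MI; return false; }

  bool runOnMachineFunction(MachineFunction &MF) override {
    bool Changed = false;
    for (MachineBasicBlock &MBB : MF)
      for (MachineInstr &MI : MBB)
        if (isLEA(MI))
          Changed |= tryToFactor(MI);
    return Changed;
  }
};
} // end anonymous namespace

char X86FactorLEAs::ID = 0;

The in-tree X86OptimizeLEAs pass (lib/Target/X86/X86OptimizeLEAs.cpp) already follows this structure, so the factoring could either extend it or live in a sibling pass.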
Thanks for your example; I will add it to the test cases, as it demonstrates the generality of the factorization.
https://reviews.llvm.org/D35014