[PATCH] D35014: [X86] PR32755: Improvement in CodeGen instruction selection for LEAs.
Jatin Bhateja via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 26 08:33:41 PDT 2017
jbhateja added inline comments.
================
Comment at: test/CodeGen/X86/lea-opt-csebb.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+
----------------
lsaba wrote:
> RKSimon wrote:
> > jbhateja wrote:
> > > lsaba wrote:
> > > > Can you please add a test case that covers the scale > 1 cases?
> > >
> > > If you look at this commit, it has two parts:
> > > 1/ pattern matching based on the addressing mode (which is currently limited).
> > > 2/ factoring of LEAs, which is generic.
> > >
> > > Checking in incremental changes should be fine, I guess.
> > >
> > > The generic pattern will need to be brought out of the addressing-mode-based selection, as I described in the following link:
> > > https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500
> > >
> > > Please comment in the thread.
> > >
> > Can you please commit this test file to trunk with the current codegen and update the patch to show the diff?
> I am not sure I understand what you mean by "the generic pattern will need to be brought out of the addressing mode". As far as I understand, for the following C code:
>
> int foo(int a, int b) {
>   int x = a + 2*b + 4;
>   int y = a + 4*b + 4;
>   int c = x*y;
>   return c;
> }
>
> the currently generated IR:
> define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {
> entry:
>   %mul = shl i32 %b, 1
>   %add = add i32 %a, 4
>   %add1 = add i32 %add, %mul
>   %mul2 = shl i32 %b, 2
>   %add4 = add i32 %add, %mul2
>   %mul5 = mul nsw i32 %add1, %add4
>   ret i32 %mul5
> }
>
>
> the currently generated asm:
>
>   leal 4(%rdi,%rsi,2), %ecx
>   leal 4(%rdi,%rsi,4), %eax
>   imull %ecx, %eax
>   retq
>
> this will be refactored by the optimization in this commit (not a future commit) to:
>   leal 4(%rdi,%rsi,2), %ecx
>   leal (%rcx,%rsi,2), %eax
>   imull %ecx, %eax
>   retq
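>
> (To see the refactoring: y = a + 4*b + 4 = (a + 2*b + 4) + 2*b = x + 2*b, so the second LEA just adds 2*b to the first LEA's result, already held in %ecx.)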
>
>
> Please correct me if I'm wrong.
>
>
Hi Lama,
By generic pattern handling I meant the folding of LEAs into complex LEAs, which is currently restrictive. (An LEA computes base + index*scale + disp; by a "complex" LEA I mean one whose base or index is itself the result of another LEA.)
Consider the following case:
%struct.SA = type { i32, i32, i32, i32, i32 }

define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
entry:
  %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
  %0 = load i32, i32* %h0, align 8
  %h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
  %h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
  %1 = load i32, i32* %h4, align 8
  %add = add i32 %0, 1
  %add4 = add i32 %add, %1
  %add5 = add i32 %add4, %1
  store i32 %add5, i32* %h3, align 4
  %add10 = add i32 %add5, %1
  %add29 = add i32 %add10, %1
  store i32 %add29, i32* %h4, align 8
  ret void
}
ASM:

foo:                          # @foo
  .cfi_startproc
# BB#0:                       # %entry
  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal (%rax,%rcx,2), %edx
  leal 1(%rax,%rcx,2), %eax
  movl %eax, 12(%rdi)
  leal 1(%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)
It could be further optimized to the following:
  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal 1(%rax,%rcx,2), %edx
  movl %edx, 12(%rdi)
  leal (%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)
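To see the equivalence: with t being the value loaded from offset 16 (%1 in the IR), the first store writes add5 = %0 + 1 + 2*t and the second writes add29 = add5 + 2*t. In the optimized sequence %edx already holds add5, so the second LEA only needs to add 2*t to it, eliminating one LEA compared to the original sequence.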
Folding is currently done as part of the addressing-mode matcher; I feel that efficient folding can only be done as a separate MI pass, which is what I explained in the proposal (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115182.html).
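For illustration, here is a minimal sketch of the shape such a standalone MI pass could take. This is not the actual patch; X86FactorLEAs and tryToFactor are hypothetical names, and the rewriting logic itself is elided:

// Sketch only: skeleton of a separate MI pass for LEA factoring.
// X86FactorLEAs and tryToFactor are hypothetical names.
#include "X86.h"
#include "X86InstrInfo.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

namespace {
struct X86FactorLEAs : public MachineFunctionPass {
  static char ID;
  X86FactorLEAs() : MachineFunctionPass(ID) {}

  StringRef getPassName() const override { return "X86 LEA Factoring"; }

  // True for the LEA opcodes the pass would consider.
  static bool isLEA(const MachineInstr &MI) {
    unsigned Opc = MI.getOpcode();
    return Opc == X86::LEA16r || Opc == X86::LEA32r ||
           Opc == X86::LEA64r || Opc == X86::LEA64_32r;
  }

  // Hypothetical helper: find an earlier LEA computing a subexpression
  // of MI's address and rewrite MI in terms of its result. The actual
  // dominance and profitability checks are elided in this sketch.
  bool tryToFactor(MachineInstr &MI) { (void)MI; return false; }

  bool runOnMachineFunction(MachineFunction &MF) override {
    bool Changed = false;
    for (MachineBasicBlock &MBB : MF)
      for (MachineInstr &MI : MBB)
        if (isLEA(MI))
          Changed |= tryToFactor(MI);
    return Changed;
  }
};
} // end anonymous namespace

char X86FactorLEAs::ID = 0;

The in-tree X86OptimizeLEAs pass (lib/Target/X86/X86OptimizeLEAs.cpp) already follows this structure, so the factoring could either extend it or live in a sibling pass.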
Thanks for your example; I will add it to the test cases, as it demonstrates the generality of the factorization.
https://reviews.llvm.org/D35014