[PATCH] D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.

Thu Jul 27 00:49:09 PDT 2017

lsaba added inline comments.

================
Comment at: test/CodeGen/X86/lea-opt-csebb.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+
----------------
jbhateja wrote:
> lsaba wrote:
> > RKSimon wrote:
> > > jbhateja wrote:
> > > > lsaba wrote:
> > > > > Can you please add a test case that covers scale > 1 cases?
> > > > 
> > > > This commit, as you can see, has two parts:
> > > > 1/ pattern matching based on addressing mode (which is currently limited).
> > > > 2/ factoring of LEAs, which is generic.
> > > > 
> > > > Checking in incremental changes should be fine, I guess.
> > > > 
> > > > The generic pattern will need to be brought out of addressing-mode-based selection, as I described in the following thread:
> > > > https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500
> > > > 
> > > > Please comment in the thread.
> > > > 
> > > Can you please commit this test file to trunk with the current codegen and update the patch to show the diff?
> > I am not sure I understand what you mean by "Generic pattern will need to be brought out of addressing mode". As far as I understand, for the following C code:
> > 
> > int foo(int a, int b) {
> >   int x = a + 2*b + 4; 
> >   int y = a + 4*b + 4; 
> >   int c = x*y ;
> >   return c; 
> > }
> > 
> > the currently generated IR is:
> > define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {
> > entry:
> >   %mul = shl i32 %b, 1
> >   %add = add i32 %a, 4
> >   %add1 = add i32 %add, %mul
> >   %mul2 = shl i32 %b, 2
> >   %add4 = add i32 %add, %mul2
> >   %mul5 = mul nsw i32 %add1, %add4
> >   ret i32 %mul5
> > }
> > 
> > 
> > the currently generated asm: 
> > 
> > 	leal	4(%rdi,%rsi,2), %ecx
> > 	leal	4(%rdi,%rsi,4), %eax
> > 	imull	%ecx, %eax
> > 	retq
> > 
> > This will be refactored by the optimization in the current commit (not a future commit) to:
> > 	leal	4(%rdi,%rsi,2), %ecx
> > 	leal	(%rcx,%rsi,2), %eax
> > 	imull	%ecx, %eax
> > 	retq
> > 
> > 
> > Please correct me if I'm wrong.
> > 
> > 
> Hi Lama,
> 
> By generic pattern handling I meant LEA folding into complex LEAs, which is currently restrictive.
> 
> Consider the following case:
> 
> %struct.SA = type { i32 , i32 , i32 , i32 , i32};
> 
> define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
>  entry:
>    %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
>    %0 = load i32, i32* %h0, align 8
>    %h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
>    %h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
>    %1 = load i32, i32* %h4, align 8
>    %add = add i32 %0, 1
>    %add4 = add i32 %add, %1
>    %add5 = add i32 %add4, %1
>    store i32 %add5, i32* %h3, align 4
>    %add10 = add i32 %add5, %1
>    %add29 = add i32 %add10, %1
>    store i32 %add29, i32* %h4, align 8
>    ret void
> }
> 
> ASM :
> 
> foo:                                    # @foo
>  .cfi_startproc
> # BB#0:                                 # %entry
>  movl (%rdi), %eax
>  movl 16(%rdi), %ecx
>  leal (%rax,%rcx,2), %edx
>  leal 1(%rax,%rcx,2), %eax
>  movl %eax, 12(%rdi)
>  leal 1(%rdx,%rcx,2), %eax
>  movl %eax, 16(%rdi)
>  
> 
> It could be further optimized to the following:
> 
>  movl (%rdi), %eax
>  movl 16(%rdi), %ecx
>  leal 1(%rax,%rcx,2), %edx
>  movl %edx, 12(%rdi)
>  leal (%rdx,%rcx,2), %eax
>  movl %eax, 16(%rdi)
>  
> Folding is currently done as part of the addressing mode matcher. I feel that efficient
> folding can only be done as a separate MI pass, which is what I explained in the proposal (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115182.html).
> 
> Thanks for your example. I will add it to the test cases; it demonstrates the generality of factorization.
> 
> 
Hi,

Thanks, I understand the need for more generic pattern matching, and I agree.
This is unrelated to my comment, which refers solely to the Factorize LEA optimization. That optimization needs more testing, for example covering different Scale values (like the example I provided) and factorizing LEAs across basic blocks.
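
To make the scale case concrete, here is a rough C sketch of the arithmetic behind the example I provided. It only models the two LEA sequences on 32-bit values and checks that they agree; it does not exercise the patch or llc. The names lea32, foo_unfactored, foo_factored and check_factoring are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models `leal disp(base,index,scale), dst` on 32-bit values:
   dst = base + index*scale + disp, with unsigned wraparound.
   (Hypothetical helper, not part of LLVM.) */
static uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale,
                      uint32_t disp) {
    return base + index * scale + disp;
}

/* Two independent LEAs, as in the currently generated asm. */
uint32_t foo_unfactored(uint32_t a, uint32_t b) {
    uint32_t x = lea32(a, b, 2, 4); /* leal 4(%rdi,%rsi,2), %ecx */
    uint32_t y = lea32(a, b, 4, 4); /* leal 4(%rdi,%rsi,4), %eax */
    return x * y;                   /* imull %ecx, %eax */
}

/* Second LEA expressed in terms of the first: the base, index and
   displacement are shared, so only the scale difference (4 - 2) is left. */
uint32_t foo_factored(uint32_t a, uint32_t b) {
    uint32_t x = lea32(a, b, 2, 4); /* leal 4(%rdi,%rsi,2), %ecx */
    uint32_t y = lea32(x, b, 2, 0); /* leal (%rcx,%rsi,2), %eax */
    return x * y;                   /* imull %ecx, %eax */
}

/* Compares both forms over a few inputs, including values that wrap.
   Returns the number of (a, b) pairs checked. */
int check_factoring(void) {
    const uint32_t samples[] = {0u, 1u, 2u, 7u,
                                0x7fffffffu, 0x80000000u, 0xffffffffu};
    const int n = (int)(sizeof samples / sizeof samples[0]);
    int checked = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            assert(foo_unfactored(samples[i], samples[j]) ==
                   foo_factored(samples[i], samples[j]));
            ++checked;
        }
    }
    return checked;
}
```

Calling check_factoring() from a small main() is enough to run it. A test with other scale pairs (for example 2 and 8, or 1 and 4) would follow the same shape, which is the kind of coverage I am asking for in lea-opt-csebb.ll.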

https://reviews.llvm.org/D35014