[PATCH] D35014: [X86] Improvement in CodeGen instruction selection for LEAs.

Sun Oct 29 09:55:39 PDT 2017

jbhateja added inline comments.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:1178
+  if ((!EnableAggressiveFolding && Depth > 5) ||
+        (EnableAggressiveFolding && Depth > 8) )
     return matchAddressBase(N, AM);
----------------
RKSimon wrote:
> RKSimon wrote:
> > Two hard coded depths like this is weird - better to have a getMaxOperandFoldingDepth() helper?
> My comment still stands - try to avoid hard coded values embedded in the source - add a getMaxOperandFoldingDepth() helper.
Helper added.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:105

+    bool isLegalScale() {
+      return (Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8);
----------------
RKSimon wrote:
> ```
> bool isLegalScale() const {
> ```
Fixed.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:199

+    void setAggressiveOperandFolding(bool val = false) {
+      OptForAggressingFolding = val;
----------------
RKSimon wrote:
> Is a default argument for a setter a good idea? Especially one that is the inverse of what the setter says it is.
Fixed. We do not need a default argument here, both the calls to this routines is passing an explicit argument.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:1537
+      } else if (AM.IndexReg == N) {
+        AM.Scale++;
+        return false;
----------------
RKSimon wrote:
> These AM.Scale increments are scary - better to set it with AM.Scale = 2?
Increments are triggered only in aggressive folding mode and can fold upto 8 operands (which is a max legal scale). This was intentionally done, initial change was only working for AM.scale = 2 and was very restrictive. Aggressive operand folding is done only for LEAs currently and is enabled instantiating and RAII object of X86AggressiveOperandFolding class.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:839
+               MachineInstr *NewMI = 
+                  BuildMI(*(const_cast<MachineBasicBlock *>(&MBB)),
+                     &LI1,DL,TII->get(LI1.getOpcode()))
----------------
RKSimon wrote:
> lsaba wrote:
> > This could end up in an assertion failure if LI1 is at the beginning of the BB, need to handle it separately, for example in this reproducer :
> > 
> >  ; ModuleID = 'bugpoint-reduced-simplified.bc'
> >  source_filename = "bugpoint-output-2ef2e5d.bc"
> >  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> >  target triple = "x86_64-unknown-linux-gnu"
> >  
> >  ; Function Attrs: norecurse nounwind readnone uwtable
> >  define i32 @foo(i32 %a, i32 %b, i32 %d, i32 %y, i32 %x) local_unnamed_addr #0 {
> >  entry:
> >    %mul1 = shl i32 %b, 1
> >    %add2 = add i32 %a, 4
> >    %add3 = add i32 %add2, %mul1
> >    %mul4 = shl i32 %b, 2
> >    %add6 = add i32 %add2, %mul4
> >    br label %for.body
> >  
> >  for.cond.cleanup:                                 ; preds = %for.body
> >    ret i32 %add
> >  
> >  for.body:                                         ; preds = %for.body, %entry
> >    %x.addr.015 = phi i32 [ %x, %entry ], [ %add3, %for.body ]
> >    %y.addr.014 = phi i32 [ %y, %entry ], [ %add6, %for.body ]
> >    %mul = mul nsw i32 %x.addr.015, %y.addr.014
> >    %add = add nsw i32 0, %mul
> >    %exitcond = icmp eq i32 undef, %d
> >    br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
> >  }
> >  
> >  attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-       math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+   mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
> >  
> >  !llvm.ident = !{!0}
> >  
> >  !0 = !{!"clang version 6.0.0 (cfe/trunk 309511)"}
> >  !1 = distinct !{!1, !2}
> >  !2 = !{!"llvm.loop.unroll.disable"}
> > 
> Has @lsaba test been added to the patch? I couldn't see it.
We have a similare test case for loop lea-opt-cse2.ll. We are not doing any factorization inside loops, only simplifyLEA can kick in.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:839
+               MachineInstr *NewMI = 
+                  BuildMI(*(const_cast<MachineBasicBlock *>(&MBB)),
+                     &LI1,DL,TII->get(LI1.getOpcode()))
----------------
jbhateja wrote:
> RKSimon wrote:
> > lsaba wrote:
> > > This could end up in an assertion failure if LI1 is at the beginning of the BB, need to handle it separately, for example in this reproducer :
> > > 
> > >  ; ModuleID = 'bugpoint-reduced-simplified.bc'
> > >  source_filename = "bugpoint-output-2ef2e5d.bc"
> > >  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > >  target triple = "x86_64-unknown-linux-gnu"
> > >  
> > >  ; Function Attrs: norecurse nounwind readnone uwtable
> > >  define i32 @foo(i32 %a, i32 %b, i32 %d, i32 %y, i32 %x) local_unnamed_addr #0 {
> > >  entry:
> > >    %mul1 = shl i32 %b, 1
> > >    %add2 = add i32 %a, 4
> > >    %add3 = add i32 %add2, %mul1
> > >    %mul4 = shl i32 %b, 2
> > >    %add6 = add i32 %add2, %mul4
> > >    br label %for.body
> > >  
> > >  for.cond.cleanup:                                 ; preds = %for.body
> > >    ret i32 %add
> > >  
> > >  for.body:                                         ; preds = %for.body, %entry
> > >    %x.addr.015 = phi i32 [ %x, %entry ], [ %add3, %for.body ]
> > >    %y.addr.014 = phi i32 [ %y, %entry ], [ %add6, %for.body ]
> > >    %mul = mul nsw i32 %x.addr.015, %y.addr.014
> > >    %add = add nsw i32 0, %mul
> > >    %exitcond = icmp eq i32 undef, %d
> > >    br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
> > >  }
> > >  
> > >  attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-       math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+   mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
> > >  
> > >  !llvm.ident = !{!0}
> > >  
> > >  !0 = !{!"clang version 6.0.0 (cfe/trunk 309511)"}
> > >  !1 = distinct !{!1, !2}
> > >  !2 = !{!"llvm.loop.unroll.disable"}
> > > 
> > Has @lsaba test been added to the patch? I couldn't see it.
> We have a similare test case for loop lea-opt-cse2.ll. We are not doing any factorization inside loops, only simplifyLEA can kick in.
We have a test case for loops lea-opt-cse2.ll, so not added this.
We are not doing any factorization inside loops, only simplifyLEA can kick in.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:327
+static bool isDefCopyLike(MachineRegisterInfo *MRI, const MachineOperand &Opr) {
+  bool isInstrErased = Opr.isReg() && Opr.getParent()->getParent() ? 0 : 1;
+  if (!Opr.isReg() || isInstrErased ||
----------------
RKSimon wrote:
> Do you mean:
> ```
> bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent()); 
> ```
fixed.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:1013
+  // Legal scale value (1,2,4 & 8) vector.
+  int LegalScale[9] = {0, 1, 1, 0, 1, 0, 0, 0, 1};
+
----------------
RKSimon wrote:
> Really don't like this - write a helper instead like you did in X86ISelDAGToDAG.cpp
> ```
> auto IsLegalScale = [](int S) {
>   return S == 1 || S == 2 || S == 4 || S == 8;
> };
> ```
Fixed

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:1019
+      return false;
+    return true;
+  };
----------------
RKSimon wrote:
> ```
> return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
> ```
Fixed

================
Comment at: test/CodeGen/X86/lea-opt-cse4.ll:3
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X64
+; RUN: llc < %s -mtriple=i686-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X86

----------------
RKSimon wrote:
> Why have you changed these tests?
FixupLEAPass down the pipeline transforms some complex  LEA ptterns to simple with add. Optimization, with changes in the patch we will have following

 leal 1(%rax,%rcx,4), %eax 

which after FixupLEAPass will get converted to 

 leal (%rax,%rcx,4), %eax
 addl $1, %eax

https://reviews.llvm.org/D35014