[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

Fri Mar 13 10:13:38 PDT 2015

On Thu, Mar 12, 2015 at 2:34 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> It is not clear to me at all that preventing the merging is the right
> solution. There are a large number of analysis, including alias analysis,
> and optimizations that use GetUnderlyingObject, and related routines to
> search back through GEPs. They only do this up to some small finite depth
> (six, IIRC). So reducing the GEP depth is likely the right solution for
> InstCombine (which has the job of canonicalizing the IR).
>
> We should, however, pull these apart somewhere, and probably in some way
> that is address-mode aware. I'd recommend trying to split non-free (via the
> addressing-mode) loop-invariant parts of GEPs out of loops in CodeGenPrep.
>

Thanks, Hal.  I'll have a look at CGP to see how this might be done.  It's
a little more complicated than just pulling the GEP apart, there needs to
be a loop-invariant-aware reassociation to undo the damage done by the
initial merge.

Mark

>
>  -Hal
>
> ----- Original Message -----
> > From: "Mark Heffernan" <meheff at google.com>
> > To: "Francois Pichet" <pichet2000 at gmail.com>
> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Developers Mailing List" <
> llvmdev at cs.uiuc.edu>
> > Sent: Thursday, March 12, 2015 4:19:56 PM
> > Subject: Re: [LLVMdev] Question about shouldMergeGEPs in
> InstructionCombining
> >
> >
> >
> >
> > On Thu, Mar 12, 2015 at 2:03 PM, Francois Pichet <
> > pichet2000 at gmail.com > wrote:
> >
> >
> >
> >
> > I think it would make sense for (1) and (2). I am not sure if (3) is
> > feasible in instcombine . (I am not too familiar with LoopInfo)
> >
> >
> > LoopInfo should be available for at least one of the instcombine
> > invocations. In the -O2 pipeline it looks like instcombine is called
> > 7 times and it should have loopinfo for the third invocation (called
> > immediate after some loop passes). Here's output from
> > --debug-pass=Structure:
> >
> >
> >
> > Pass Arguments: -tti -no- aa -tbaa -scoped-noalias
> > -assumption-cache-tracker -targetlibinfo -basicaa -verify
> > -simplifycfg -domtree -sroa -early-cse -lower-expect
> > Target Transform Information
> > No Alias Analysis (always returns 'may' alias)
> > Type-Based Alias Analysis
> > Scoped NoAlias Alias Analysis
> > Assumption Cache Tracker
> > Target Library Information
> > Basic Alias Analysis (stateless AA impl)
> > FunctionPass Manager
> > Module Verifier
> > Simplify the CFG
> > Dominator Tree Construction
> > SROA
> > Early CSE
> > Lower 'expect' Intrinsics
> > Pass Arguments: -targetlibinfo -tti -no-aa -tbaa -scoped-noalias
> > -assumption-cache-tracker -basicaa -verify-di -ipsccp -globalopt
> > -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh
> > -inline-cost -i
> > nline -functionattrs -sroa -domtree -early-cse -lazy-value-info
> > -jump-threadi ng -correlated-propagation -simplifycfg -domtree
> > -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops
> > -loop-simplify -lc
> > ssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution
> > -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion
> > -loop-unroll -memdep -mldst-motion -domtree -memdep -gvn -memdep
> > -memcpyopt -sccp -dom
> > tree -bdce -instcombine -lazy-value-info -jump-threading
> > -correlated-propagation -domtree -memdep -dse -loops -loop-simplify
> > -lcssa -licm -adce -simplifycfg -domtree -instcombine -barrier
> > -domtree -loops -loop-sim
> > plify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution
> > -loop-accesses -loop-vectorize -instcombine -scalar-evolution
> > -slp-vectorizer -simplifycfg -domtree -instcombine -loops
> > -loop-simplify -lcssa -s
> > calar-evolution -loop-unroll -alignment-from-assumptions
> > -strip-dead-prototypes -globaldce -constmerge -verify -verify-di
> > Target Library Information
> > Tar get Transform Information
> > No Alias Analysis (always returns 'may' alias)
> > Type-Based Alias Analysis
> > Scoped NoAlias Alias Analysis
> > Assumption Cache Tracker
> > Basic Alias Analysis (stateless AA impl)
> > ModulePass Manager
> > Debug Info Verifier
> > Interprocedural Sparse Conditional Constant Propagation
> > Global Variable Optimizer
> > Dead Argument Elimination
> > FunctionPass Manager
> > Dominator Tree Construction
> > Combine redundant instructions
> > Simplify the CFG
> > CallGraph Construction
> > Call Graph SCC Pass Manager
> > Remove unused exception handling info
> > Inline Cost Analysis
> > Function Integration/Inlining
> > Deduce function attributes
> > FunctionPass Manager
> > SROA
> > Dominator Tree Construction
> > Early CSE
> > Lazy Value Information Analysis
> > Jump Threading
> > Value Propagation
> > Simplify the CFG
> > Dominator Tree Construction
> > Combine redundant instructions
> > Tail Call Elimination
> > Simplify the CFG
> > Reassociate expressions
> > Dominator Tree Construction
> > Natural Loop Information
> > Canonicalize natural loops
> > Loop-Closed SSA Form Pass
> > Loop Pass Manager
> > Rotate Loops
> > Loop Invariant Code Motion
> > Unswitch loops
> >
> > Combine redundant instructions
> > Scalar Evolution Analysis
> > Canonicalize natural loops
> > Loop-Closed SSA Form Pass
> > Loop Pass Manager
> > Induction Variable Simplification
> > Recognize loop idioms
> > Delete dead loops
> > Unroll loops
> > Memory Dependence Analysis
> > MergedLoadStoreMotion
> > Dominator Tree Construction
> > Memory Dependence Analysis
> > Global Value Numbering
> > Memory Dependence Analysis
> > MemCpy Optimization
> > Sparse Conditional Constant Propagation
> > Dominator Tree Construction
> > Bit-Tracking Dead Code Elimination
> > Combine redundant instructions
> > Lazy Value Information Analysis
> > Jump Threading
> > Value Propagation
> > Dominator Tree Construction
> > Memory Dependence Analysis
> > Dead Store Elimination
> > Natural Loop Information
> > Canonicalize natural loops
> > Loop-Closed SSA Form Pass
> > Loop Pass Manager
> > Loop Invariant Code Motion
> > Aggressive Dead Code Elimination
> > Simplify the CFG
> > Dominator Tree Construction
> > Combine redundant instructions
> > A No-Op Barrier Pass
> > FunctionPass Manager
> > Dominator Tree Construction
> > Natural Loop Information
> > Canonicalize natural loops
> > Loop-Closed SSA Form Pass
> > Loop Pass Manager
> > Rotate Loops
> > Branch Probability Analysis
> > Block Frequency Analysis
> > Scalar Evolution Analysis
> > Loop Access Analysis
> > Loop Vectorization
> > Combine redundant instructions
> > Scalar Evolution Analysis
> > SLP Vectorizer
> > Simplify the CFG
> > Dominator Tree Construction
> > Combine redundant instructions
> > Natural Loop Information
> > Canonicalize natural loops
> > Loop-Closed SSA Form Pass
> > Scalar Evolution Analysis
> > Loop Pass Manager
> > Unroll loops
> > Alignment from assumptions
> >
> > Strip Unused Function Prototypes
> > Dead Global Elimination
> > Merge Duplicate Global Constants
> > FunctionPass Manager
> > Module Verifier
> > Debug Info Verifier Bitcode Writer
> >
> >
> >
> >
> >
> >
> > Mark
> >
> >
> >
> >
> >
> >
> >
> > For the Octasic's Opus platform, I modified shouldMergeGEPs in our
> > fork to:
> >
> >
> >
> > if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
> > !Src.hasOneUse())
> > return false;
> >
> >
> > return Src.hasAllConstantIndices(); // was return false;
> >
> >
> >
> > Following that change, I noticed some performance gain for a few
> > specific tests and no regression at all in our (admittedly limited)
> > benchmarks suite.
> >
> >
> > Regards,
> > Francois Pichet, Octasic.
> >
> >
> >
> >
> > On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan < meheff at google.com >
> > wrote:
> >
> >
> >
> > Coincidentally, I just ran into this same issue on some of our
> > benchmarks for the NVPTX backend. You have something like this
> > before instcombine:
> >
> >
> > %tmp = getelementptr inbounds i32, i32* %input, i64 %offset
> >
> > loop:
> >
> > %loop_variant = ...
> > %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant
> >
> >
> > Which gets transformed to:
> >
> >
> >
> > loop:
> >
> > %loop_variant = ...
> > %sum = add nsw i64 %loop_variant, %offset
> >
> > %ptr = getelementptr inbounds i32, i32* %input, i64 %sum
> >
> >
> >
> > The merge essentially reassociates the loop-variant term
> > (%loop_variant) and loop-invariant terms (%input and %offset) in
> > such a way that LICM can't remove it.
> >
> >
> >
> >
> > One idea is to only perform this style of gep merge if at least one
> > of the following conditions is true:
> > (1) both index terms in the GEP are constant. In this case no new add
> > instruction is created, instead the constants are folded.
> > (2) the GEPs are in the same BB.
> > (3) LoopInfo is available, and we know we're not creating a new
> > instruction in a (deeper) loop.
> >
> >
> >
> > What do you think?
> >
> >
> > Mark
> >
> >
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150313/5cbc95f8/attachment.html>