[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
Mark Heffernan
meheff at google.com
Thu Mar 12 14:19:56 PDT 2015
On Thu, Mar 12, 2015 at 2:03 PM, Francois Pichet <pichet2000 at gmail.com>
wrote:
> I think it would make sense for (1) and (2). I am not sure if (3) is
> feasible in instcombine. (I am not too familiar with LoopInfo)
>
LoopInfo should be available for at least one of the instcombine
invocations. In the -O2 pipeline it looks like instcombine is called 7
times and it should have loopinfo for the third invocation (called
immediate after some loop passes). Here's output from
--debug-pass=Structure:
Pass Arguments: -tti -no-aa -tbaa -scoped-noalias
-assumption-cache-tracker -targetlibinfo -basicaa -verify -simplifycfg
-domtree -sroa -early-cse -lower-expect
Target Transform Information
No Alias Analysis (always returns 'may' alias)
Type-Based Alias Analysis
Scoped NoAlias Alias Analysis
Assumption Cache Tracker
Target Library Information
Basic Alias Analysis (stateless AA impl)
FunctionPass Manager
Module Verifier
Simplify the CFG
Dominator Tree Construction
SROA
Early CSE
Lower 'expect' Intrinsics
Pass Arguments: -targetlibinfo -tti -no-aa -tbaa -scoped-noalias
-assumption-cache-tracker -basicaa -verify-di -ipsccp -globalopt
-deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh
-inline-cost -i
nline -functionattrs -sroa -domtree -early-cse -lazy-value-info
-jump-threading -correlated-propagation -simplifycfg -domtree -instcombine
-tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lc
ssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution
-loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll
-memdep -mldst-motion -domtree -memdep -gvn -memdep -memcpyopt -sccp -dom
tree -bdce -instcombine -lazy-value-info -jump-threading
-correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa
-licm -adce -simplifycfg -domtree -instcombine -barrier -domtree -loops
-loop-sim
plify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution
-loop-accesses -loop-vectorize -instcombine -scalar-evolution
-slp-vectorizer -simplifycfg -domtree -instcombine -loops -loop-simplify
-lcssa -s
calar-evolution -loop-unroll -alignment-from-assumptions
-strip-dead-prototypes -globaldce -constmerge -verify -verify-di
Target Library Information
Target Transform Information
No Alias Analysis (always returns 'may' alias)
Type-Based Alias Analysis
Scoped NoAlias Alias Analysis
Assumption Cache Tracker
Basic Alias Analysis (stateless AA impl)
ModulePass Manager
Debug Info Verifier
Interprocedural Sparse Conditional Constant Propagation
Global Variable Optimizer
Dead Argument Elimination
FunctionPass Manager
Dominator Tree Construction
*Combine redundant instructions*
Simplify the CFG
CallGraph Construction
Call Graph SCC Pass Manager
Remove unused exception handling info
Inline Cost Analysis
Function Integration/Inlining
Deduce function attributes
FunctionPass Manager
SROA
Dominator Tree Construction
Early CSE
Lazy Value Information Analysis
Jump Threading
Value Propagation
Simplify the CFG
Dominator Tree Construction
*Combine redundant instructions*
Tail Call Elimination
Simplify the CFG
Reassociate expressions
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Loop Pass Manager
Rotate Loops
Loop Invariant Code Motion
Unswitch loops
*Combine redundant instructions*
Scalar Evolution Analysis
Canonicalize natural loops
Loop-Closed SSA Form Pass
Loop Pass Manager
Induction Variable Simplification
Recognize loop idioms
Delete dead loops
Unroll loops
Memory Dependence Analysis
MergedLoadStoreMotion
Dominator Tree Construction
Memory Dependence Analysis
Global Value Numbering
Memory Dependence Analysis
MemCpy Optimization
Sparse Conditional Constant Propagation
Dominator Tree Construction
Bit-Tracking Dead Code Elimination
*Combine redundant instructions*
Lazy Value Information Analysis
Jump Threading
Value Propagation
Dominator Tree Construction
Memory Dependence Analysis
Dead Store Elimination
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Loop Pass Manager
Loop Invariant Code Motion
Aggressive Dead Code Elimination
Simplify the CFG
Dominator Tree Construction
*Combine redundant instructions*
A No-Op Barrier Pass
FunctionPass Manager
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Loop Pass Manager
Rotate Loops
Branch Probability Analysis
Block Frequency Analysis
Scalar Evolution Analysis
Loop Access Analysis
Loop Vectorization
*Combine redundant instructions*
Scalar Evolution Analysis
SLP Vectorizer
Simplify the CFG
Dominator Tree Construction
*Combine redundant instructions*
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Scalar Evolution Analysis
Loop Pass Manager
Unroll loops
Alignment from assumptions
Strip Unused Function Prototypes
Dead Global Elimination
Merge Duplicate Global Constants
FunctionPass Manager
Module Verifier
Debug Info Verifier
Bitcode Writer
Mark
For the Octasic's Opus platform, I modified shouldMergeGEPs in our fork to:
>
> if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
> !Src.hasOneUse())
> return false;
>
> return Src.hasAllConstantIndices(); // was return false;
>
> Following that change, I noticed some performance gain for a few specific
> tests and no regression at all in our (admittedly limited) benchmarks suite.
>
> Regards,
> Francois Pichet, Octasic.
>
> On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan <meheff at google.com> wrote:
>
>> Coincidentally, I just ran into this same issue on some of our benchmarks
>> for the NVPTX backend. You have something like this before instcombine:
>>
>> %tmp = getelementptr inbounds i32, i32* %input, i64 %offset
>> loop:
>> %loop_variant = ...
>> %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant
>>
>> Which gets transformed to:
>>
>> loop:
>> %loop_variant = ...
>> %sum = add nsw i64 %loop_variant, %offset
>> %ptr = getelementptr inbounds i32, i32* %input, i64 %sum
>>
>> The merge essentially reassociates the loop-variant term (%loop_variant)
>> and loop-invariant terms (%input and %offset) in such a way that LICM can't
>> remove it.
>>
>> One idea is to only perform this style of gep merge if at least one of
>> the following conditions is true:
>> (1) both index terms in the GEP are constant. In this case no new add
>> instruction is created, instead the constants are folded.
>> (2) the GEPs are in the same BB.
>> (3) LoopInfo is available, and we know we're not creating a new
>> instruction in a (deeper) loop.
>>
>> What do you think?
>>
>> Mark
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/72a0fbf4/attachment.html>
More information about the llvm-dev
mailing list