[llvm] [FunctionAttrs] deduce attr `cold` on functions if all CG paths call a `cold` function (PR #101298)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 31 09:05:21 PDT 2024
goldsteinn wrote:
> > Is this potentially acceptable?
>
> Generally yes, second-order regressions are fine. Of course, it depends on the details. How does the presence of cold attributes affect things -- is this "just" because of substantially different inlining decisions, or something else?
Looking at the diff in stats from before/after (below), there are small changes across the board, but no number really jumps out. Inlining does have a notable(ish) change, with fewer inlines after the change (106114 -> 105963 functions inlined). That said, I'm not sure how to read the numbers to work out what's leading to the compile-time regression.
```diff
@@ -1,6 +1,6 @@
- 1289338 aa - Number of MayAlias results
- 174178 aa - Number of MustAlias results
- 5817260 aa - Number of NoAlias results
+ 1289144 aa - Number of MayAlias results
+ 174197 aa - Number of MustAlias results
+ 5816357 aa - Number of NoAlias results
31029 abstract-call-sites - Number of direct abstract call sites created
96 abstract-call-sites - Number of invalid abstract call sites created (no callback)
392 abstract-call-sites - Number of invalid abstract call sites created (unknown use)
@@ -11,9 +11,9 @@
220 argpromotion - Number of dead pointer args eliminated
1275 argpromotion - Number of pointer arguments promoted
33178 assume-queries - Number of Queries into an assume assume bundles
- 7841528 basicaa - Number of times a GEP is decomposed
+ 7843168 basicaa - Number of times a GEP is decomposed
13619 basicaa - Number of times the limit to decompose GEPs is reached
- 3895 bdce - Number of instructions removed (unused)
+ 3899 bdce - Number of instructions removed (unused)
22 bdce - Number of instructions trivialized (dead bits)
194 bdce - Number of sign extension instructions converted to zero extension
2091 build-libcalls - Number of arguments inferred as nocapture
@@ -31,64 +31,64 @@
614 build-libcalls - Number of functions inferred as willreturn
130 build-libcalls - Number of functions inferred as writeonly
44 callsite-splitting - Number of call-site split
- 59752 capture-tracking - Number of pointers maybe captured
- 20192 capture-tracking - Number of pointers maybe captured before
- 12565 capture-tracking - Number of pointers not captured
- 8428 capture-tracking - Number of pointers not captured before
- 861 constraint-elimination - Number of instructions removed
+ 59767 capture-tracking - Number of pointers maybe captured
+ 20202 capture-tracking - Number of pointers maybe captured before
+ 12570 capture-tracking - Number of pointers not captured
+ 8431 capture-tracking - Number of pointers not captured before
+ 860 constraint-elimination - Number of instructions removed
69 correlated-value-propagation - Number of ands removed
- 49 correlated-value-propagation - Number of ashr converted to lshr
+ 46 correlated-value-propagation - Number of ashr converted to lshr
6 correlated-value-propagation - Number of bound udiv's/urem's expanded
- 420 correlated-value-propagation - Number of comparisons propagated
- 8595 correlated-value-propagation - Number of function pointer arguments marked non-null
+ 424 correlated-value-propagation - Number of comparisons propagated
+ 8591 correlated-value-propagation - Number of function pointer arguments marked non-null
61 correlated-value-propagation - Number of llvm.[us]{min,max} intrinsics removed
6 correlated-value-propagation - Number of llvm.abs intrinsics removed
73 correlated-value-propagation - Number of llvm.s{min,max} intrinsics simplified to unsigned
- 977 correlated-value-propagation - Number of no-signed-wrap deductions
- 718 correlated-value-propagation - Number of no-signed-wrap deductions for add
+ 976 correlated-value-propagation - Number of no-signed-wrap deductions
+ 717 correlated-value-propagation - Number of no-signed-wrap deductions for add
23 correlated-value-propagation - Number of no-signed-wrap deductions for mul
121 correlated-value-propagation - Number of no-signed-wrap deductions for shl
115 correlated-value-propagation - Number of no-signed-wrap deductions for sub
- 1177 correlated-value-propagation - Number of no-unsigned-wrap deductions
- 805 correlated-value-propagation - Number of no-unsigned-wrap deductions for add
+ 1176 correlated-value-propagation - Number of no-unsigned-wrap deductions
+ 804 correlated-value-propagation - Number of no-unsigned-wrap deductions for add
26 correlated-value-propagation - Number of no-unsigned-wrap deductions for mul
243 correlated-value-propagation - Number of no-unsigned-wrap deductions for shl
103 correlated-value-propagation - Number of no-unsigned-wrap deductions for sub
- 2149 correlated-value-propagation - Number of no-wrap deductions
- 1523 correlated-value-propagation - Number of no-wrap deductions for add
+ 2147 correlated-value-propagation - Number of no-wrap deductions
+ 1521 correlated-value-propagation - Number of no-wrap deductions for add
49 correlated-value-propagation - Number of no-wrap deductions for mul
364 correlated-value-propagation - Number of no-wrap deductions for shl
213 correlated-value-propagation - Number of no-wrap deductions for sub
306 correlated-value-propagation - Number of phis deleted via common incoming value
- 5790 correlated-value-propagation - Number of phis propagated
+ 5780 correlated-value-propagation - Number of phis propagated
43 correlated-value-propagation - Number of sdiv converted to udiv
41 correlated-value-propagation - Number of sdivs/srems whose width was decreased
839 correlated-value-propagation - Number of selects propagated
- 1235 correlated-value-propagation - Number of sext converted to zext
+ 1234 correlated-value-propagation - Number of sext converted to zext
432 correlated-value-propagation - Number of signed icmp preds simplified to unsigned
23 correlated-value-propagation - Number of sitofp converted to uitofp
1 correlated-value-propagation - Number of srem converted to urem
- 82 correlated-value-propagation - Number of switch cases removed
+ 81 correlated-value-propagation - Number of switch cases removed
42 correlated-value-propagation - Number of udivs/urems whose width was decreased
364 correlated-value-propagation - Number of zext/uitofp non-negative deductions
- 645 count-visits - Max number of times we visited a function
- 27 deadargelim - Number of unread args removed
+ 646 count-visits - Max number of times we visited a function
+ 29 deadargelim - Number of unread args removed
135 deadargelim - Number of unread args replaced with poison
36 deadargelim - Number of unused return values removed
- 214229 dse - Number iterations check for reads in getDomMemoryDef
+ 214203 dse - Number iterations check for reads in getDomMemoryDef
464 dse - Number of other instrs removed
- 179 dse - Number of redundant stores deleted
+ 180 dse - Number of redundant stores deleted
201 dse - Number of stores dead by later partials
5173 dse - Number of stores deleted
5673 dse - Number of stores modified
- 195126 dse - Number of stores remaining after DSE
+ 195185 dse - Number of stores remaining after DSE
7139 dse - Number of times a valid candidate is returned from getDomMemoryDef
- 84533 early-cse - Number of GEP instructions CSE'd
+ 84529 early-cse - Number of GEP instructions CSE'd
32 early-cse - Number of call instructions CSE'd
342 early-cse - Number of compare instructions CVP'd
- 37737 early-cse - Number of instructions CSE'd
+ 37735 early-cse - Number of instructions CSE'd
129710 early-cse - Number of instructions simplified or DCE'd
- 69942 early-cse - Number of load instructions CSE'd
+ 69899 early-cse - Number of load instructions CSE'd
2318 early-cse - Number of trivial dead stores removed
56845 file-search - Number of #includes skipped due to the multi-include optimization.
162113 file-search - Number of attempted #includes.
@@ -97,9 +97,10 @@
6621 function-attrs - Number of arguments marked readonly
907 function-attrs - Number of arguments marked returned
2196 function-attrs - Number of arguments marked writeonly
- 435 function-attrs - Number of function returns marked noalias
+ 437 function-attrs - Number of function returns marked noalias
447 function-attrs - Number of function returns marked nonnull
793 function-attrs - Number of function returns marked noundef
+ 84 function-attrs - Number of functions marked as cold
3442 function-attrs - Number of functions marked as nofree
5949 function-attrs - Number of functions marked as norecurse
6060 function-attrs - Number of functions marked as nosync
@@ -127,87 +128,87 @@
1772 globalsmodref-aa - Number of global vars without address taken
23 globalsmodref-aa - Number of indirect global objects
4928 gvn - Number of blocks merged
- 7133 gvn - Number of blocks speculated as available in IsValueFullyAvailableInBlock(), max
- 2022 gvn - Number of equalities propagated
+ 7138 gvn - Number of blocks speculated as available in IsValueFullyAvailableInBlock(), max
+ 2020 gvn - Number of equalities propagated
856 gvn - Number of instructions PRE'd
- 62052 gvn - Number of instructions deleted
- 22145 gvn - Number of instructions simplified
- 7134 gvn - Number of loads PRE'd
- 11697 gvn - Number of loads deleted
+ 62128 gvn - Number of instructions deleted
+ 22152 gvn - Number of instructions simplified
+ 7142 gvn - Number of loads PRE'd
+ 11741 gvn - Number of loads deleted
822 gvn - Number of loads moved to predecessor of a critical edge in PRE
128 gvn - Number of loop loads PRE'd
232 indvars - Number of IV comparisons eliminated
26 indvars - Number of IV identities eliminated
2 indvars - Number of IV remainder operations eliminated
- 8702 indvars - Number of IV sign/zero extends eliminated
+ 8704 indvars - Number of IV sign/zero extends eliminated
1 indvars - Number of IV signed remainder operations converted to unsigned remainder
8 indvars - Number of IV users folded into a constant
6990 indvars - Number of congruent IVs eliminated
570 indvars - Number of exit values replaced
7122 indvars - Number of indvars widened
5410 indvars - Number of loop exit tests replaced
- 27730 inline - Number of functions deleted because all callers found
- 106114 inline - Number of functions inlined
- 132759 inline-cost - Number of call sites analyzed
+ 27703 inline - Number of functions deleted because all callers found
+ 105963 inline - Number of functions inlined
+ 132883 inline-cost - Number of call sites analyzed
27 instcombine - Negator: How many negations did we retrieve/reuse from cache
99 instcombine - Negator: Maximal number of new instructions created during negation attempt
611 instcombine - Negator: Maximal number of values ever visited while attempting to sink negation
177 instcombine - Negator: Maximal traversal depth ever reached while attempting to sink negation
- 89540 instcombine - Negator: Number of negations attempted to be sinked
+ 89555 instcombine - Negator: Number of negations attempted to be sinked
197 instcombine - Negator: Number of negations successfully sinked
225 instcombine - Negator: Number of new negated instructions created in successful negation sinking attempts
262 instcombine - Negator: Number of new negated instructions created, total
- 94420 instcombine - Negator: Total number of values visited during attempts to sink negation
+ 94435 instcombine - Negator: Total number of values visited during attempts to sink negation
2655 instcombine - Number of PHI's that got CSE'd
34 instcombine - Number of allocas copied from constant global
335 instcombine - Number of constant folds
- 111540 instcombine - Number of dead inst eliminated
+ 111544 instcombine - Number of dead inst eliminated
220 instcombine - Number of dead stores eliminated
13 instcombine - Number of expansions
199 instcombine - Number of factorizations
- 163239 instcombine - Number of functions with one iteration
- 52240 instcombine - Number of functions with two iterations
- 215479 instcombine - Number of instruction combining iterations performed
+ 163257 instcombine - Number of functions with one iteration
+ 52234 instcombine - Number of functions with two iterations
+ 215491 instcombine - Number of instruction combining iterations performed
11604 instcombine - Number of instructions sunk
- 560025 instcombine - Number of insts combined
+ 560006 instcombine - Number of insts combined
2322 instcombine - Number of library calls simplified
4530 instcombine - Number of phi-of-extractvalue turned into extractvalue-of-phi
2 instcombine - Number of phi-of-insertvalue turned into insertvalue-of-phis
- 1717 instcombine - Number of reassociations
+ 1719 instcombine - Number of reassociations
24 instcombine - Number of select opts
458 instsimplify - Number of expansions
- 17844 instsimplify - Number of reassociations
- 2425103 ipt - Number of insts scanned while updating ibt
- 102509 ir - Number of renumberings across all blocks
+ 17846 instsimplify - Number of reassociations
+ 2424546 ipt - Number of insts scanned while updating ibt
+ 102517 ir - Number of renumberings across all blocks
140 jump-threading - Number of branch blocks duplicated to eliminate phi
- 8776 jump-threading - Number of jumps threaded
- 6600 jump-threading - Number of terminators folded
- 51881 lcssa - Number of live out of a loop variables
+ 8779 jump-threading - Number of jumps threaded
+ 6606 jump-threading - Number of terminators folded
+ 51874 lcssa - Number of live out of a loop variables
5 licm - Number of add/subtract expressions reassociated and hoisted out of the loop
304 licm - Number of call insts hoisted or sunk
446 licm - Number of geps reassociated and hoisted out of the loop
- 59211 licm - Number of instructions hoisted out of loop
+ 59221 licm - Number of instructions hoisted out of loop
1300 licm - Number of instructions sunk out of loop
221 licm - Number of invariant BinaryOp expressions reassociated and hoisted out of the loop
2 licm - Number of invariant int expressions reassociated and hoisted out of the loop
564 licm - Number of load and store promotions
- 10957 licm - Number of load insts hoisted or sunk
+ 10966 licm - Number of load insts hoisted or sunk
531 licm - Number of load-only promotions
8 licm - Number of min/max expressions hoisted out of the loop
2658 licm - Number of promotion candidates
1314 local - Number of PHI's that got CSE'd
- 5025 local - Number of unreachable basic blocks removed
+ 4864 local - Number of unreachable basic blocks removed
1768 loop-delete - Number of loops deleted
94 loop-delete - Number of loops for which we managed to break the backedge
156 loop-idiom - Number of memcpy's formed from loop load+stores
1 loop-idiom - Number of memmove's formed from loop load+stores
457 loop-idiom - Number of memset's formed from loop stores
1 loop-idiom - Number of uncountable loops recognized as 'shift until zero' idiom
- 1849 loop-instsimplify - Number of redundant instructions simplified
+ 1846 loop-instsimplify - Number of redundant instructions simplified
62 loop-peel - Number of loops peeled
28409 loop-rotate - Number of instructions cloned into loop preheader
28 loop-rotate - Number of loops not rotated due to the header size
- 10892 loop-rotate - Number of loops rotated
+ 10891 loop-rotate - Number of loops rotated
75 loop-simplify - Number of nested loops split out
62 loop-simplifycfg - Number of loop blocks deleted
14 loop-simplifycfg - Number of loop exiting edges deleted
@@ -225,19 +226,19 @@
43 memcpyopt - Number of memcpys converted to memset
5 memcpyopt - Number of memmoves converted to memcpy
1601 memcpyopt - Number of memsets inferred
- 29829 memdep - Number of block queries that were completely cached
+ 29835 memdep - Number of block queries that were completely cached
174 memdep - Number of cached, but dirty, non-local ptr responses
- 2283554 memdep - Number of fully cached non-local ptr responses
+ 2282925 memdep - Number of fully cached non-local ptr responses
147 memdep - Number of fully cached non-local responses
- 1423784 memdep - Number of uncached non-local ptr responses
+ 1424029 memdep - Number of uncached non-local ptr responses
188 memdep - Number of uncached non-local responses
28622 memory-builtins - Number of arguments with unsolved size and offset
- 29952 memory-builtins - Number of load instructions with unsolved size and offset
+ 29962 memory-builtins - Number of load instructions with unsolved size and offset
41 reassociate - Number of expr tree annihilated
- 28562 reassociate - Number of insts reassociated
+ 28556 reassociate - Number of insts reassociated
9 reassociate - Number of multiplies factored
- 14865 scalar-evolution - Number of loop exits with predictable exit counts
- 28529 scalar-evolution - Number of loop exits without predictable exit counts
+ 14863 scalar-evolution - Number of loop exits with predictable exit counts
+ 28544 scalar-evolution - Number of loop exits without predictable exit counts
283 scalar-evolution - Number of loops with trip counts computed by force
299 sccp - Number of arguments constant propagated
3544 sccp - Number of basic blocks unreachable
@@ -249,7 +250,7 @@
3 simple-loop-unswitch - Number of switches unswitched
88 simple-loop-unswitch - Number of unswitch candidates that had their cost multiplier skipped
39 simple-loop-unswitch - Number of unswitches that are trivial
- 166289 simplifycfg - Number of blocks simplified
+ 166262 simplifycfg - Number of blocks simplified
5572 simplifycfg - Number of branches folded into predecessor basic block
4033 simplifycfg - Number of common instruction 'blocks' hoisted up to the begin block
1967 simplifycfg - Number of common instruction 'blocks' sunk down to the end block
@@ -265,7 +266,7 @@
27496 sroa - Maximum number of uses of a partition
767455 sroa - Number of alloca partition uses rewritten
162506 sroa - Number of alloca partitions formed
- 190692 sroa - Number of allocas analyzed for replacement
+ 190686 sroa - Number of allocas analyzed for replacement
160979 sroa - Number of allocas promoted to SSA values
778893 sroa - Number of instructions deleted
6 sroa - Number of loads rewritten into predicated loads to allow promotion
```
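Since the raw counts are hard to eyeball, one way to see which counters moved most is to rank them by relative change rather than absolute change. The following is a minimal sketch, not part of the PR: it assumes the two stats dumps are plain text with one `<count> <pass> - <description>` entry per line (the shape shown in the diff above), and the helper names `parse_stats` and `rank_changes` are made up for illustration. It expects the two raw dumps, not the `+`/`-` prefixed diff.

```python
import re

# Assumed shape of one stats line: "<count> <pass-name> - <description>".
STAT_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+-\s+(.*\S)\s*$")


def parse_stats(text):
    """Map (pass, description) -> count for every line shaped like a stat."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            count, pass_name, desc = m.groups()
            stats[(pass_name, desc)] = int(count)
    return stats


def rank_changes(before, after):
    """Return (pass, description, old, new, rel) rows, largest |rel| first.

    A counter that only exists on one side is treated as 0 on the other;
    a counter that goes from 0 to non-zero gets rel = inf.
    """
    rows = []
    for key in before.keys() | after.keys():
        old, new = before.get(key, 0), after.get(key, 0)
        if old == new:
            continue
        rel = (new - old) / old if old else float("inf")
        rows.append((key[0], key[1], old, new, rel))
    # Secondary keys keep the order deterministic when relative changes tie.
    rows.sort(key=lambda r: (-abs(r[4]), r[0], r[1]))
    return rows


if __name__ == "__main__":
    # A few counters copied from the diff above, as they would appear in raw dumps.
    before_text = """
      106114 inline - Number of functions inlined
      132759 inline-cost - Number of call sites analyzed
      5025 local - Number of unreachable basic blocks removed
    """
    after_text = """
      105963 inline - Number of functions inlined
      132883 inline-cost - Number of call sites analyzed
      4864 local - Number of unreachable basic blocks removed
      84 function-attrs - Number of functions marked as cold
    """
    ranked = rank_changes(parse_stats(before_text), parse_stats(after_text))
    for pass_name, desc, old, new, rel in ranked:
        print(f"{rel:+9.2%}  {old:>8} -> {new:<8}  {pass_name} - {desc}")
```

On the handful of counters used here, the drop in unreachable basic blocks removed (5025 -> 4864, about 3.2%) ranks well above the inlining counters, which each move by less than 0.2%. A ranking like this only shows which counters moved, not where the compile time went.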
https://github.com/llvm/llvm-project/pull/101298