[llvm] [FunctionAttrs] deduce attr `cold` on functions if all CG paths call a `cold` function (PR #101298)

via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 31 09:05:21 PDT 2024


goldsteinn wrote:

> > Is this potentially acceptable?
> 
> Generally yes, second-order regressions are fine. Of course, it depends on the details. How does the presence of cold attributes affect things -- is this "just" because of substantially different inlining decisions, or something else?

Looking at the diff in stats from before/after (below), there are small changes across the board, but no number really jumps out. Inlining does have a notable(ish) change, with fewer inlines after the change (106114 -> 105963 functions inlined). That said, I'm not sure how to read the numbers to work out what's leading to the compile-time regression.

```diff
@@ -1,6 +1,6 @@
-         1289338 aa                           - Number of MayAlias results
-          174178 aa                           - Number of MustAlias results
-         5817260 aa                           - Number of NoAlias results
+         1289144 aa                           - Number of MayAlias results
+          174197 aa                           - Number of MustAlias results
+         5816357 aa                           - Number of NoAlias results
            31029 abstract-call-sites          - Number of direct abstract call sites created
               96 abstract-call-sites          - Number of invalid abstract call sites created (no callback)
              392 abstract-call-sites          - Number of invalid abstract call sites created (unknown use)
@@ -11,9 +11,9 @@
              220 argpromotion                 - Number of dead pointer args eliminated
             1275 argpromotion                 - Number of pointer arguments promoted
            33178 assume-queries               - Number of Queries into an assume assume bundles
-         7841528 basicaa                      - Number of times a GEP is decomposed
+         7843168 basicaa                      - Number of times a GEP is decomposed
            13619 basicaa                      - Number of times the limit to decompose GEPs is reached
-            3895 bdce                         - Number of instructions removed (unused)
+            3899 bdce                         - Number of instructions removed (unused)
               22 bdce                         - Number of instructions trivialized (dead bits)
              194 bdce                         - Number of sign extension instructions converted to zero extension
             2091 build-libcalls               - Number of arguments inferred as nocapture
@@ -31,64 +31,64 @@
              614 build-libcalls               - Number of functions inferred as willreturn
              130 build-libcalls               - Number of functions inferred as writeonly
               44 callsite-splitting           - Number of call-site split
-           59752 capture-tracking             - Number of pointers maybe captured
-           20192 capture-tracking             - Number of pointers maybe captured before
-           12565 capture-tracking             - Number of pointers not captured
-            8428 capture-tracking             - Number of pointers not captured before
-             861 constraint-elimination       - Number of instructions removed
+           59767 capture-tracking             - Number of pointers maybe captured
+           20202 capture-tracking             - Number of pointers maybe captured before
+           12570 capture-tracking             - Number of pointers not captured
+            8431 capture-tracking             - Number of pointers not captured before
+             860 constraint-elimination       - Number of instructions removed
               69 correlated-value-propagation - Number of ands removed
-              49 correlated-value-propagation - Number of ashr converted to lshr
+              46 correlated-value-propagation - Number of ashr converted to lshr
                6 correlated-value-propagation - Number of bound udiv's/urem's expanded
-             420 correlated-value-propagation - Number of comparisons propagated
-            8595 correlated-value-propagation - Number of function pointer arguments marked non-null
+             424 correlated-value-propagation - Number of comparisons propagated
+            8591 correlated-value-propagation - Number of function pointer arguments marked non-null
               61 correlated-value-propagation - Number of llvm.[us]{min,max} intrinsics removed
                6 correlated-value-propagation - Number of llvm.abs intrinsics removed
               73 correlated-value-propagation - Number of llvm.s{min,max} intrinsics simplified to unsigned
-             977 correlated-value-propagation - Number of no-signed-wrap deductions
-             718 correlated-value-propagation - Number of no-signed-wrap deductions for add
+             976 correlated-value-propagation - Number of no-signed-wrap deductions
+             717 correlated-value-propagation - Number of no-signed-wrap deductions for add
               23 correlated-value-propagation - Number of no-signed-wrap deductions for mul
              121 correlated-value-propagation - Number of no-signed-wrap deductions for shl
              115 correlated-value-propagation - Number of no-signed-wrap deductions for sub
-            1177 correlated-value-propagation - Number of no-unsigned-wrap deductions
-             805 correlated-value-propagation - Number of no-unsigned-wrap deductions for add
+            1176 correlated-value-propagation - Number of no-unsigned-wrap deductions
+             804 correlated-value-propagation - Number of no-unsigned-wrap deductions for add
               26 correlated-value-propagation - Number of no-unsigned-wrap deductions for mul
              243 correlated-value-propagation - Number of no-unsigned-wrap deductions for shl
              103 correlated-value-propagation - Number of no-unsigned-wrap deductions for sub
-            2149 correlated-value-propagation - Number of no-wrap deductions
-            1523 correlated-value-propagation - Number of no-wrap deductions for add
+            2147 correlated-value-propagation - Number of no-wrap deductions
+            1521 correlated-value-propagation - Number of no-wrap deductions for add
               49 correlated-value-propagation - Number of no-wrap deductions for mul
              364 correlated-value-propagation - Number of no-wrap deductions for shl
              213 correlated-value-propagation - Number of no-wrap deductions for sub
              306 correlated-value-propagation - Number of phis deleted via common incoming value
-            5790 correlated-value-propagation - Number of phis propagated
+            5780 correlated-value-propagation - Number of phis propagated
               43 correlated-value-propagation - Number of sdiv converted to udiv
               41 correlated-value-propagation - Number of sdivs/srems whose width was decreased
              839 correlated-value-propagation - Number of selects propagated
-            1235 correlated-value-propagation - Number of sext converted to zext
+            1234 correlated-value-propagation - Number of sext converted to zext
              432 correlated-value-propagation - Number of signed icmp preds simplified to unsigned
               23 correlated-value-propagation - Number of sitofp converted to uitofp
                1 correlated-value-propagation - Number of srem converted to urem
-              82 correlated-value-propagation - Number of switch cases removed
+              81 correlated-value-propagation - Number of switch cases removed
               42 correlated-value-propagation - Number of udivs/urems whose width was decreased
              364 correlated-value-propagation - Number of zext/uitofp non-negative deductions
-             645 count-visits                 - Max number of times we visited a function
-              27 deadargelim                  - Number of unread args removed
+             646 count-visits                 - Max number of times we visited a function
+              29 deadargelim                  - Number of unread args removed
              135 deadargelim                  - Number of unread args replaced with poison
               36 deadargelim                  - Number of unused return values removed
-          214229 dse                          - Number iterations check for reads in getDomMemoryDef
+          214203 dse                          - Number iterations check for reads in getDomMemoryDef
              464 dse                          - Number of other instrs removed
-             179 dse                          - Number of redundant stores deleted
+             180 dse                          - Number of redundant stores deleted
              201 dse                          - Number of stores dead by later partials
             5173 dse                          - Number of stores deleted
             5673 dse                          - Number of stores modified
-          195126 dse                          - Number of stores remaining after DSE
+          195185 dse                          - Number of stores remaining after DSE
             7139 dse                          - Number of times a valid candidate is returned from getDomMemoryDef
-           84533 early-cse                    - Number of GEP instructions CSE'd
+           84529 early-cse                    - Number of GEP instructions CSE'd
               32 early-cse                    - Number of call instructions CSE'd
              342 early-cse                    - Number of compare instructions CVP'd
-           37737 early-cse                    - Number of instructions CSE'd
+           37735 early-cse                    - Number of instructions CSE'd
           129710 early-cse                    - Number of instructions simplified or DCE'd
-           69942 early-cse                    - Number of load instructions CSE'd
+           69899 early-cse                    - Number of load instructions CSE'd
             2318 early-cse                    - Number of trivial dead stores removed
            56845 file-search                  - Number of #includes skipped due to the multi-include optimization.
           162113 file-search                  - Number of attempted #includes.
@@ -97,9 +97,10 @@
             6621 function-attrs               - Number of arguments marked readonly
              907 function-attrs               - Number of arguments marked returned
             2196 function-attrs               - Number of arguments marked writeonly
-             435 function-attrs               - Number of function returns marked noalias
+             437 function-attrs               - Number of function returns marked noalias
              447 function-attrs               - Number of function returns marked nonnull
              793 function-attrs               - Number of function returns marked noundef
+              84 function-attrs               - Number of functions marked as cold
             3442 function-attrs               - Number of functions marked as nofree
             5949 function-attrs               - Number of functions marked as norecurse
             6060 function-attrs               - Number of functions marked as nosync
@@ -127,87 +128,87 @@
             1772 globalsmodref-aa             - Number of global vars without address taken
               23 globalsmodref-aa             - Number of indirect global objects
             4928 gvn                          - Number of blocks merged
-            7133 gvn                          - Number of blocks speculated as available in IsValueFullyAvailableInBlock(), max
-            2022 gvn                          - Number of equalities propagated
+            7138 gvn                          - Number of blocks speculated as available in IsValueFullyAvailableInBlock(), max
+            2020 gvn                          - Number of equalities propagated
              856 gvn                          - Number of instructions PRE'd
-           62052 gvn                          - Number of instructions deleted
-           22145 gvn                          - Number of instructions simplified
-            7134 gvn                          - Number of loads PRE'd
-           11697 gvn                          - Number of loads deleted
+           62128 gvn                          - Number of instructions deleted
+           22152 gvn                          - Number of instructions simplified
+            7142 gvn                          - Number of loads PRE'd
+           11741 gvn                          - Number of loads deleted
              822 gvn                          - Number of loads moved to predecessor of a critical edge in PRE
              128 gvn                          - Number of loop loads PRE'd
              232 indvars                      - Number of IV comparisons eliminated
               26 indvars                      - Number of IV identities eliminated
                2 indvars                      - Number of IV remainder operations eliminated
-            8702 indvars                      - Number of IV sign/zero extends eliminated
+            8704 indvars                      - Number of IV sign/zero extends eliminated
                1 indvars                      - Number of IV signed remainder operations converted to unsigned remainder
                8 indvars                      - Number of IV users folded into a constant
             6990 indvars                      - Number of congruent IVs eliminated
              570 indvars                      - Number of exit values replaced
             7122 indvars                      - Number of indvars widened
             5410 indvars                      - Number of loop exit tests replaced
-           27730 inline                       - Number of functions deleted because all callers found
-          106114 inline                       - Number of functions inlined
-          132759 inline-cost                  - Number of call sites analyzed
+           27703 inline                       - Number of functions deleted because all callers found
+          105963 inline                       - Number of functions inlined
+          132883 inline-cost                  - Number of call sites analyzed
               27 instcombine                  - Negator: How many negations did we retrieve/reuse from cache
               99 instcombine                  - Negator: Maximal number of new instructions created during negation attempt
              611 instcombine                  - Negator: Maximal number of values ever visited while attempting to sink negation
              177 instcombine                  - Negator: Maximal traversal depth ever reached while attempting to sink negation
-           89540 instcombine                  - Negator: Number of negations attempted to be sinked
+           89555 instcombine                  - Negator: Number of negations attempted to be sinked
              197 instcombine                  - Negator: Number of negations successfully sinked
              225 instcombine                  - Negator: Number of new negated instructions created in successful negation sinking attempts
              262 instcombine                  - Negator: Number of new negated instructions created, total
-           94420 instcombine                  - Negator: Total number of values visited during attempts to sink negation
+           94435 instcombine                  - Negator: Total number of values visited during attempts to sink negation
             2655 instcombine                  - Number of PHI's that got CSE'd
               34 instcombine                  - Number of allocas copied from constant global
              335 instcombine                  - Number of constant folds
-          111540 instcombine                  - Number of dead inst eliminated
+          111544 instcombine                  - Number of dead inst eliminated
              220 instcombine                  - Number of dead stores eliminated
               13 instcombine                  - Number of expansions
              199 instcombine                  - Number of factorizations
-          163239 instcombine                  - Number of functions with one iteration
-           52240 instcombine                  - Number of functions with two iterations
-          215479 instcombine                  - Number of instruction combining iterations performed
+          163257 instcombine                  - Number of functions with one iteration
+           52234 instcombine                  - Number of functions with two iterations
+          215491 instcombine                  - Number of instruction combining iterations performed
            11604 instcombine                  - Number of instructions sunk
-          560025 instcombine                  - Number of insts combined
+          560006 instcombine                  - Number of insts combined
             2322 instcombine                  - Number of library calls simplified
             4530 instcombine                  - Number of phi-of-extractvalue turned into extractvalue-of-phi
                2 instcombine                  - Number of phi-of-insertvalue turned into insertvalue-of-phis
-            1717 instcombine                  - Number of reassociations
+            1719 instcombine                  - Number of reassociations
               24 instcombine                  - Number of select opts
              458 instsimplify                 - Number of expansions
-           17844 instsimplify                 - Number of reassociations
-         2425103 ipt                          - Number of insts scanned while updating ibt
-          102509 ir                           - Number of renumberings across all blocks
+           17846 instsimplify                 - Number of reassociations
+         2424546 ipt                          - Number of insts scanned while updating ibt
+          102517 ir                           - Number of renumberings across all blocks
              140 jump-threading               - Number of branch blocks duplicated to eliminate phi
-            8776 jump-threading               - Number of jumps threaded
-            6600 jump-threading               - Number of terminators folded
-           51881 lcssa                        - Number of live out of a loop variables
+            8779 jump-threading               - Number of jumps threaded
+            6606 jump-threading               - Number of terminators folded
+           51874 lcssa                        - Number of live out of a loop variables
                5 licm                         - Number of add/subtract expressions reassociated and hoisted out of the loop
              304 licm                         - Number of call insts hoisted or sunk
              446 licm                         - Number of geps reassociated and hoisted out of the loop
-           59211 licm                         - Number of instructions hoisted out of loop
+           59221 licm                         - Number of instructions hoisted out of loop
             1300 licm                         - Number of instructions sunk out of loop
              221 licm                         - Number of invariant BinaryOp expressions reassociated and hoisted out of the loop
                2 licm                         - Number of invariant int expressions reassociated and hoisted out of the loop
              564 licm                         - Number of load and store promotions
-           10957 licm                         - Number of load insts hoisted or sunk
+           10966 licm                         - Number of load insts hoisted or sunk
              531 licm                         - Number of load-only promotions
                8 licm                         - Number of min/max expressions hoisted out of the loop
             2658 licm                         - Number of promotion candidates
             1314 local                        - Number of PHI's that got CSE'd
-            5025 local                        - Number of unreachable basic blocks removed
+            4864 local                        - Number of unreachable basic blocks removed
             1768 loop-delete                  - Number of loops deleted
               94 loop-delete                  - Number of loops for which we managed to break the backedge
              156 loop-idiom                   - Number of memcpy's formed from loop load+stores
                1 loop-idiom                   - Number of memmove's formed from loop load+stores
              457 loop-idiom                   - Number of memset's formed from loop stores
                1 loop-idiom                   - Number of uncountable loops recognized as 'shift until zero' idiom
-            1849 loop-instsimplify            - Number of redundant instructions simplified
+            1846 loop-instsimplify            - Number of redundant instructions simplified
               62 loop-peel                    - Number of loops peeled
            28409 loop-rotate                  - Number of instructions cloned into loop preheader
               28 loop-rotate                  - Number of loops not rotated due to the header size
-           10892 loop-rotate                  - Number of loops rotated
+           10891 loop-rotate                  - Number of loops rotated
               75 loop-simplify                - Number of nested loops split out
               62 loop-simplifycfg             - Number of loop blocks deleted
               14 loop-simplifycfg             - Number of loop exiting edges deleted
@@ -225,19 +226,19 @@
               43 memcpyopt                    - Number of memcpys converted to memset
                5 memcpyopt                    - Number of memmoves converted to memcpy
             1601 memcpyopt                    - Number of memsets inferred
-           29829 memdep                       - Number of block queries that were completely cached
+           29835 memdep                       - Number of block queries that were completely cached
              174 memdep                       - Number of cached, but dirty, non-local ptr responses
-         2283554 memdep                       - Number of fully cached non-local ptr responses
+         2282925 memdep                       - Number of fully cached non-local ptr responses
              147 memdep                       - Number of fully cached non-local responses
-         1423784 memdep                       - Number of uncached non-local ptr responses
+         1424029 memdep                       - Number of uncached non-local ptr responses
              188 memdep                       - Number of uncached non-local responses
            28622 memory-builtins              - Number of arguments with unsolved size and offset
-           29952 memory-builtins              - Number of load instructions with unsolved size and offset
+           29962 memory-builtins              - Number of load instructions with unsolved size and offset
               41 reassociate                  - Number of expr tree annihilated
-           28562 reassociate                  - Number of insts reassociated
+           28556 reassociate                  - Number of insts reassociated
                9 reassociate                  - Number of multiplies factored
-           14865 scalar-evolution             - Number of loop exits with predictable exit counts
-           28529 scalar-evolution             - Number of loop exits without predictable exit counts
+           14863 scalar-evolution             - Number of loop exits with predictable exit counts
+           28544 scalar-evolution             - Number of loop exits without predictable exit counts
              283 scalar-evolution             - Number of loops with trip counts computed by force
              299 sccp                         - Number of arguments constant propagated
             3544 sccp                         - Number of basic blocks unreachable
@@ -249,7 +250,7 @@
                3 simple-loop-unswitch         - Number of switches unswitched
               88 simple-loop-unswitch         - Number of unswitch candidates that had their cost multiplier skipped
               39 simple-loop-unswitch         - Number of unswitches that are trivial
-          166289 simplifycfg                  - Number of blocks simplified
+          166262 simplifycfg                  - Number of blocks simplified
             5572 simplifycfg                  - Number of branches folded into predecessor basic block
             4033 simplifycfg                  - Number of common instruction 'blocks' hoisted up to the begin block
             1967 simplifycfg                  - Number of common instruction 'blocks' sunk down to the end block
@@ -265,7 +266,7 @@
            27496 sroa                         - Maximum number of uses of a partition
           767455 sroa                         - Number of alloca partition uses rewritten
           162506 sroa                         - Number of alloca partitions formed
-          190692 sroa                         - Number of allocas analyzed for replacement
+          190686 sroa                         - Number of allocas analyzed for replacement
           160979 sroa                         - Number of allocas promoted to SSA values
           778893 sroa                         - Number of instructions deleted
                6 sroa                         - Number of loads rewritten into predicated loads to allow promotion


```
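One way to make a diff like this easier to read is to pair up the `-`/`+` lines and rank the counters by relative change instead of eyeballing absolute values. The following is only a rough sketch: `stat_deltas` and `SAMPLE` are hypothetical names, the line format is assumed to match the `-stats` output shown above, and `SAMPLE` holds a few lines copied from the diff rather than the full data.

```python
import re

# Matches a changed line from a unified diff of `-stats` output, e.g.
# "-          106114 inline                       - Number of functions inlined"
# Hunk headers ("@@ ...") and unchanged context lines do not match.
STAT_LINE = re.compile(r"^([+-])\s*(\d+)\s+(\S+)\s+-\s+(.*\S)\s*$")


def stat_deltas(diff_text):
    """Return (pass, description, old, new, delta, relative) rows,
    sorted by the magnitude of the relative change, largest first."""
    before, after = {}, {}
    for line in diff_text.splitlines():
        m = STAT_LINE.match(line)
        if not m:
            continue
        sign, value, pass_name, desc = m.groups()
        target = before if sign == "-" else after
        target[(pass_name, desc)] = int(value)

    rows = []
    for key in before.keys() | after.keys():
        old = before.get(key, 0)
        new = after.get(key, 0)
        # A counter that only exists after the change has no baseline;
        # treat it as an infinite relative change so it sorts first.
        rel = (new - old) / old if old else float("inf")
        rows.append((key[0], key[1], old, new, new - old, rel))
    rows.sort(key=lambda r: abs(r[5]), reverse=True)
    return rows


# A few lines copied from the diff above (not the full data set).
SAMPLE = """\
@@ -1,3 +1,3 @@
-          106114 inline                       - Number of functions inlined
+          105963 inline                       - Number of functions inlined
-            5025 local                        - Number of unreachable basic blocks removed
+            4864 local                        - Number of unreachable basic blocks removed
+              84 function-attrs               - Number of functions marked as cold
             464 dse                          - Number of other instrs removed
"""

if __name__ == "__main__":
    for pass_name, desc, old, new, delta, rel in stat_deltas(SAMPLE):
        rel_text = "new" if old == 0 else f"{rel:+.2%}"
        print(f"{pass_name:<16} {delta:+8d} {rel_text:>8}  {desc}")
```

Ranked this way, the drop in unreachable basic blocks removed (5025 -> 4864, about 3.2%) stands out more than the drop in functions inlined (106114 -> 105963, about 0.14%), though neither by itself says where the extra compile time is being spent.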


https://github.com/llvm/llvm-project/pull/101298

