[llvm] [SHT_LLVM_FUNC_MAP][ObjectYaml]Introduce function address map section and emit dynamic instruction count(ObjectYaml part) (PR #124332)

Lei Wang via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 11 21:08:58 PST 2025


================
@@ -535,6 +535,27 @@ Example of BBAddrMap with PGO data:
    .uleb128  1000                         # BB_3 basic block frequency (only when enabled)
    .uleb128  0                            # BB_3 successors count (only enabled with branch probabilities)
 
+``SHT_LLVM_FUNC_MAP`` Section (function address map)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This section stores the mapping from the binary address of function to its
+related metadata features. It is used to emit function-level analysis data and
+can be enabled through ``--func-map`` option. The fields are encoded in the
+following format:
+
+#. A version number byte used for backward compatibility.
----------------
wlei-llvm wrote:

>Yes, this is my revised suggestion as an alternative. This may not be ideal, because it impedes size reductions through gc-sections/COMDAT deduplication etc as it leaves dead entries in the data.

>Whether you adopt either of these approaches or stick with the original design really needs to be a decision that you as clients of the functionality make. Keep in mind that having more data will make it slower to read and write the data. Functionality like gc-sections can help improve this, at a cost about what the section format might look like.

@jh7370 Sorry for the late reply, and thank you for the detailed clarification, that was super helpful!
I've now implemented the single section approach you suggested, and it does emit correct data. (IIUC, the first suggestion might rely on features that aren’t ready yet.)
Then, to understand the tradeoffs, I ran some experiments comparing the original design (duplicated version field) with the single section design. I ran them on one of our top services (a large binary containing 1M+ functions) and noticed one significant difference in the final binary's section size:

- Original design: 25MB.
- Single section design: 69MB. 
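
As a quick sanity check on the two numbers above, here is a minimal sketch of the arithmetic. The variable names are illustrative, and attributing the whole difference to dead entries is an assumption, not something measured separately:

```python
# Section sizes reported above, in MB.
original_mb = 25        # original design (duplicated version field)
single_section_mb = 69  # single section design

# How many times larger the single section design is.
ratio = single_section_mb / original_mb

# Extra size; assumed (not measured) to be dead entries that
# gc-sections would otherwise have removed.
extra_mb = single_section_mb - original_mb
dead_share = extra_mb / single_section_mb

print(f"ratio: {ratio:.2f}x, extra: {extra_mb} MB, assumed dead share: {dead_share:.0%}")
```

Under that assumption, well over half of the single section would be dead data.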

The single section design is roughly 2~3X larger, which I think is due to the dead entries (no gc-sections). The other overheads are not a significant factor for our system. For our major services the build can take 30+ minutes, so the extra link time for the section is too small to measure, and the disk/network overhead from the small increase in intermediate ELF object size is fine. But for the final binary size, given that we may extend the section with more data, each additional piece of data would cost 2~3X more (dead entries), which I feel could be a problem in the long run. Given this, I'm leaning towards the original design. What do you think?

https://github.com/llvm/llvm-project/pull/124332

