<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/155766>155766</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[BOLT][DWARF] DWO files size bloating when BOLT updates DWOs via DWP
</td>
</tr>
<tr>
<th>Labels</th>
<td>
debuginfo,
BOLT
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Jinjie-Huang
</td>
</tr>
</table>
<pre>
Sorry for the interruption. We are currently trying to adopt BOLT for internal production use, so we may reach out more often for BOLT-related discussions in the near future.
This issue describes a case we encountered with the strategy BOLT uses when updating debug info from a DWP file. Currently, the .debug_str.dwo section of each emitted DWO appears to be copied wholesale from the DWP ([code](https://github.com/llvm/llvm-project/blob/main/bolt/lib/Rewrite/DWARFRewriter.cpp#L1772)), whereas the other sections are handled via "getSliceData / getOverridenSection". As a result, every DWO carries a full copy of the DWP's string table. With many DWO files, this significantly bloats their sizes, and regenerating the DWP with llvm-dwp also becomes much more time-consuming.
A case we encountered:
- The project contains ~1000 source files, and the final DWP file is 718 MB (**229 MB of ".debug_str.dwo"** + 355 MB of ".debug_info.dwo" + 134 MB of other sections).
- After BOLT updates the debug info from this DWP file, we end up with ~1000 ".dwo.dwo" files of nearly identical size, about 229 MB each (**229 MB of ".debug_str.dwo"** + several KB of ".debug_info.dwo" and other sections). In total, this consumes over 200 GB of additional disk space (1000 × 229 MB ≈ 229 GB, about 318× the size of the original DWP).
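A quick back-of-the-envelope check of the figures above, assuming decimal MB/GB and ignoring the few KB of non-string data in each DWO. The ratio to the DWP is an integer division, which is where the 318× comes from:

```python
# Sizes from the case above, in MB as reported (decimal units assumed).
dwp_sections_mb = {".debug_str.dwo": 229, ".debug_info.dwo": 355, "others": 134}
dwp_total_mb = sum(dwp_sections_mb.values())  # 718 MB

num_dwos = 1000
# Each emitted .dwo.dwo carries a full copy of the DWP's .debug_str.dwo;
# the few KB of per-unit .debug_info.dwo and other sections are ignored here.
copied_str_mb = num_dwos * dwp_sections_mb[".debug_str.dwo"]  # 229000 MB

print(f"{copied_str_mb / 1000:.0f} GB total, {copied_str_mb // dwp_total_mb}x the DWP")
# prints "229 GB total, 318x the DWP"
```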
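Because every DWO carries a full copy of the DWP's string table, the total grows with both the number of DWOs and the size of that table, so projects with more source files could see even more severe bloat. Could we consider the following options?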
1. Ideally, identify the strings in .debug_str.dwo that each DWO references, slice them out, and emit only those. But could this be too time-consuming?
2. Another idea is to use an in-memory llvm-dwp, though this seems hard to implement: as far as I know, llvm-dwp currently doesn't provide an in-memory serialization interface.
3. Perhaps we could emit .debug_str.dwo only in the first DWO, skip it in the subsequent DWOs, and then rely on llvm-dwp to merge everything back into the DWP?
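A minimal sketch of the core of option 1, assuming the layout llvm-dwp produces, where each unit's .debug_str_offsets.dwo contribution in the DWP already holds offsets into the DWP's merged .debug_str.dwo. `slice_str_section` is a hypothetical helper, not an existing BOLT or LLVM API; it copies only the strings one DWO references, deduplicates repeats, and returns the rewritten offset entries (the section header is omitted):

```python
def slice_str_section(dwp_str: bytes, old_offsets: list[int]) -> tuple[bytes, list[int]]:
    """Hypothetical helper (not a BOLT or LLVM API).

    dwp_str:     the DWP's merged .debug_str.dwo payload.
    old_offsets: one DWO's string-offsets entries, pointing into dwp_str.
    Returns the compacted string section for that DWO and the remapped
    offsets, in the same order as old_offsets.
    """
    new_str = bytearray()
    remap: dict[int, int] = {}  # offset in the DWP table -> offset in the new table
    new_offsets = []
    for old in old_offsets:
        if old not in remap:
            if old >= len(dwp_str):
                raise ValueError(f"offset {old} is past the end of .debug_str.dwo")
            end = dwp_str.find(b"\0", old)
            if end == -1:
                raise ValueError(f"unterminated string at offset {old}")
            remap[old] = len(new_str)
            new_str += dwp_str[old:end + 1]  # keep the terminating NUL
        new_offsets.append(remap[old])
    return bytes(new_str), new_offsets
```

For example, `slice_str_section(b"main\0foo.c\0bar\0", [5, 0, 5])` returns `(b"foo.c\0main\0", [0, 6, 0])`; the unreferenced `bar` is dropped. In this sketch the per-DWO work scales with the number and length of the referenced strings rather than with the size of the whole string table; a real implementation in BOLT would also need to locate each unit's contribution and rewrite the section header, which is not shown.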
@ayermolo @rafaelauler @dwblaikie Do you have any comments on this? Thank you.
</pre>