[PATCH] D128382: [LLD] Two tweaks to symbol ordering scheme
YongKang Zhu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 8 14:09:23 PDT 2022
yozhu added a comment.
In D128382#3639492 <https://reviews.llvm.org/D128382#3639492>, @MaskRay wrote:
> In D128382#3639411 <https://reviews.llvm.org/D128382#3639411>, @yozhu wrote:
>
>> In D128382#3639367 <https://reviews.llvm.org/D128382#3639367>, @MaskRay wrote:
>>
>>> In D128382#3639333 <https://reviews.llvm.org/D128382#3639333>, @smeenai wrote:
>>>
>>>> In D128382#3637828 <https://reviews.llvm.org/D128382#3637828>, @yozhu wrote:
>>>>
>>>>> In D128382#3637822 <https://reviews.llvm.org/D128382#3637822>, @MaskRay wrote:
>>>>>
>>>>>> In D128382#3623746 <https://reviews.llvm.org/D128382#3623746>, @yozhu wrote:
>>>>>>
>>>>>>> In D128382#3620661 <https://reviews.llvm.org/D128382#3620661>, @MaskRay wrote:
>>>>>>>
>>>>>>>> In D128382#3611295 <https://reviews.llvm.org/D128382#3611295>, @yozhu wrote:
>>>>>>>>
>>>>>>>>> In D128382#3611207 <https://reviews.llvm.org/D128382#3611207>, @MaskRay wrote:
>>>>>>>>>
>>>>>>>>>> In D128382#3611193 <https://reviews.llvm.org/D128382#3611193>, @yozhu wrote:
>>>>>>>>>>
>>>>>>>>>>> In D128382#3610276 <https://reviews.llvm.org/D128382#3610276>, @MaskRay wrote:
>>>>>>>>>>>
>>>>>>>>>>>> In D128382#3609648 <https://reviews.llvm.org/D128382#3609648>, @lanza wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> One interesting note Shoaib and I just discovered is that ~16.5% of the libraries in one of our major apps have `.text` sections starting on the first page of the dso. Placing hot sections at the front of the `.text` section would give us a slice of a "free page" that, with the current algorithm, would just be cold code that's less likely to be used. I imagine this isn't a substantial difference but it's probably slightly in favor of placing hot code first here.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks. This is fine, even if not that large. The patch should use `osec->flags & SHF_EXECINSTR` instead of `executableInputSections` for simplification. Then the complexity/benefit trade-off will look fine.
>>>>>>>>>>>
>>>>>>>>>>> I didn't check output section's attribute but instead check on each input section because:
>>>>>>>>>>>
>>>>>>>>>>> - It could be that the output section contains .text (RX) contributions as well as .rdata (R) contributions, like read-only data is merged into text section.
>>>>>>>>>>
>>>>>>>>>> This implies a linker script placing rodata into an executable output section.
>>>>>>>>>> This is a corner case and the operation here does not matter. Just be simple.
>>>>>>>>>
>>>>>>>>> Even if it is a rare or corner case, it is still a valid case. Counting `executableInputSections` doesn't add many cycles. Why not just cover all the possible cases?
>>>>>>>>
>>>>>>>> For input rodata in an output executable section, since the for loop doesn't precisely track whether an input section is executable or not, it is as if all rodata are executable as well.
>>>>>>>> Your `executableInputSections` rule doesn't cover more cases.
>>>>>>>
>>>>>>> Input section's attribute can be precisely tracked since the linker has no reason to change input section's attribute once it is read from input OBJ. For example, if we have source code like
>>>>>>>
>>>>>>> __attribute__((section(".text.rox")))
>>>>>>> const int x = 5;
>>>>>>>
>>>>>>> __attribute__((section(".text.row")))
>>>>>>> const int w = 5;
>>>>>>>
>>>>>>> int foo() { return x; }
>>>>>>> int bar() { return w; }
>>>>>>>
>>>>>>> and we compile it with `-fdata-sections -ffunction-sections` and then link the OBJ using a linker script like
>>>>>>>
>>>>>>> SECTIONS
>>>>>>> {
>>>>>>> ttt 0x1000 :
>>>>>>> {
>>>>>>> tt.o (.text.*)
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> Then in the `for` loop we will see two input sections are executable and the other two are not.
>>>>>>
>>>>>> I have mentioned that this is a corner case we don't necessarily try to optimize. It takes code.
>>>>>
>>>>> As @lanza mentioned above that `~16.5% of the libraries in one of our major apps have .text sections starting on the first page of the dso`, this is actually **not** a corner case for us. And given how much code are already there for symbol ordering in the linker, this change doesn't add that much overhead or complexity.
>>>>>
>>>>>> Note: even if some linker scripts combine rodata and code sections in one output section, such an output section is typically far smaller than `target->getThunkSectionSpacing()`
>>>>>
>>>>> Why typically it has to be far smaller? The linker doesn't restrict the use of linker scripts for combining RO and X sections to form a **large** output section. And typically code plus read-only data takes significant percentage of binary size.
>>>
>>> My comment discusses how likely a user combines rodata and code in one output section. It's not about the extent of all rodata and code output sections.
>>
>> Should we cover all the valid scenarios if possible? I don't think it is good practice to leave something to just cover most, especially the cost of covering all isn't that much as one would reasonably be concerned.
>
> We are using heuristics here. And I don't think there is a heuristic which can cover all valid scenarios.
> `totalSize >= target->getThunkSectionSpacing()` is a useful condition. That is what (@smeenai mentioned) I don't fundamentally object to the idea.
> When that is satisfied, if the output section has code sections, it is almost assured that the number of input code sections is greater than 1, so it's not really necessary to test `executableInputSections > 1`.
> Again, there are linker scripts combining rodata and code sections, but they only do this for very small output sections. I use my experience reading many linker scripts here.
Heuristics doesn't have to be perfect, I agree with that. However, if with small tweaks we can make the heuristics cover more valid cases, then I think it is worth the effort. ELF linker supports multiple executable output sections, and supports merging differently named input sections into one output section. Why we can't make the linker bullet proof for extra scenarios (even if they seem to be less common) with **simple and low overhead** tweak?
Your bar of complexity and simplicity seems unreasonably high.
>>>>>> I don't think it is useful to special case `executableInputSections > 1`
>>>>>
>>>>> Well, it may not be useful to you or Google, but this is useful to us (as mentioned above) and this is the linker for open source community (not just you or your team). What are the rules here to determine what is useful and what not?
>>>>
>>>> To clarify, I don't think there's an objection to the change in general now (per @MaskRay's earlier comment after @lanza posted the 16.5% number). It's just a question of whether to count individual executable input sections or use the output section's executable-ness to make the ordering determination.
>>>
>>> Right. The code can spend fewer lines to address the issue. This is one of the moments I just wanted to commandeer the patch and satisfying the user needs instead of wasting more and more time arguing.
>>
>> Indeed we have spent quite some time arguing about this. I don't agree with you on "corner case", "not useful", or "fewer lines of code"; and I really don't understand why you push back so hard on this change. Before you requested another revision one week ago the only remaining unsettled point was whether we need to track each input section because you think their attributes cannot be reliably tracked, then I showed you with an example that this is not the case. Then I think it should be all good then.
>
> Answered in the previous paragraph.
>
>> (from a previous comment) Well, it may not be useful to you or Google, but this is useful to us (as mentioned above) and this is the linker for open source community (not just you or your team). What are the rules here to determine what is useful and what not?
>>
>> It is simply fixing a flaw in the original ordering scheme and the new additional logic doesn't add too much complexity or overhead compared to what is already there for mixing cold and hot contributions. Please don't abuse the administrator rights.
>
> I don't view the problem this way as I just see a patch with unneeded complexity. I judge a patch by its own merit, not where the author comes from. If you check many patches I am reviewing/have reviewed, it's very common for my to "Request Changes" even if the author works at Google.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128382/new/
https://reviews.llvm.org/D128382
More information about the llvm-commits
mailing list