[PATCH] D98061: [InstrProfiling] Generate runtime hook for ELF platforms

Tue Mar 9 16:37:27 PST 2021

vsk added a comment.

In D98061#2615386 <https://reviews.llvm.org/D98061#2615386>, @phosek wrote:

> In D98061#2615334 <https://reviews.llvm.org/D98061#2615334>, @vsk wrote:
>
>> In D98061#2615250 <https://reviews.llvm.org/D98061#2615250>, @phosek wrote:
>>
>>> In D98061#2615239 <https://reviews.llvm.org/D98061#2615239>, @vsk wrote:
>>>
>>>> @ributzka may have stronger thoughts about when -fprofile-instr-generate must imply that a known set of symbols appear with external visibility. Up until now, the answer has been "always", and this is what tapi enforces for MachO. It's awkward to have this be inconsistent between MachO/ELF, but if there's a compelling performance reason then I think it's fine.
>>>
>>> From the perspective of Fuchsia, we've seen a noticeable impact of this change when using `-fprofile-instr-generate` together `-fprofile-list` to apply instrumentation selectively only to modified files.
>>
>> What kind of impact do you see? If #counters > 0, is it mostly binary size cost? If #counters == 0, what's the avg. overhead of writing out the full profile?
>
> It depends a bit on the runtime and the platform. In Fuchsia where we always use the continuous mode, there's quite a bit of upfront cost (see https://github.com/llvm/llvm-project/blob/75f3f778052cdcd89e79f7a42a50915ee5ce2281/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c#L109), we need to allocate a memory object, map it into address space, publish it, write some additional information into the log.
>
> While it may not be so bad for a single binary, over hundreds of tests it adds up and with this change we saw the total test execution time go down from 30 to 18 minutes. There are other steps we're taking, like eliminating the need for logging, but that's unlikely to eliminate all of the overhead.
>
>> Can it be fixed by doing an early-exit in the runtime initializer, writing out an empty .profraw?
>
> I considered that initially, but that's less efficient than the approach implemented here, especially when it comes to binary size.

I don't follow where the binary size cost comes from (is it the cost of writing out many empty .profraw headers?), but it sounds like the 30 -> 18min speed up is achieved by not registering essentially an empty profile with the VM, so it does follow that unconditionally writing out an empty .profraw won't work as well as disabling the runtime initializer entirely.

>> That raises a question about tooling support: some workflows (like the Xcode coverage plugin) might assume that a program compiled with -fprofile-instr-generate always creates a .profraw. If there's no profile written at all for the #counters == 0 case, that could be a breaking change.
>
> That's a good point, would it be better to put this behind a (frontend or backend) flag?

I don't think it should be an option because I have doubts about how discoverable it'd be. My preference would be to add a section to the clang coverage docs explaining the different guarantees for .profraw emission.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98061/new/

https://reviews.llvm.org/D98061