[PATCH] D85628: [HotColdSplitting] Add command line options for supplying cold function names via user input.

Thu Aug 13 08:00:26 PDT 2020

hiraditya added a comment.

In D85628#2214119 <https://reviews.llvm.org/D85628#2214119>, @jfb wrote:

> In D85628#2213940 <https://reviews.llvm.org/D85628#2213940>, @rjf wrote:
>
>> In D85628#2213919 <https://reviews.llvm.org/D85628#2213919>, @vsk wrote:
>>
>>> I’m not convinced this is a good idea. In what use case is it not possible to mark up relevant functions? It doesn’t make sense to me to make alterations to standard library functions within the compiler. It seems better to simply patch the standard library. In some cases llvm does infer function attributes for library functions, but these are generally lower level attributes that can’t be specified at the source level, and the attribute is made available to other passes in the pipeline.
>>
>> Do you mean this patch isn't a good idea in general, or the recent revision isn't a good idea? For the latter, I'm not sure if you meant we should not outline declarations or we should not split the original loop into two (e.g. marking as cold before outlining). IMO splitting the loop into two simply addresses the original intent of what we're doing, which is to mark certain functions as cold before outlining. Whereas, if we don't outline declarations via user-provided input, it renders @hiraditya's proposed testcase useless. Alternatively, we don't have to make the testcase involving standard library functions if that's what you want :).
>
> My understanding is that today code can be considered "cold" based on the following:
>
> 1. Attribute on the function
> 2. Likely / unlikely annotations
> 3. Profile information
> 4. Other compiler heuristics
>
> This adds another way to do it, but it's kind of a side-injection and it doesn't seem particularly principled. Presumably the list you're feeding through the command-line comes from a profile? Why isn't it provided as profile information?

Let me try to formulate the problem statement to motivate this work. I'm happy to work on a better approach.
Let's consider a repository which builds multiple applications. For App1 we have a set of cold callsites (CS1) and a set of cold function declarations (FD1); similarly, for App2 we have CS2 and FD2. These sets have the following properties:

- CS1 and CS2 may intersect, but neither is necessarily a subset of the other. An example of a non-intersecting element: calling std::lower_bound in a loop vs. calling it in an isolated instance. std::lower_bound could be cold in the latter case.
- FD1 and FD2 may intersect, but neither is necessarily a subset of the other. An example of a non-intersecting element: constructing std::unordered_map<string, string> vs. constructing a std::string. std::string::string() could be cold in the latter case.

It may not be possible to get profile information with sample-profile or instrumented-profile (e.g., for mobile phone apps); however, product developers would know the hotness/coldness of many call-sites based on domain knowledge.
In order to optimize these call-sites, how do we tell the compiler about these FDs and CSs? Adding annotations like `__attribute__((cold))` to FD1 could regress App2 and vice versa.
Supplying a human-readable, editable file would be ideal.
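
To make the attribute problem concrete, here is a minimal sketch. The function and application names (`parse_record`, `app1_startup`, `app2_process`) are hypothetical and are not part of this patch; the only real feature used is the GCC/Clang `__attribute__((cold))` annotation. The point is that the attribute is attached to the shared declaration, so it cannot differ between App1 and App2:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shared library function (the name is illustrative only).
// The cold attribute is attached to the declaration, so every
// application built against this declaration inherits it.
__attribute__((cold)) std::size_t parse_record(const std::string &record);

std::size_t parse_record(const std::string &record) {
  return record.size(); // stand-in for real work
}

// App1 (hypothetical): a single, isolated call; "cold" is a fair hint here.
std::size_t app1_startup(const std::string &config) {
  return parse_record(config);
}

// App2 (hypothetical): called once per element in a loop; the same
// source-level attribute now labels a frequently executed call as cold.
std::size_t app2_process(const std::vector<std::string> &records) {
  std::size_t total = 0;
  for (const std::string &r : records)
    total += parse_record(r);
  return total;
}
```

With a per-application list supplied outside the source, App1 could name `parse_record` as cold while App2 leaves it alone, without either one editing the shared declaration.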

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85628/new/

https://reviews.llvm.org/D85628