[PATCH] D85628: [HotColdSplitting] Add command line options for supplying cold function names via user input.

Fri Aug 14 13:28:29 PDT 2020

vsk added a comment.

In D85628#2214733 <https://reviews.llvm.org/D85628#2214733>, @wenlei wrote:

> I have two questions for this patch:
>
> - Why do we need another way to explicitly tell compiler that some functions are cold, rather than using existing mechanisms?
>   - If this is for a random cold function from user code, then using profile or in-source annotation should be the way to go, which is more scalable and more sustainable.
>   - If this is for special (generated) functions, e.g. __cxa_guard_acquire, then we shouldn't even need to tell the compiler. It's like we don't have to tell compiler EH pad, noreturn functions are cold, instead compiler should figure this out by itself.  _cxa_guard_acquire is a generated function with very specific semantic, so I think it's similar to noreturn, EH pads in that regard.
> - If we have to tell compiler extra hotness/coldness info, why do we do it just HCS? hotness is very general, and could benefit many opts. Introducing a channel to tell specific optimization about hotness is not good design in general (imagine 10 passes each has its own way of getting hotness through a bunch of switches). If we really have to do this, I'd say we should do it in the framework, e.g. the static analysis part of BFI.

+ 1. Thanks for writing this up, this reflects my thinking about the patch.

> In order to optimize these call-sites, how do we tell compiler the about these FDs and CSs? Adding annotations like __attribute__((cold)) to FD1 could regress App2 and vice-versa. Supplying a human readable/editable file would be ideal.

I don't think supplying a list of cold functions to a compiler invocation is a scalable way to approach this problem. This looks like an attempt to hand-annotate individual call sites via a side channel, and it's likely too big (& imprecise) of a hammer to apply.  E.g., after adding a cold function list to a program's cflags, performance may regress if the program picks up a hot use of a function in the list. It seems like an actual profile would be a better fit for your use case (if instrumentation/sampling is not possible, then it may be possible to construct a synthetic/fake profile as a pre-processing step?, but that has the same issues with source drift).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85628/new/

https://reviews.llvm.org/D85628