[PATCH] D42212: [ThinLTO] Add call edges' relative block frequency to per-module summary.

Teresa Johnson via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Fri Jan 19 16:08:47 PST 2018


tejohnson added inline comments.


================
Comment at: include/llvm/IR/ModuleSummaryIndex.h:58
   HotnessType Hotness = HotnessType::Unknown;
+  uint64_t RelBlockFreq = 0;
 
----------------
eraman wrote:
> tejohnson wrote:
> > Do we need 64 bits in practice, especially since these are scaled down by the entry frequency? I'm concerned about the size increase for large indices that have huge numbers of edges. Also, since we'll never have both Hotness and RelBlockFreq set, it would be great if we could union them, although that would require a flag on the index, and I suppose we might want to support mixing modules built with and without profile data. Perhaps there is some other way to combine them, e.g. use a single field, reserve the lowest 3 bits for the hotness, and shift the relative frequency into the upper n-3 bits.
> From our offline discussions I thought the use of 64 bits doesn't matter, since they get compressed on disk. Is the concern about in-memory size? I can reduce the size to, say, 29 bits (a uint32_t with 3 bits reserved for the enum). Another thing to note is that only the per-module summary has this information. Is there a way to take advantage of that?
Right, the concern is about in-memory size. Why does only the per-module summary contain this? Isn't that just temporary? I thought the idea was to propagate this into the backends.
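
A minimal sketch of the single-field packing being discussed, assuming a uint32_t bitfield with the low 3 bits for the hotness and the remaining 29 bits for the scaled frequency; the layout and names are illustrative, not the actual patch:

  #include <cstdint>

  // Illustrative only: one 32-bit field instead of a separate Hotness enum
  // plus a 64-bit RelBlockFreq.
  struct CalleeInfo {
    enum class HotnessType : uint8_t { Unknown = 0, Cold, None, Hot, Critical };

    uint32_t Hotness : 3;       // low 3 bits: hotness enum value
    uint32_t RelBlockFreq : 29; // remaining bits: freq scaled by entry freq

    CalleeInfo()
        : Hotness(static_cast<uint32_t>(HotnessType::Unknown)),
          RelBlockFreq(0) {}

    void setRelBlockFreq(uint64_t RBF) {
      // Saturate rather than overflow the 29-bit field.
      constexpr uint64_t MaxRelBF = (1ULL << 29) - 1;
      RelBlockFreq = RBF > MaxRelBF ? MaxRelBF : RBF;
    }
  };

Packing like this keeps the per-edge cost at 4 bytes whether or not the module was built with profile data, at the cost of capping the largest representable relative frequency.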


================
Comment at: lib/Analysis/ModuleSummaryAnalysis.cpp:63
+cl::opt<bool> WriteRelBFToSummary(
+    "write-relbf-to-summary", cl::Hidden, cl::init(false),
+    cl::desc("Write relative block frequency to function summary"));
----------------
eraman wrote:
> tejohnson wrote:
> > Suggest moving this to the bitcode writer and always adding RelBlockFreq into the in-memory index here.
> > 
> > Is there any reason to default off? Why not just default on from the get-go?
> Until this information actually gets used (first propagating the counts, then using them in the inliner), it just adds extra compile time and summary size unless it is turned off. Alternatively, I can drop this patch and send it as part of a larger patch that builds the call graph and propagates the counts.
How long do you expect it to be until the other patches are sent? If it won't be long, I don't think it matters too much, since we are going to incur the extra space/memory/time in the near term anyway. Better to have it tested.
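
For context on what the always-computed value would look like, here is a hedged sketch of deriving a call edge's relative block frequency from BlockFrequencyInfo; the helper name and placement are assumptions for illustration, not the code in this patch:

  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/IR/BasicBlock.h"
  using namespace llvm;

  // Scale the call site's block frequency down by the entry frequency so
  // the summary stores a compact, function-relative value.
  static uint64_t computeRelBlockFreq(const BasicBlock &BB,
                                      const BlockFrequencyInfo &BFI) {
    uint64_t EntryFreq = BFI.getEntryFreq();
    if (EntryFreq == 0)
      return 0;
    return BFI.getBlockFreq(&BB).getFrequency() / EntryFreq;
  }

If -write-relbf-to-summary only gates emission in the bitcode writer, as suggested above, the in-memory index always carries this value and the flag affects only the on-disk summary.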


Repository:
  rL LLVM

https://reviews.llvm.org/D42212




