<div dir="ltr"><div dir="ltr">Hi Andrey,<div><br></div><div>I was actually just typing up a reply to welcome contributions and to suggest that you give the existing profile support a try. I realized I need to add usage documentation to the llvm/clang docs, which I will do soon, but it sounds like you figured it out OK.</div><div><br></div><div>Some answers below.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 8, 2021 at 8:03 AM Andrey Bokhanko &lt;<a href="mailto:andreybokhanko@gmail.com" target="_blank">andreybokhanko@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Teresa,</div><div><br></div><div>One more thing, if you don't mind.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 6, 2021 at 12:54 AM Teresa Johnson &lt;<a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>We initially plan to use the profile information to provide guidance to the dynamic allocation runtime on data allocation and placement. We'll send more details on that when it is fleshed out too. </div></div></blockquote><div> </div><div>I played with the current implementation, and became a bit concerned about whether the current data profile is sufficient for an efficient data allocation optimization.</div><div><br></div><div>First, there is no information on temporal locality: only the total_lifetime of an allocation block is recorded, not start/end times, let alone timestamps of actual memory accesses. 
I wonder what criteria a data profile-based allocation runtime would use to allocate two blocks from the same memory chunk?</div></div></div></blockquote><div><br></div><div>Recording all of this information for every allocation, and particularly for every access, would be prohibitively expensive. Right now we have the average/min/max lifetime, and just a single boolean per context indicating whether there was a lifetime overlap with the prior allocation for that context. We can probably expand this a bit to have somewhat richer aggregate information, but recording and emitting all start/end times and timestamps would produce an overwhelming amount of information. As I mentioned in my other response, the initial goal is to provide hints about hotness and lifetime length (short vs. long) to the memory allocator so that it can make smarter decisions about how and where to allocate data.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>Second, according to the data from [Savage'20], memory access affinity (= spatial distance between temporally close memory accesses from two different allocated blocks) is crucial: Figure 12 demonstrates that this is vital for the omnetpp benchmark from SPEC CPU 2017.</div></div></div></blockquote><div><br></div><div>Right now we don't track this information. Part of the issue is that memory accesses themselves don't interact with the profile runtime library; instead, the code is instrumented to update shadow counters inline, which keeps the overhead reasonable. 
My understanding from reading the HALO paper and asking the authors at CGO is that the overheads are currently quite large (for both the PIN-based runtime and the offline grouping algorithm), and that it does not yet support multithreaded applications.</div><div><br></div><div>I am definitely interested in contributions or ideas on how we could collect richer information with the approach we're taking (allocations tracked by the runtime per context, and fast shadow-memory-based updates for accesses).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>That said, my concerns are based essentially on a single paper that employs specific algorithms to guide memory allocation and measures their impact on a specific set of benchmarks. I wonder if you have preliminary data that validates the sufficiency of the implemented data profile for efficient optimization of heap memory allocations?</div></div></div></blockquote><div><br></div><div>I don't have anything I can share yet, but we will do so in the future. For an idea of how lifetime-based allocation would work, here's a related paper that used ML to identify context-sensitive lifetimes and used the info in a custom allocator:</div><div><br></div><div><a href="https://research.google/pubs/pub49008/" target="_blank">https://research.google/pubs/pub49008/</a><br></div><div>Maas, Martin &amp; Andersen, David &amp; Isard, Michael &amp; Javanmard, Mohammad Mahdi &amp; McKinley, Kathryn &amp; Raffel, Colin. (2020). Learning-based Memory Allocation for C++ Server Workloads. Proceedings of the 25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 541-556. 10.1145/3373376.3378525. 
</div><div> </div><div>Teresa</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>References:</div><div><div>[Savage'20] Savage, J., & Jones, T. M. (2020). HALO: Post-Link Heap-Layout Optimisation. CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, <a href="https://doi.org/10.1145/3368826.3377914" target="_blank">https://doi.org/10.1145/3368826.3377914</a></div><div><br></div></div><div>Yours,<br></div><div>Andrey</div><div><br></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr">Teresa Johnson | Software Engineer | <a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a></div>
</div>