[PATCH] D17555: [Feedback requested] Implement cold splitting

Wed Feb 24 13:04:28 PST 2016

silvas added a subscriber: silvas.
silvas added a comment.

I can see why this would help iTLB/paging, but I'm not grokking why it would help icache very much compared to per-function machine block placement ensuring that the cold stuff ends up at the end on a separate cacheline (does MBP already do that?). In fact (playing devil's advocate) the MBP approach could be more beneficial because it could allow branches to be relaxed to smaller encodings.

The scenario where I can see this being a substantial win for icache over MBP is when you have, e.g., two functions with 1.5 cachelines of hot text each (and say 1 cacheline of cold text each). With MBP, each function would end up using ceiling(1.5) = 2 cachelines for the hot part and one cacheline for the cold part, for 4 hot cachelines total. With the splitting, the linker would see 2x 1.5 cachelines of hot text + 2x 1 cacheline of cold text, so it could put the two 1.5's together and use only 3 cachelines for the hot part. How often does that occur, and does the linker actually manage to exploit it?
Since the benefit is based on the "rounding", we save at most just under ("just under" is determined by the text alignment) one cacheline every time we can pack these densely. The benefit is at most #hotFunctions * (sizeof(Cacheline) - alignof(Function)) text size for the hot working set.
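
To make the arithmetic above concrete, here is a rough sketch of the rounding argument. It is only a toy model, not a measurement: the 64-byte cacheline, the 16-byte function alignment, and the helper names are assumptions made for illustration, and it counts only the cachelines touched by hot text.

```python
CACHELINE = 64    # bytes; assumed cacheline size
FUNC_ALIGN = 16   # bytes; assumed function alignment


def ceil_div(n, d):
    """Integer ceiling division."""
    return -(-n // d)


def hot_lines_mbp(hot_sizes):
    """Per-function placement: each function's hot text is rounded up
    to whole cachelines, since its cold tail sits on separate lines."""
    return sum(ceil_div(h, CACHELINE) for h in hot_sizes)


def hot_lines_split(hot_sizes):
    """Splitting: hot parts are packed back to back, each starting at
    the next FUNC_ALIGN boundary; count the cachelines they span."""
    offset = 0
    for h in hot_sizes:
        offset = ceil_div(offset, FUNC_ALIGN) * FUNC_ALIGN + h
    return ceil_div(offset, CACHELINE)


# Two functions with 1.5 cachelines (96 bytes) of hot text each.
hot = [96, 96]
mbp = hot_lines_mbp(hot)        # 2 + 2 cachelines
split = hot_lines_split(hot)    # 192 bytes packed densely
saved_bytes = (mbp - split) * CACHELINE
bound_bytes = len(hot) * (CACHELINE - FUNC_ALIGN)
print(mbp, split, saved_bytes, bound_bytes)
```

In this model the two-function case goes from 4 hot cachelines to 3, a saving of 64 bytes, which is within the #hotFunctions * (sizeof(Cacheline) - alignof(Function)) bound of 96 bytes. Whether a real linker lays the hot parts out this densely is exactly the open question.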

That being said, this kind of low-level function splitting is a really powerful tool and I fully support adding it, but I agree with Mehdi that I'd like to see some supporting benchmark results.

http://reviews.llvm.org/D17555