[PATCH] D43578: -ftime-report switch support in Clang
Michael Zolotukhin via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Apr 16 01:23:03 PDT 2018
mzolotukhin added a comment.
> What kinds of <some part> would be useful to you? (How do you want the runtime of Clang broken down?) Are vertical slices (through all Clang's various layers and potentially parts of LLVM) -- as this patch will produce -- useful, or would you really want horizontal slices (as in, what part of Clang is actually spending the time)? Or just anything that's basically expected to be consistent from run to run, so you can identify that something has slowed down, even if you don't have enough information to really know what?
For me "something has slowed down" would be enough. I.e. even if "parse templates" would be erroneously attributed to 90% time spent in front-end, I would be able to see a jump from 90% to 95%. While these numbers are not reflecting the actual ratio, they still will indicate changes.
I would find horizontal slicing more interesting, as we can get vertical slices from profilers. I.e. if in LLVM profile I see that time spent in APInt::add has grown a lot, I'd like to know which pass started to use it more extensively - it's rarely the case that APInt::add slowed down itself.
> I think we need to fix the overlap issue as a prerequisite to adding timers with large amounts of overlap, especially self-overlap. Otherwise the numbers will likely do more harm than good, as they will significantly misattribute runtime. Fortunately, I think that should only require some relatively small changes to the LLVM timer infrastructure.
I definitely would support that :)
> Well, that depends on the profiler. With some profilers, you can just attach a profiler to your entire compilation and then ask it what the hottest functions were. But sure, if you have existing infrastructure built around -ftime-report, I can see that it makes sense to reuse that infrastructure.
Yeah, that was exactly my point.
> While LLVM may have some overlap between regions, the vertical timing slices are still pretty well aligned with the horizontal functionality slices. I expect the problems in Clang to be a lot worse. Suppose you enter N levels of parsing templates, and then trigger a whole bunch of template instantiation, AST deserialization, and code generation. Let's say that takes 1s in total. With this patch, I think we'd end up attributing Ns of compile time to "parsing templates", which is clearly very far from the truth, but will likely be listed as (by far) the slowest thing in the compilation. This does not even rise to the level of "not-perfect", it's going to be actively misleading in a lot of cases, and won't even necessarily point anywhere near the problematic spot.
> I think with this patch and no fix to the overlap problem, we will find ourselves frequently fielding bugs where people say "X is slow" when it actually isn't. Plus I think we have the opportunity to deliver something that's hugely useful and not much harder to achieve (providing timing information that relates back to the source code), and I'd prefer we spend our complexity budget for this feature there.
I see your point, and I agree it would be really misleading if e.g the sum of timers won't match the total time due to self-overlaps. I think adding these timers still might be worthwhile as my guess is that no one usually uses them anyway unless they know what to expect from the reported timers, but I might be mistaken here. And anyway, if you think it's possible to fix the issue first, it's totally fine with me.
Thanks,
Michael
https://reviews.llvm.org/D43578
More information about the cfe-commits
mailing list