[cfe-dev] RFC: Clang-Misexpect Proposal

Fri Aug 16 11:09:16 PDT 2019

On Fri, Aug 16, 2019 at 8:52 AM Francis Visoiu Mistrih <francisvm at yahoo.com>
wrote:

> +Adam
>
> > On Aug 15, 2019, at 5:59 PM, Vedant Kumar via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >
> >
> >
> >> On Jul 24, 2019, at 7:00 PM, Paul Kirth via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >>
> >>
> >>
> >> On Wed, Jul 24, 2019 at 5:42 PM Roman Lebedev <lebedev.ri at gmail.com>
> wrote:
> >> On Thu, Jul 25, 2019 at 3:24 AM Paul Kirth <paulkirth at google.com>
> wrote:
> >> >
> >> >
> >> >
> >> > On Wed, Jul 24, 2019 at 3:14 PM Roman Lebedev <lebedev.ri at gmail.com>
> wrote:
> >> >>
> >> >> On Thu, Jul 25, 2019 at 1:06 AM Paul Kirth via cfe-dev
> >> >> <cfe-dev at lists.llvm.org> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Jul 24, 2019 at 6:31 AM Michael Kruse <
> cfe-dev at meinersbur.de> wrote:
> >> >> >>
> >> >> >> Sounds like a useful tool.
> >> >> >>
> >> >> >
> >> >> > Thanks! I certainly hope it will be.
> >> >> >
> >> >> >>
> >> >> >> Could it be enough for clang to emit warnings in PGO mode when it
> >> >> >> detects a mismatch between collected profile data and
> >> >> >> __builtin_expect() ?
> >> >> >>
> >> >> >
> >> >> > You are right: it might be sufficient to simply gate these
> warnings behind a flag in the PGO pipeline. I think there is a good
> argument for doing it this way. In fact, it's how my prototype works right
> now, so we have certainly considered this option.
> >> >> >
> >> >> > However, I think it will still be desirable to have a standalone
> tool that doesn't require the full compilation pipeline. My understanding
> was that clang-tools are often aimed at this exact use-case: i.e. a custom
> driver and partial compilation pipeline.
> >> >> >
> >> >> > Another important point is that compiler warnings are, by and
> large, accurate. Our approach to identifying these mismatches is subject to
> some imprecision due to a deficiency in the collected profile.
> >> >> Can those be simply reported as normal Remarks?
> >> >> Those can later be visualized over the codebase via opt-viewer.
> >> >>
> >> >
> >> > Could you clarify the context here for me? I'm not sure if you mean:
> >> >
> >> > 1. using remarks in place of warnings in the PGO pipeline, but not a
> standalone tool.
> >> > 2. using remarks in place of warnings in both a standalone tool and
> in the PGO pipeline.
> >> > 3. using remarks in place of warnings in the PGO pipeline, with no
> standalone tool (so only a compiler flag).
> >> >
> >> > If we're only talking about case 1, then I think that could be a good
> compromise w.r.t using warnings in the compiler.
> >> >
> >> > I'm not sure I see an advantage to using remarks in a standalone
> tool(scenario 2), or as the sole method of reporting(scenario 3). I'm not
> even sure if that was what you meant, but I'm happy to hear more about the
> approach if I have missed something.
> >>
> >> I think i'm failing to catch the subtle difference between 1. and 3.
> >> I was indeed suggesting to *not* introduce a separate tool,
> >> just emit the Remark (when some specific new diag group is not disabled)
> >> when the PGO profile is specified/being used by clang.
> >>
> >>
> >> Sorry for the lack of clarity.
> >>
> >> In 1, in addition to the new flag in clang you still would have a
> standalone tool that emits warnings -- which I think is OK since it isn't
> the compiler itself and has a narrow, well defined scope.
> >>
> >> In 3, it's just as you describe: only an extension to the compiler
> based on Remarks, without any standalone tool.
> >>
> >> In my view, using Remarks here doesn't feel like the right tradeoff.
> They are tremendously useful for compiler developers, but I don't think the
> average user of clang is familiar with them, or should be.
> >
> > As far as I'm aware, opt-remarks are actually intended for a general
> audience, not just compiler developers. I've cc'd some folks who've
> explored various ways to surface remarks / make them user-friendly. I
> actually think they're pretty user-friendly already -- e.g. there's support
> in tree for surfacing opt-remarks up via clang's DiagnosticEngine (i.e.
> they can look like warnings, get serialized, show up in an IDE, etc).
> >
> >
> >
> >> My exposure to Remarks has been exclusively as a diagnostic tool to
> understand the reasoning behind which optimizations have occurred or to
> debug something within the compiler. Maybe this is an aspect of the
> toolchain I'm unfamiliar with, but I can't think of another time we
> interface with developer using Remarks rather than directly giving them
> feedback.
> >>
> >> I also worry that if this information is only available in Remarks,
> then it will never be used, since it is kind of hidden. On the other hand,
> having a standalone tool fits well with existing workflows, and makes it
> easy for developers to try out.
> >
> > If memory serves, remarks can be surfaced to the user by passing an
> extra compiler flag. That seems even easier than using a separate tool?
>
> Some documentation about how remarks are surfaced is available here:
> http://llvm.org/docs/Remarks.html. I tend to agree with Vedant here.
>
> >
> > Also, istm that a new standalone tool will start off 'hidden' too.
> >
> >
> >> Other than my misgivings about issuing warnings based on a heuristic,
> is there any advantage to using Remarks over warnings?
> >
> > I think so. Using remarks can expose users to other important
> performance hints from the compiler, beyond what is/can-be surfaced by a
> standalone misexpect tool.
> >
> > That's why it seems to me that the remarks infrastructure is the perfect
> fit for the hints you are describing.
>
> I agree here as well. With remarks, you can see how the “misexpect”
> affected optimizations by seeing other remarks describing missed or passed
> optimizations.
>
> Is there anything lacking in the remarks “infrastructure” that makes them
> not fit (or harder to use) for your use case?
>
>
In the case of a standalone tool, I think issuing warnings is the right
approach. Warning a developer about the impact of something that they wrote
vs a decision that the compiler made also played a role in my preference
for a warning.

To me the remark didn't seem serious enough. From the discussion, I think
that I simply had a poor understanding of the role remarks play within the
compiler.

I already suggested this to Vedant, but maybe a good approach is to support
remarks and a warning.

> Thanks,
>
> —
> Francis
>
> >
> > best,
> > vedant
> >
> >>
> >> >>
> >> >> > When we issue a warning in clang-misexpect, it means one of two
> things: (i) your annotation is incorrect, or (ii) your profile data is
> insufficient. While it at least signals to the developer that investigation
> is warranted, it is a somewhat unsatisfactory feedback. I think our best
> effort warnings are a better fit for a clang-tool because of that. To me,
> warnings in the main compiler have a feeling of more significance and a
> higher burden of correctness. For something less precise or not generally
> applicable, a clang-tool appears to be the right place to implement this.
> This seems to be consistent with existing tools, like clang-tidy, that
> enable more warnings/diagnostics than are available in the main compiler.
> >> >> >
> >> >> > That being said, we're happy to take feedback from the community.
> So if the general consensus is that this should be part of the PGO
> pipeline, then we can rescope our proposal to only add a new flag to clang.
> >> >> >
> >> >> > Or maybe the right thing to do is to have both? I'm not exactly
> sure, which is one of the reasons we made this RFC.
> >> >> >
> >> >> >>
> >> >> >> Michael
> >> >> >>
> >> >> >>
> >> >> >> Am Di., 23. Juli 2019 um 22:06 Uhr schrieb Paul Kirth via cfe-dev
> >> >> >> <cfe-dev at lists.llvm.org>:
> >> >> >> >
> >> >> >> > We would like to propose adding a new clang based tool,
> clang-mispredict, for identifying potentially incorrect uses of
> __builtin_expect().
> >> >> >> >
> >> >> >> > --- TLDR ---
> >> >> >> > What are we proposing?
> >> >> >> > * Adding a new tool, clang-mispredict, to the LLVM project.
> >> >> >> >
> >> >> >> > What is its purpose?
> >> >> >> > * Identifying potentially problematic uses of
> __builtin_expect().
> >> >> >> >
> >> >> >> > Why is this important?
> >> >> >> > * __builtin_expect() has a high performance cost when it is
> used incorrectly.
> >> >> >> >
> >> >> >> > Where will the code live?
> >> >> >> > * clang-mispredict will live in clang-tools-extra, and there
> will be some additions to clang.
> >> >> >> >
> >> >> >> > Open questions
> >> >> >> > * What is the correct heuristic for determining a mismatch?
> >> >> >> >
> >> >> >> > Potential shortcomings
> >> >> >> > * The approach is sensitive to both profiling input and
> configuration.
> >> >> >> >
> >> >> >> >
> >> >> >> > --- Details ---
> >> >> >> >
> >> >> >> > We plant to build a new tool, clang-mispredict, on top of
> Clang’s PGO infrastructure. This tool will issue warnings about the use of
> _builtin_expect() when collected PGO profile data shows a mismatch with the
> use of __builtin_expect(). Existing solutions for validating these
> annotations generally follow this approach: leverage dynamic profiling to
> validate existing annotations. This is the approach used in the Linux
> kernel, for example. Unfortunately, existing solutions seem to be custom
> efforts specific to a particular code base. Supporting this in the LLVM
> ecosystem gives developers a general way to check the use of
> _builtin_expect() without creating custom instrumentation.
> >> >> >> >
> >> >> >> > Additionally, if _builtin_expect() is used incorrectly, then
> developers may notice a performance regression when switching to the new
> PM. This happens because the New PM follows performance annotations more
> closely than the legacy PM, and is therefore more likely to try and
> aggressively optimize these cases. Our proposed tool can help developers
> narrow their focus to potentially problematic areas during the transition.
> >> >> >> >
> >> >> >> > Finally, it may also be beneficial to suggest annotations when
> profiling suggests it would be beneficial. We consider this a desirable
> property not only from a performance standpoint, but also for identifying
> potential bugs (both in the profiling corpus and within the codebase).
> >> >> >> >
> >> >> >> > We think having clang-mispredict behave similar to clang-tidy
> is good approach. We can use a compile commands database in conjunction
> with profiling data to drive our validation tool. More concretely, we plan
> to use Clang’s existing PGO infrastructure to emit warnings where branch
> weights are assigned, but stop compilation before the IR is passed to the
> backend. This largely amounts to checking if the profile counters are
> outside of certain thresholds for 'hot' and 'cold' code. Similar to clang
> tidy, we plan to support suppressing warnings within the codebase through
> use of comment strings.
> >> >> >> >
> >> >> >> > The most obvious question about this approach is how to
> quantify what is ‘hot’ or ‘cold’. Initially, we plan to use the thresholds
> already present in LLVM; however, we are happy to use any suitable
> heuristic or empirically derived threshold. Input from the community about
> the correct heuristic are most welcome.
> >> >> >> >
> >> >> >> > Our strategy does have some shortcomings. The usefulness of the
> warnings are directly related to how representative the profiling input
> actually was when compared to its normal use. If the profile collected is
> not representative of typical use, then the warnings may not reflect the
> ground truth. Program and build configuration can also dramatically change
> which paths will be executed, thus affecting the quality of the profile
> data and the generated warnings.
> >> >> >> >
> >> >> >> > Ultimately, we are proposing adding a new standalone tool that
> takes a compile commands database and an instrumentation profile to emit
> warnings in a clang-tidy fashion.
> >> >> >> >
> >> >> >> > Please let us know if you have comments or concerns about this
> proposal.
> >> >> >> >
> >> >> >> > Thanks!
> >> >> >> >
> >> >> >> > --
> >> >> >> > Paul Kirth
> >> >> >> > _______________________________________________
> >> >> >> > cfe-dev mailing list
> >> >> >> > cfe-dev at lists.llvm.org
> >> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Paul Kirth
> >> >> > _______________________________________________
> >> >> > cfe-dev mailing list
> >> >> > cfe-dev at lists.llvm.org
> >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >> >>
> >> >> Roman
> >> >
> >> >
> >> >
> >> > --
> >> > Paul Kirth
> >>
> >> Roman
> >>
> >>
> >> --
> >> Paul Kirth
> >> _______________________________________________
> >> cfe-dev mailing list
> >> cfe-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>

-- 
Paul Kirth
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190816/a2913ff9/attachment.html>