[PATCH] Add warning capabilities in LLVM (backend part)

Sat Nov 16 11:17:41 PST 2013

----- Original Message -----
> From: "Alp Toker" <alp at nuanti.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Quentin Colombet" <qcolombet at apple.com>
> Cc: reviews+D2190+public+af649f7838db3d64 at llvm-reviews.chandlerc.com, "Chandler Carruth" <chandlerc at gmail.com>, "llvm
> commits" <llvm-commits at cs.uiuc.edu>
> Sent: Saturday, November 16, 2013 11:45:50 AM
> Subject: Re: [PATCH] Add warning capabilities in LLVM (backend part)
> 
> Hal, that's a lot of engineering work to suggest beyond what
> Quentin's
> patch sets out to achieve.
> 
> I like your ideas and do see this evolving in the direction of rich,
> strongly typed diagnostic info but it doesn't seem necessary to block
> the original patch on that.
> 
> We haven't seen the clang side the work yet and I have a feeling it'd
> be
> better to keep the interface simple and flexible until the use cases
> are
> in place and better understood.
> 
> For one thing, the tablegen and support code for diagnostics in clang
> is
> fairly modular, and could be pulled down into LLVM if there's ever an
> appetite for that sort of thing. It'd be a shame to jump through
> hoops
> to get it done in this early iteration.
> 
> So it'll be useful to actually see the kinds of LLVM-to-clang
> diagnostic
> output this achieves in practice and take it from there -- for or all
> we
> know, things might not work out or might evolve in a direction other
> than piped diagnostics, in which case the engineering would have been
> in
> vain. What do you think?

Fair enough. I think it makes sense to do some of this now (and I'm certainly willing to help with this, I feel strongly about making sure that this is done right). Arranging the diagnostics into classes is essentially separate from the infrastructure for printing, and I agree that breaking this into smaller pieces makes sense. Let's setup the infrastructure for the diagnostics as classes now (which I think actually makes the code for generating diagnostics simpler), with the printing method serializing into a raw_string_ostream, and then we can work on generalizing the printing interface as followup-work in coordination with frontend support.

Thanks,
Hal

> 
> Alp.
> 
> 
> 
> On 16/11/2013 17:05, Hal Finkel wrote:
> > ----- Original Message -----
> >> From: "Quentin Colombet" <qcolombet at apple.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc:
> >> reviews+D2190+public+af649f7838db3d64 at llvm-reviews.chandlerc.com,
> >> "llvm commits" <llvm-commits at cs.uiuc.edu>,
> >> "David Blaikie" <dblaikie at gmail.com>, "Eric Christopher"
> >> <echristo at gmail.com>, "renato golin"
> >> <renato.golin at linaro.org>, "Chandler Carruth"
> >> <chandlerc at gmail.com>
> >> Sent: Friday, November 15, 2013 7:51:46 PM
> >> Subject: Re: [PATCH] Add warning capabilities in LLVM (backend
> >> part)
> >>
> >> Hi Hal,
> >>
> >> Thanks for the feedbacks.
> >>
> >> I have replied within your email, but here is the overall approach
> >> I
> >> am tempted to pursue after looking at all your comments.
> >>
> >> A quick reminder first, as discussed in the RFC, the idea was that
> >> "StringData+Kind" can be used to format something nice in clang,
> >> but
> >> that "msg" fully covers it for clients that don't know the Kind
> >> enum.
> >>
> >> I would like to keep something similar but, I think it would make
> >> sense to provide something more structured for StringData,
> >> especially for statically known case.
> >>
> >> Basically, I would go for a DiagnosticInfo class, whose subclasses
> >> would provide the expected fields easily to avoid boxing/unboxing
> >> overhead. The based class would provide one virtual method:
> >> toStr(),
> >> which would implement the default message in each subclass
> >> (possibly
> >> an argument of the subclass constructor).
> >>
> >> Then, based on the Kind+Severity the front-end can statically cast
> >> this DiagnosticInfoClass to the right type.
> >> In particular, in the case of a InlineAsmDiagnosticInfo, this
> >> class
> >> would hold a message and a location and the front-end would do
> >> what
> >> it was doing with the InlineAsmDiag.
> > I think that you're on the right track with this, and I feel that
> > this is essentially the right solution. You should set this up
> > just like other class hierarchies in LLVM, so that we can use the
> > cast/isa/dynamic_cast templates with them. A good (simple) example
> > of how to do this in SCEV: llvm/Analysis/ScalarEvolution.h defines
> > SCEV, and then llvm/Analysis/ScalarEvolutionExpressions.h defines
> > SCEV's subclasses. There is a enum that defines the type IDs,
> > which is stored in the base class, and then each subclass
> > overrides classof as appropriate.
> >
> > Doing it this way, the 'Kind' gets replaced by the type ID, and so
> > you don't explicitly need it in the interface. You might also want
> > to make Severity a member of this diagnostic class as well. At
> > this point, the interface is very simple, is type safe, and does
> > not involve the frontend parsing of string data (which is
> > something I'd really like to avoid).
> >
> > Regarding the messages, let me explain what I'd like to avoid:
> >
> >   1. Defining a new string-list serialization format: First, the
> >   first thing we'd end up doing was defining a function in the
> >   frontend to parse this out into an array (or hash table or
> >   whatever), so we might as well pass it that way. Second, the
> >   parsing would end up being non-trivial: We'd need a way to
> >   escape the separator, and then we'd need to define a way of
> >   handling binary data (maybe hex encode it?). All of this gets
> >   complicated, and so we'd need more utility functions on the
> >   encoding side as well. Not to mention that, with this design,
> >   the frontend is always limited to exactly the information the
> >   backend decides to serialize (and I'm not sure that's the best
> >   setup). In any case, I think that with your proposal above, this
> >   can be fortunately irrelevant: all of this data can be passes as
> >   class member of the diagnostic subclass, and we don't need this
> >   serialization at all.
> >
> >   2. Practical duplication of similar code in the frontend and
> >   backend. Thinking through how this would work in practice, for
> >   every new diagnostic, I'd first add code to the backend to
> >   produce the default message and store that same data somehow:
> >
> >      std::string DataStorage;
> >      raw_string_ostream Data(DataStorage);
> >      Data << Something << "#" << SomethingElse << "#" <<
> >      hexencode(AThirdThing);
> >
> >      std::string MsgStorage("This is not quite what you has in mind
> >      (");
> >      raw_string_ostream Msg(MsgStorage);
> >      Msg << Something << " which calls " << SomethingElse << " with
> >      a buffer of size " <<
> >             AThirdThing.size() << ") in " << Fn.getName();
> >      F->getContext().report(..., Data.str(),
> >      LLVMContext::RS_Warning, Msg.str());
> >
> > Then I'd need to go into Clang and add code somewhere that looks
> > like this:
> >
> >    SmallVector<StringRef> DataStrings;
> >    unparse_diag_data(Data, DataStrings);
> >
> >    S.Diag(...) << "This is not quite what you has in mind (" <<
> >    DataStrings[0] <<
> >                   " which calls " << demangle(DataStrings[1]) << "
> >                   with a buffer of size " <<
> >                     DataStrings[2].size() << ")";
> >
> > [It would not quite be like this, the fixed string parts of the
> > message would be assigned a message id with %1, %2, etc. inside,
> > but you get the idea.]
> >
> > For a diagnostic generated from multiple places in the backend,
> > we'd end up defining utility functions to generate the text and
> > the data string. We'd still need to keep the frontend and the
> > backend in sync, which would be a pain.
> >
> > So what I'd like is for the diagnostic class itself to have a
> > virtual print() method that uses a printer class to do something.
> > This method takes as an argument a printer class (which has
> > virtual overloads of operator << for the basic types and for Value
> > *, SCEV *, MachineInst *, etc.). There is a default printer class
> > (which just uses raw_string_ostream), but the frontend can provide
> > one that does other things. The idea is that, in Clang, we can
> > construct one of these printer implementations that forwards
> > things to Clang's DiagnosticBuilder (in
> > include/clang/Basic/Diagnostic.h). For Clang, for example, the
> > overload that prints a Value* will check to see if it is a
> > Function*, for example, and then print the demangled function
> > name, or if it is an Instruction *, might do something special if
> > it has srcloc metadata, etc. Can the frontend check for a
> > particular diagnostic class, dyn_cast it, and then directly use
> > its data? Sure, but I doubt that will be the common case.
> >
> > I think that, if we do it this way, we can have an easy-to-maintain
> > system that lets us add backend diagnostics without touching the
> > frontend each time, and, at the same time, let's us produce
> > backend diagnostics that *look like* frontend diagnostics (which
> > as many of the niceties of frontend diagnostics as possible).
> >
> > Thanks again,
> > Hal
> >
> >> In the “Other” case, the subclass can return an array of Value*
> >> and a
> >> formatted string. The default implementation of
> >> OtherDiagnosticInfo::toString would walk over this array and dump
> >> the value at the right place.
> >>
> >> However, I do not want to push the array of Value* and the
> >> formatted
> >> string into the generic class. My goal is to keep the
> >> formatting/localization burden for known cases in the front-end,
> >> like we already do.
> >>
> >> ** New Proposal **
> >>
> >> - Define a class DiagnosticInfo that has one virtual method
> >> toStr().
> >> - Have a subclass of DiagnosticInfo per Kind+Severity.
> >> - Have the front-end knows the formatting of each Kind+Severity
> >> and
> >> use the right fields of the DiagnosticInfo subclass accordingly.
> >> - For “Other" cases, the front-end either use
> >> OtherDiagnosticInfo::toStr or use the pair ArrayRef<Value*>,
> >> StringRef Format.
> >>
> >> Finally the reporting API would be:
> >> void LLVMContext::report(DiagnosticInfo *, Kind, Severity)
> >> The default message would be given by toStr().
> >>
> >> Thoughts?
> >>
> >> On Nov 15, 2013, at 12:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>
> >>> ----- Original Message -----
> >>>> From: "Quentin Colombet" <qcolombet at apple.com>
> >>>> To: dblaikie at gmail.com, echristo at gmail.com, "renato golin"
> >>>> <renato.golin at linaro.org>, chandlerc at gmail.com,
> >>>> qcolombet at apple.com
> >>>> Cc: llvm-commits at cs.uiuc.edu, hfinkel at anl.gov
> >>>> Sent: Friday, November 15, 2013 1:18:22 PM
> >>>> Subject: Re: [PATCH] Add warning capabilities in LLVM (backend
> >>>> part)
> >>>>
> >>>>
> >>>>   Hi Hal,
> >>>>
> >>>>   Thanks for the feedback.
> >>>>   I suggest that you read the RFC regarding the choices for the
> >>>>   interface.
> >>>>   What you are proposing has IIRC already been discussed.
> >>> In part, yes, but not exactly, and I think it makes sense to have
> >>> a
> >>> more-targeted discussion now that we have an implementation to
> >>> play with.
> >> Fair enough :).
> >>
> >>>>   Let me try to summarize why we did not choose what you are
> >>>>   proposing:
> >>>>   1. srcloc metadata.
> >>>>   We cannot anticipate what would be reported and in particular,
> >>>>   a
> >>>>   source location may not be adapted. Assume you want to report
> >>>>   why
> >>>>   the vectorizer failed to optimize a given loop. Having only
> >>>>   one
> >>>>   source location will not be enough. So why should we be bound
> >>>>   to
> >>>>   one source location?
> >>> No, my point is:
> >>>
> >>> 1. We need the srcloc data to deal with inline asm messages (this
> >>> is an existing requirement).
> >> I know, but it does not mean we need an explicit argument for
> >> that,
> >> al least not for the front-end call-back.
> >> Indeed, we could provide in the “reporting API’ an additional
> >> prototype with the srcloc (unsigned, not MDNode*) or the
> >> Instruction, like for LLVMContext::emitError, that maps to the
> >> proposed LLVMContext::report encoding the srcloc in the data
> >> argument.
> >>
> >> That way, we are not bound to the one srcloc requirement, and can
> >> still support several lines requirement.
> >>
> >>> 2. Because the mechanism is completely general (although
> >>> currently
> >>> used only for inline asm), it makes sense for it to be a
> >>> first-class part of the interface.
> >> The mechanism is not supposed to be used by inline asm only. We
> >> use
> >> it for the internal stack-size warning, which does not report a
> >> srcloc (although this may be a mistake, I am the one to blame at
> >> the
> >> moment :)) and there are several spots where we call
> >> LLVMContext::emitError without a srcloc.
> >> Therefore, I am not sure the srcloc must be part of the interface
> >> or
> >> of the generic DiagnosticInfo I am proposing.
> >>
> >>> Might we want to extend this to deal with more general situation
> >>> later? Maybe. But we need at least one srcloc for inline asm
> >>> messages, and we can discuss how to deal with multiple-location
> >>> messages as we have concrete use cases (although if we have a
> >>> value array, we can also put them in there).
> >> I am a bit reluctant to go in that direction: i.e., specializing
> >> the
> >> API for InlineAsm, then thinking generalization if needed.
> >> In particular, I do not like the “if needed” part.
> >>
> >> To me, we may already not be aware of the generalization needs, I
> >> am
> >> thinking plug-ins. That said, I prefer to have the general API,
> >> even
> >> if it means spending more, useful :), time discussing, than
> >> introducing a new API that we will have to support for an unknown
> >> amount of time.
> >>
> >> If you look at what we currently have, the existing InlineAsmDiag
> >> API
> >> works well for our purpose but has been stuck to that purpose. I
> >> would like to directly address the general case, otherwise, why do
> >> not we just extend the InlineAsmDiag?
> >>
> >> Regarding your point for Values, I agree that if we want to pass
> >> Values the current API is not great.
> >> I think what I am proposing with the DiagnosticInfo could get the
> >> consensus here.
> >>
> >>>>   2. StringRef vs. ArrayRef<Value*>
> >>>>   First I will quote David Blaikie and Chris Lattner on
> >>>>   StringRef
> >>>>   vs.
> >>>>   ArrayRef<StringRef>:
> >>>>   David: “Presumably there are diagnostics that are going to
> >>>>   have
> >>>>   more than one parameter, though. ArrayRef<StringRef>?”
> >>>>   Chris: “It's possible, but the full generality of any
> >>>>   diagnostic
> >>>>   can be folded (at some loss of generality) into a single
> >>>>   string.
> >>>>    To balance complexity and generality vs simplicity, "1" is a
> >>>>   pretty decent number.”
> >>> Fair enough, and in terms of ArrayRef<StringRef> I completely
> >>> agree, but:
> >>>
> >>> 1. We already have a message string (and I don't think that we
> >>> need
> >>> more).
> >> They have different purpose:
> >> - One carries a default printing message.
> >> - The other carries the information to build a user report.
> >>
> >> Let me try to illustrate the concept.
> >> Let us assume we are reporting a loop that cannot be optimized by
> >> the
> >> vectorizer.
> >> The default message could be: “Cannot optimize loop <llvm internal
> >> label> because of weird data dependencies”.
> >> The StringData could be:
> >> “<Function>##<SrcLocOfLoopHeader>##<dataDepencyKind>#<SrcLoc1>#<SrcLoc2>#<SrcLoc3>##<dataDepencyKind>#<SrcLoc1>#<SrcLoc3>##”.
> >>
> >> The front-end would then produce a much more user friendly
> >> message,
> >> or point-out the interesting part in an IDE.
> >>
> >> The introduction of the DIagnosticInfo intends to avoid the
> >> serialization/deserialization overhead.
> >>
> >>> 2. Value* also gives us constants, metadata, and essentially
> >>> everything else. Even the backend often has values from things in
> >>> the MMO structure.
> >> Agree, but do we want to pay the boxing/unboxing cost?
> > To be honest, I think this is irrelevant. Error reporting is not a
> > hot-path operation: the goal is to make the messages as useful as
> > possible to the user.
> >
> >>> 3. The backend is the wrong place to do string formatting for
> >>> user
> >>> consumption. Should it demangle function names too? ;)
> >> No, of course not :).
> >> The point of the default string was not really for user
> >> consumption
> >> but as a *default* message if the front-end is not able or does
> >> not
> >> want to produce something from the StringData.
> >>
> >>>>   Now, why using StringRef instead of Value, I see two reasons
> >>>>   for
> >>>>   that:
> >>>>   - I believe we do not want the diagnostic system of the
> >>>>   front-end
> >>>>   to have to deal with the IR.
> >>> - The IR is the interface between the frontend and the backend.
> >>> Will the frontend know what to do with some arbitrary Value*?
> >>> Probably not. That's not the point. That Value* can be a
> >>> ConstantInt, or a MDNode (maybe a srcloc, maybe a loop id, some
> >>> debug metadata, etc.), a global variable, and many other things.
> >>> Only the frontend knows how to format those things reasonably in
> >>> the context of the source language.
> >> Using my running example, you have to use tricks in the array of
> >> Value* (like in the StringData, BTW) to explain to the front-end
> >> that the first 3 values are for the first dependency and the next
> >> two other for the second dependency.
> >> My concern here is that an array may not be structured enough for
> >> nice exposure with variable length arguments for instance.
> >> We can argue that if we feature a formatting string this is not a
> >> problem, but I do not like that because I think we rather have the
> >> formatting string into the front-end for known report.
> >>
> >> I understand that my example is kind of hand waving, but my goal
> >> is
> >> to expose these problematics so that we can come up with the best
> >> solution. I certainly do not pretend to have the right design in
> >> mind :).
> >>
> >>
> >>>>   - What happens for a report coming for the MC level (e.g.,
> >>>>   AsmParser), IIRC, there is no Value involved here.\
> >>> - The backends often do have values (from MMO), and can certainly
> >>> use IR interfaces to create constants, metadata, etc. as
> >>> necessary. I'd rather pass constants as constants, than having to
> >>> parse them again later out of a string.
> >> You made a point :).
> >>
> >>>>   3. Default message printer.
> >>>>   The idea of the default message printer is to provide a
> >>>>   directly
> >>>>   usable message.
> >>> - I think that we should provide a utility function that will
> >>> take
> >>> the string + the value array and produce a default serialization
> >>> of the message. Any frontend that does not really care too much
> >>> (or opt, for example), can use that.
> >> Assuming everything fits into a Value, that could work.
> >> However, I am not a huge fan of that, because the producer of the
> >> report has to do some formatting/localization. (See my example
> >> where
> >> the default message is more like an assert message with no
> >> parameters).
> >>>> I believe we do not what to push in every
> >>>>   front-end a parsing of the default message.
> >>> - I agree, but we really should give them that option.
> >> Well, like I said, I am not a huge fan, but if we can reach a
> >> consensus on that, why not.
> >>
> >>> Moreover, like I said,
> >>>>   I think we may not be able to produce Value objects every
> >>>>   time.
> >>> - I disagree, we can always produce value objects -- value
> >>> objects
> >>> can be metadata strings, and are strictly more expressive than
> >>> strings.
> >> Fair enough, but again, I am not sure we want to pay the boxing
> >> cost
> >> and that it is structured enough for known cases.
> >>
> >>>>   What do you think?
> >>> There you have it ;)
> >> Thanks :).
> >>
> >> -Quentin
> >>
> >>> Thanks again,
> >>> Hal
> >>>
> >>>>   Cheers,
> >>>>   Quentin
> >>>>
> >>>> http://llvm-reviews.chandlerc.com/D2190
> >>>>
> >>
> 
> --
> http://www.nuanti.com
> the browser experts
> 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory