[PATCH] Add warning capabilities in LLVM (backend part)

Fri Nov 15 17:51:46 PST 2013

Hi Hal,

Thanks for the feedbacks.

I have replied within your email, but here is the overall approach I am tempted to pursue after looking at all your comments.

A quick reminder first, as discussed in the RFC, the idea was that "StringData+Kind" can be used to format something nice in clang, but that "msg" fully covers it for clients that don't know the Kind enum.

I would like to keep something similar but, I think it would make sense to provide something more structured for StringData, especially for statically known case.

Basically, I would go for a DiagnosticInfo class, whose subclasses would provide the expected fields easily to avoid boxing/unboxing overhead. The based class would provide one virtual method: toStr(), which would implement the default message in each subclass (possibly an argument of the subclass constructor).

Then, based on the Kind+Severity the front-end can statically cast this DiagnosticInfoClass to the right type.
In particular, in the case of a InlineAsmDiagnosticInfo, this class would hold a message and a location and the front-end would do what it was doing with the InlineAsmDiag.

In the “Other” case, the subclass can return an array of Value* and a formatted string. The default implementation of OtherDiagnosticInfo::toString would walk over this array and dump the value at the right place.

However, I do not want to push the array of Value* and the formatted string into the generic class. My goal is to keep the formatting/localization burden for known cases in the front-end, like we already do. 

** New Proposal **

- Define a class DiagnosticInfo that has one virtual method toStr().
- Have a subclass of DiagnosticInfo per Kind+Severity.
- Have the front-end knows the formatting of each Kind+Severity and use the right fields of the DiagnosticInfo subclass accordingly.
- For “Other" cases, the front-end either use OtherDiagnosticInfo::toStr or use the pair ArrayRef<Value*>, StringRef Format.

Finally the reporting API would be:
void LLVMContext::report(DiagnosticInfo *, Kind, Severity)
The default message would be given by toStr().

Thoughts? 

On Nov 15, 2013, at 12:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> From: "Quentin Colombet" <qcolombet at apple.com>
>> To: dblaikie at gmail.com, echristo at gmail.com, "renato golin" <renato.golin at linaro.org>, chandlerc at gmail.com,
>> qcolombet at apple.com
>> Cc: llvm-commits at cs.uiuc.edu, hfinkel at anl.gov
>> Sent: Friday, November 15, 2013 1:18:22 PM
>> Subject: Re: [PATCH] Add warning capabilities in LLVM (backend part)
>> 
>> 
>>  Hi Hal,
>> 
>>  Thanks for the feedback.
>>  I suggest that you read the RFC regarding the choices for the
>>  interface.
>>  What you are proposing has IIRC already been discussed.
> 
> In part, yes, but not exactly, and I think it makes sense to have a more-targeted discussion now that we have an implementation to play with.
Fair enough :).

> 
>> 
>>  Let me try to summarize why we did not choose what you are
>>  proposing:
>>  1. srcloc metadata.
>>  We cannot anticipate what would be reported and in particular, a
>>  source location may not be adapted. Assume you want to report why
>>  the vectorizer failed to optimize a given loop. Having only one
>>  source location will not be enough. So why should we be bound to
>>  one source location?
> 
> No, my point is:
> 
> 1. We need the srcloc data to deal with inline asm messages (this is an existing requirement).
I know, but it does not mean we need an explicit argument for that, al least not for the front-end call-back.
Indeed, we could provide in the “reporting API’ an additional prototype with the srcloc (unsigned, not MDNode*) or the Instruction, like for LLVMContext::emitError, that maps to the proposed LLVMContext::report encoding the srcloc in the data argument.

That way, we are not bound to the one srcloc requirement, and can still support several lines requirement.

> 
> 2. Because the mechanism is completely general (although currently used only for inline asm), it makes sense for it to be a first-class part of the interface.
The mechanism is not supposed to be used by inline asm only. We use it for the internal stack-size warning, which does not report a srcloc (although this may be a mistake, I am the one to blame at the moment :)) and there are several spots where we call LLVMContext::emitError without a srcloc.
Therefore, I am not sure the srcloc must be part of the interface or of the generic DiagnosticInfo I am proposing.

> Might we want to extend this to deal with more general situation later? Maybe. But we need at least one srcloc for inline asm messages, and we can discuss how to deal with multiple-location messages as we have concrete use cases (although if we have a value array, we can also put them in there).
I am a bit reluctant to go in that direction: i.e., specializing the API for InlineAsm, then thinking generalization if needed.
In particular, I do not like the “if needed” part.

To me, we may already not be aware of the generalization needs, I am thinking plug-ins. That said, I prefer to have the general API, even if it means spending more, useful :), time discussing, than introducing a new API that we will have to support for an unknown amount of time.

If you look at what we currently have, the existing InlineAsmDiag API works well for our purpose but has been stuck to that purpose. I would like to directly address the general case, otherwise, why do not we just extend the InlineAsmDiag? 

Regarding your point for Values, I agree that if we want to pass Values the current API is not great.
I think what I am proposing with the DiagnosticInfo could get the consensus here.

> 
>> 
>>  2. StringRef vs. ArrayRef<Value*>
>>  First I will quote David Blaikie and Chris Lattner on StringRef vs.
>>  ArrayRef<StringRef>:
>>  David: “Presumably there are diagnostics that are going to have
>>  more than one parameter, though. ArrayRef<StringRef>?”
>>  Chris: “It's possible, but the full generality of any diagnostic
>>  can be folded (at some loss of generality) into a single string.
>>   To balance complexity and generality vs simplicity, "1" is a
>>  pretty decent number.”
> 
> Fair enough, and in terms of ArrayRef<StringRef> I completely agree, but:
> 
> 1. We already have a message string (and I don't think that we need more).
They have different purpose:
- One carries a default printing message.
- The other carries the information to build a user report.

Let me try to illustrate the concept.
Let us assume we are reporting a loop that cannot be optimized by the vectorizer.
The default message could be: “Cannot optimize loop <llvm internal label> because of weird data dependencies”.
The StringData could be: “<Function>##<SrcLocOfLoopHeader>##<dataDepencyKind>#<SrcLoc1>#<SrcLoc2>#<SrcLoc3>##<dataDepencyKind>#<SrcLoc1>#<SrcLoc3>##”.

The front-end would then produce a much more user friendly message, or point-out the interesting part in an IDE.

The introduction of the DIagnosticInfo intends to avoid the serialization/deserialization overhead.

> 
> 2. Value* also gives us constants, metadata, and essentially everything else. Even the backend often has values from things in the MMO structure.
Agree, but do we want to pay the boxing/unboxing cost?

> 
> 3. The backend is the wrong place to do string formatting for user consumption. Should it demangle function names too? ;)
No, of course not :).
The point of the default string was not really for user consumption but as a *default* message if the front-end is not able or does not want to produce something from the StringData.

> 
>> 
>>  Now, why using StringRef instead of Value, I see two reasons for
>>  that:
>>  - I believe we do not want the diagnostic system of the front-end
>>  to have to deal with the IR.
> 
> - The IR is the interface between the frontend and the backend. Will the frontend know what to do with some arbitrary Value*? Probably not. That's not the point. That Value* can be a ConstantInt, or a MDNode (maybe a srcloc, maybe a loop id, some debug metadata, etc.), a global variable, and many other things. Only the frontend knows how to format those things reasonably in the context of the source language. 
Using my running example, you have to use tricks in the array of Value* (like in the StringData, BTW) to explain to the front-end that the first 3 values are for the first dependency and the next two other for the second dependency.
My concern here is that an array may not be structured enough for nice exposure with variable length arguments for instance.
We can argue that if we feature a formatting string this is not a problem, but I do not like that because I think we rather have the formatting string into the front-end for known report.

I understand that my example is kind of hand waving, but my goal is to expose these problematics so that we can come up with the best solution. I certainly do not pretend to have the right design in mind :).

> 
>>  - What happens for a report coming for the MC level (e.g.,
>>  AsmParser), IIRC, there is no Value involved here.\
> 
> - The backends often do have values (from MMO), and can certainly use IR interfaces to create constants, metadata, etc. as necessary. I'd rather pass constants as constants, than having to parse them again later out of a string.
You made a point :).

> 
>> 
>>  3. Default message printer.
>>  The idea of the default message printer is to provide a directly
>>  usable message.
> 
> - I think that we should provide a utility function that will take the string + the value array and produce a default serialization of the message. Any frontend that does not really care too much (or opt, for example), can use that.
Assuming everything fits into a Value, that could work.
However, I am not a huge fan of that, because the producer of the report has to do some formatting/localization. (See my example where the default message is more like an assert message with no parameters).
> 
>> I believe we do not what to push in every
>>  front-end a parsing of the default message.
> 
> - I agree, but we really should give them that option.
Well, like I said, I am not a huge fan, but if we can reach a consensus on that, why not.

> 
> Moreover, like I said,
>>  I think we may not be able to produce Value objects every time.
> 
> - I disagree, we can always produce value objects -- value objects can be metadata strings, and are strictly more expressive than strings.
Fair enough, but again, I am not sure we want to pay the boxing cost and that it is structured enough for known cases.

> 
>> 
>>  What do you think?
> 
> There you have it ;)
Thanks :).

-Quentin

> 
> Thanks again,
> Hal
> 
>> 
>>  Cheers,
>>  Quentin
>> 
>> http://llvm-reviews.chandlerc.com/D2190
>>