[LLVMdev] [RFC] Add warning capabilities in LLVM.

Tue Jul 23 16:55:18 PDT 2013

Hi David,

Thanks for the feedbacks/clarifications, I appreciate.
You will find my comments inlined with your mail.

For a quick reference, here is the summary.

** Summary **

The other considered approach would provide some callbacks for event/information we think they are important.
Like Chris pointed out, some of the information is not available for querying, thus the call back should provide sufficient information.

The client of the callback could then query llvm if the callbacks does not provide sufficient information (and assuming more information would be available via querying).

At the moment, I see the following use cases:
- Optimization hints (see Hal’s idea).
- Fatal error/warning reporting (this could be done via the first proposal: enum + bool + message.).
- Stack size reporting.

What else?
Thoughts?

Thanks again for all the feedbacks.

Cheers,

-Quentin

On Jul 23, 2013, at 3:37 PM, David Blaikie <dblaikie at gmail.com> wrote:

> On Mon, Jul 22, 2013 at 4:17 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> Hi,
>> 
>> Compared to my previous email, I have added Hal’s idea for formatting the
>> message and pull back some idea from the "querying framework”.
>> Indeed, I propose to add some information in the reporting so that a
>> front-end (more generally a client) can filter the diagnostics or take
>> proper actions.
>> See the details hereafter.
>> 
>> On Jul 22, 2013, at 2:25 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>> 
>> On Mon, Jul 22, 2013 at 2:21 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>>> 
>>>>> This is pretty much the same as what Quentin proposed (with the
>>>>> addition of the enum), isn't it?
>>>>> 
>>>> 
>>>> Pretty close yeah.
>>>> 
>>> 
>>> Another thought and alternate strategy for dealing with these sorts of
>>> things:
>>> 
>>> A much more broad set of callback machinery that allows the backend to
>>> communicate values or other information back to the front end that can
>>> then decide what to do. We can define an interface around this, but
>>> instead of having the backend vending diagnostics we have the callback
>>> take a "do something with this value" which can just be "communicate
>>> it back to the front end" or a diagnostic callback can be passed down
>>> from the front end, etc.
>>> 
>>> This will probably take a bit more design to get a general framework
>>> set up, but imagine the usefulness of say being able to automatically
>>> reschedule a jitted function to a thread with a larger default stack
>>> size if the callback states that the thread size was N+1 where N is
>>> the size of the stack for a thread you've created.
>> 
>> 
>> FWIW, *this* is what I was trying to get across. Not that it wouldn't be a
>> callback-based mechanism, but that it should be a fully general mechanism
>> rather than having something to do with warnings, errors, notes, etc. If a
>> frontend chooses to use it to produce such diagnostics, cool, but there are
>> other use cases that the same machinery should serve.
>> 
>> 
>> I like the general idea.
>> 
>> To be sure I understood the proposal, let me give an example.
>> 
>> ** Example **
>> The compiler says here is the size of the stack for Loc via a “handler”
>> (“handler" in the sense whatever mechanism we come up to make such
>> communication possible). Then the front-end builds the diagnostic from that
>> information (or query for more if needed) or drops everything if it does not
>> care about this size for instance (either it does not care at all or the
>> size is small enough compared to its setting).
>> 
>> ** Comments **
>> Unless we have one handler per -kind of - use, and I would like to avoid
>> that,
> 
> I think that's, somewhat, Chandlers point (sorry, I hate to play
> Telephone here - but hope to help clarify some positions... apologies
> if this just further obfuscates/confuses). I believe the idea is some
> kind of generic callback type with a bunch of no-op-default callbacks,
> override the ones your frontend cares about ("void onStackSize(size_t
> bytes)", etc...).
I see.

> 
> Yes, we could come up with a system that doesn't require adding a new
> function call for every data that needs to be provided. Does it seem
> likely we'll have so many of these that we'll really want/need that?

That’s a good point.
I guess there are use cases that we may not anticipate and it would be nice if we do not have to modify this interface for that.
In particular, out-of-the-tree target may want to do fancy thing without to push for change in the public tree.

For now, I believe this is great that many people contributed their ideas and use cases, we can then decide what we want and what we do not want to address in the near future.

Anyway, I agree that we should focus on what we care the most when we are done with collecting the use cases (are we done?).

> 
>> I think we should still provide an information on the severity of the
>> thing we are reporting and what we are reporting.
>> Basically:
>> - Severity: Will the back-end abort after the information pass down or will
>> it continue (the boolean of the previous proposal)?
> 
> In the case of specific callbacks - that would be statically known
> (you might have a callback for some particular problem (we ran out of
> registers & can't satisfy this inline asm due to the register
> allocation of this function) - it's the very contract that that
> callback is a fatal problem). If we have a more generic callback
> mechanism, yes - we could enshrine some general properties (such as
> fatality) in the common part & leave the specifics of what kind of
> fatal problem to the 'blob’.

Agreed.

> 
>> - Kind: What are we reporting (the enum from the previous proposal)?
>> 
>> I also think we should be able to provide a default (formatted) message,
>> such that a client that does not need to know what to do with the
>> information can still print something somehow useful, especially on abort
>> cases.
> 
> Do you have some examples of fatal/abort cases we already have & how
> they're reported today? (including what kind of textual description we
> use?)

Here are a few example of some warnings that you can find in the back end:
- IntrinsicLowering:
case Intrinsic::stackrestore: {
 if (!Warned)
   errs() << “WARNING: this target does not support the llvm.stack"
             << (Callee->getIntrinsicID() == Intrinsic::stacksave ?
               "save" : "restore") << " intrinsic.\n”;
- PrologEpilogInserter:
    errs() << "warning: Stack size limit exceeded (" << MFI->getStackSize()
           << ") in " << Fn.getName()  << ".\n”;

Actually, when I wrote this, I had Hal’s Optimization Diary in mind:
<From Hal>
- "*This loop* cannot be optimized because the induction variable *i* is unsigned, and cannot be proved not to wrap"
- "*This loop* cannot be vectorized because the compiler cannot prove that memory read from *a* does not overlap with memory written to through *b*"
- "*This loop* cannot be vectorized because the compiler cannot prove that it is unconditionally safe to dereference the pointer *a*.

- The message string is text but a single kind of markup is allowed: <debug/>, for example:
    "We cannot vectorize <debug/> because <debug/> is an unfriendly variable"
  (where the first will be replaced by text derived from a DIScope and the second from a DIVariable).
</Form Hal>

> 
>> Thus, it sounds a good thing to me to have a string with some markers to
>> format the output plus the arguments to be used in the formatted output.
>> Hal’s proposal could do the trick (although I do not know if DIDescriptor
>> are the best thing to use here).
>> 
>> ** Summary **
>> I am starting to think that we should be able to cover the reporting case
>> plus some querying mechanism with something like:
>> void reportSomehtingToFEHandler(enum Reporting Kind, bool IsAbort, <some
>> information>, const char* DefautMsg, <pointer to a list of args to format in
>> the defautMsg>)
> 
> Personally I dislike losing type safety in this kind of API ("here's a
> blob of data you must programmatically query based on a schema implied
> by the 'Kind' parameter & some documentation you read"). I'd prefer
> explicit callbacks per thing - if we're going to have to write an
> explicit structure & document the parameters to each of these
> callbacks anyway, it seems easier to document that by API. (for fatal
> cases we could have no default implementations - this would ensure
> clients would be required to update for new callbacks & not
> accidentally suppress them)

If we want to let people do everything they want without modifying the existing structure, I think we need both.
I agree that for the cases we care the most, we could specialize that approach with a specific call back for each cases.

Also note that the initial intend was to report error/warning, thus the <some information> is not required.
My point is, if we do not rely on the front-end to take specific action, there is no need to pass this information. Therefore, we can eliminate that type safety problem.

> 
>> Where <some information>  is supposed to be the class/struct/pointer to the
>> relevant information for this kind. If it is not enough the FE should call
>> additional APIs to get what it wants.
>> 
>> This looks similar to the “classical” back-end report to front-end approach,
>> but gives more freedom to the front-end as it can choose what to do based on
>> the attached information.
>> I also believe this will reduce the need to expose back-end APIs and speed
>> up the process.
> 
> Speed up the process of adding these diagnostic, possibly at the cost
> of having a more opaque/inscrutible API to data from LLVM, it seems.

Good point and it matches what I have written above.

Indeed, if we rely on the front-end to query for more information when it gets a reporting like this, we do not need to pass the information and we avoid this problem.

> 
>> However, the ability of the front-end (or client) to query the back-end is
>> limited to the places where the back-end is reporting something. Also, if
>> the back-end is meant to abort, the FE cannot do anything about it (e.g.,
>> the stack size is not big enough for the jitted function).
>> That’s why I said it cover “some" querying mechanism.
>> 
>> ** Concerns **
>> 1. Testing.
> 
> Testing - I assume we'd have opt/llc register for all these callbacks
> & print them in some way (it doesn't need to have a "stack size is too
> small warning" it just needs to print the stack size whenever it's
> told - or maybe have some way to opt in to callback rendering) & then
> check the behavior with FileCheck as usual (perhaps print this stuff
> to stderr so it doesn't get confused with bytecode/asm under -o -).
> 
> That tests LLVM's contract  - that it called the notifications.
> Testing Clang's behavior when these notifications are provided would
> either require end-to-end testing (just having Clang tests that run
> LLVM, assume it already passes the LLVM-only tests & then tests Clang
> behavior on top of that) as we do in a few places already - or have
> some kind of stub callback implementation we can point Clang to (it
> could read a file of callbacks to call). That would be nice, but going
> on past experience I don't suppose anyone would actually bother to
> implement it.
> 
>> 
>> Assuming we will always emit these reports, relying on a front-end to filter
>> out what is not currently relevant (e.g., we did not set the stack size
>> warning in the FE), what will happen when we test (make check) without a
>> front-end?
>> I am afraid we will pollute all tests or we will have some difficulty to
>> test a specific reporting.
>> 
>> 2. Regarding a completely query based approach, like Chris pointed out, I do
>> not see how we can report consistent information at any given time. Also,
>> Eric, coming back to your jit example, how could we prevent the back-end to
>> abort if the jitted is too big for the stack?
> 
> Eric's (originally Chandler's discussed in person) example wasn't
> about aborting compilation. The idea was you JIT a function, you get a
> callback saying "stack size 1 million bytes" and so you spin up a
> thread that has a big stack to run this function you just compiled.

Thanks for the clarification.

> 
> The point of the example is that a pure warning-based callback is
> 'general' for LLVM but very specific for LLVM /clients/ (LLVM as a
> library, something we should keep in mind) - all they can do is print
> it out. If we provide a more general feature for LLVM clients
> (callbacks that notify those clients about things they might be
> interested in, like the size of a function's stack) then they can
> build other features (apart from just warning) such as a JIT that
> dynamically chooses thread stack size based on the stack size of the
> functions it JITs.

I see your point.

> 
>> 
>> 3. Back to the strictly reporting approach where we extend the inlineasm
>> handler (the approach proposed by Bob and that I sketched a little bit
>> more), now looks similar to this approach expect that the back-end chooses
>> what it is relevant to report and the back-end does not need to pass down
>> the information.
>> The concern is how do we easily (in a robust and extendable manner) provide
>> a front-end/back-end option for each warning/error?
>> 
>> 
>> Thoughts?
>> 
>> Cheers,
>> 
>> -Quentin
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130723/a9304d69/attachment.html>