[LLVMdev] [RFC] Add warning capabilities in LLVM.

Mon Jul 22 12:15:53 PDT 2013

----- Original Message -----
> On Sat, Jul 20, 2013 at 9:15 PM, Chris Lattner <clattner at apple.com>
> wrote:
> > Sorry, just getting caught up on an old thread.  I haven't been
> > involved in discussions of this.
> >
> > On Jul 17, 2013, at 8:53 AM, Bob Wilson <bob.wilson at apple.com>
> > wrote:
> >> First, let me try to clarify my proposal, in case there was any
> >> confusion about that.  LLVMContext already has a hook for
> >> diagnostics, setInlineAsmDiagnosticHandler() et al.  I was
> >> suggesting that we rename those interfaces to be more generic,
> >> add a simple enumeration of whatever diagnostics can be produced
> >> from the backend, and add support in clang for mapping those
> >> enumeration values to the corresponding clang diagnostics.  This
> >> would be a small amount of work and would also be consistent with
> >> everything you wrote above about reusing the standard and
> >> existing machinery for diagnostics in clang.
> >
> > Of all of the proposals discussed, I like this the best:
> >
> > 1) This is a really simple extension of what we already have.
> >
> > 2) The backend providing a set of enumerations for the classes of
> > diagnostics it produces doesn't tie it to clang, and doesn't make
> > it language specific.  Clients should be able to completely ignore
> > the enum if they want the current (unclassified) behavior, and if
> > an unknown enum value comes through, it is easy to handle.
> >
> > 3) I don't see how something like the stack size diagnostic can be
> > implemented by clang calling into the backend. First, the
> > MachineFunction (and thus, MachineFrameInfo) is a transient
> > datastructure used by the backend when a function is compiled.
> >  There is nothing persistent for clang to query.  Second, clang
> > would have to know about all of the LLVM IR functions generated,
> > which is possible, but impractical to track for things like thunks
> > and other implicitly generated entrypoints.
> >
> > What is the specific concern with this approach?  I don't see how
> > this couples the backend to the frontend or causes layering
> > violation problems.
> >
> 
> I've not talked with Chandler about this, but to sketch out the way
> I'd do it (which is similar):
> 
> Have the backend vend diagnostics, this can be done either with a set
> of enums and messages like you mentioned, or just have a message and
> location struct ala:
> 
> struct Msg {
>    const char *Message;
>    Location Loc;
> };
> 
> that the consumer of the message can use via a handler.
> 
> Alternately a handler (and we should have a default handler) can be
> passed in from the printer of the message (the frontend in the case
> provided) and it can be called on the error message. Absolutely this
> should be done via the LLVMContext to deal with the case of parallel
> function passes.
> 
> class Handler {
>    void printDiagnostic(const char *Message, Location Loc);
> };
> 
> (Note that I didn't say this was a fleshed out design ;) I think I
> prefer the latter to the former and we'd just need an "diagnostic
> callback handler" on the context. Though we would need to keep a set
> of diagnostics that the backend handles. That said, that it provides
> diagnostics based on its input language seems to make the most sense.
> It can use the location metadata if it has it to produce a location,
> otherwise you get the function, etc. in some sort of nicely degraded
> quality.
> 
> I think this scheme could also work as a way of dealing with the
> "Optimization Diary" sort of use that Hal is envisioning as well.

I like "Optimization Diary" :)

Obviously, we sometimes loose debug information on variables while optimizing, and so the trick is to make it degrade nicely. Nevertheless, I think that we can often be more expressive than just using a single location for everything. How about something like this:

 - The message string is text but a single kind of markup is allowed: <debug/>, for example:
     "We cannot vectorize <debug/> because <debug/> is an unfriendly variable"
   (where the first will be replaced by text derived from a DIScope and the second from a DIVariable).

 - The structure is this:
    struct Msg {
      const char *Message;
      Function *F; // If nothing else, we can extract a useful name from here, hopefully.
      SmallVector<DIDescriptor, 2> DIs; // Should be DIDescriptor* ?
    };

Then, in the backend, we can look for a DbgValueInst associated with the variable that we want (similar for scopes), and push a DIScope, DIVariable, etc. onto the array, one for every <debug/> in the message string. The frontend can then render these references in whatever way seems most appropriate (which may include things like making them into hyperlinks, doing some kind of source-code highlighting, etc.).

Thanks again,
Hal

> 
> Keeping the separation of concerns around where the front end handles
> diagnostics on what we'll call "source locations" is pretty
> important,
> however, I agree that not every warning can be expressed this way,
> i.e. the stack size diagnostic. However, leaving the printing up to
> the front end is the best way to deal with this and a generic
> diagnostic engine would probably help for things like llc/opt where
> the backend just deals with its input language - IR.
> 
> The existing inline asm diagnostics are ... problematic and it would
> definitely be nice to get a generic interface for them. Though
> they're
> actually separated into two separate cases where, I think, we end up
> with our confusion:
> 
> a) Front end diagnostics - This is an area that needs some work to be
> decent, but it involves the front end querying the back end for
> things
> like register size, valid immediates, etc and should be implemented
> by
> the front end with an expanded set of target queries. We could use
> this as a way to solidify the backend api for the MS inline asm
> support as well and use some of that when parsing GNU style inline
> asm.
> 
> b) Back end diagnostics - This is the stuff that the front end has no
> hope of diagnosing. i.e. "ran out of registers", or "can't figure out
> how to split this up into this kind of vector register". The latter
> has always been a bit funny and I'm always unhappy with it, but I
> don't have any better ideas. A unified scheme of communicating "help
> help I'm being oppressed by the front end" in the backend would be,
> at
> the very least, a step forward.
> 
> Thoughts?
> 
> -eric
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory