[cfe-dev] Diagnostic Improvements

Sun Nov 16 21:25:10 PST 2008

I've been pondering on diagnostics a bit lately, and reviewing one of  
Doug's patches got me to thinking about this more intently.  Here are  
three problems I've noticed with Clang's current diagnostics:

1) We create a ton of std::strings when emitting Diagnostics.  We do  
this any time a substitution happens (e.g. %1) like when printing  
integers (utostr), types, identifiers, and soon DeclarationNames.   
This generally doesn't matter, as the actual I/O and other work to  
print out the diagnostic dwarfs the cost of doing the string  
manipulations.  However, there are plenty of diagnostics for warnings  
and extensions that end up *not* being printed, and we end up doing a  
ton of string manipulation (yay heap traffic) to create strings that  
are ultimately never printed.  I believe, but haven't proven, that  
this also makes the code for "Diag" call sites large (in terms of code  
size), as they have to make lots of temporary strings and manage their  
lifetime.

2) There are cases when we want to format some things more richly than  
we're already doing.  One simple example of this is with types.  For  
example, I've seen diagnostics complaining that "there is some problem  
with type 'foo'".  The problem here is that 'foo' is a typedef, and I  
have no idea what it is a typedef of - this is the amusing inverse of  
the problem most compilers have where they don't track typedefs at  
all.   In an IDE, we'd like to be able to have more rich information  
about the 'foo' in the diagnostic, potentially allowing the user to  
click through it to dive into the structure of the type (e.g. click  
once and 'foo' turns into 'bar*', click again and it turns into "void  
(*)()").  In a CLI, there might be an -funravel-types=2 option or  
something.  There is obviously no way to do this if the type is turned  
into a string before the DiagnosticClient gets it.

You could imagine doing this for things like template instantiation  
issues etc as well of course.

3) Poor Doug :) is having to write code like this:

       if (NumParams == 1)
         DK = diag::err_operator_overload_must_be_binary;
       else
         DK = diag::err_operator_overload_must_be_binary_plural;

The difference is a single 's':

DIAG(err_operator_overload_must_be_binary, ERROR,
      "overloaded operator '%0' must be a binary operator (has %1  
parameter)")
DIAG(err_operator_overload_must_be_binary_plural, ERROR,
      "overloaded operator '%0' must be a binary operator (has %1  
parameters)")

and:

       diag::kind DK;
       if (Op == OO_PlusPlus)
         DK = diag::err_operator_overload_post_inc_must_be_int;
       else
         DK = diag::err_operator_overload_post_dec_must_be_int;

The difference is increment vs decrement:
DIAG(err_operator_overload_post_inc_must_be_int_member, ERROR,
      "parameter of overloaded post-increment operator must have type  
'int' (not '%0')")
DIAG(err_operator_overload_post_dec_must_be_int_member, ERROR,
      "parameter of overloaded post-decrement operator must have type  
'int' (not '%0')")

Having tons of variants of diagnostics like this is currently  
necessary for localization purposes, but it is ugly and painful to  
write.

I think it is time to stop the madness.  Currently, diagnostics are  
passed down into DiagnosticClient with an array of strings that are  
referenced by substitutions, and turned into the final string by  
DiagnosticClient::FormatDiagnostic.  I think it would be straight- 
forward to change this to be an array of void*, where (at the  
beginning) all of these are pointers to std::string.  Once that was  
done, we can start introducing some more rich formatting characters  
like "%T0" instead of "%0" and making the corresponding pointers be to  
richer thing than strings.  For example, "%T0" format the 0th element  
of the array as a QualType.  We could also introduce things like %u0  
(unsigned int) and %i0 (signed int).

Doing this would directly solve #1 above, because 1) diagnostics that  
are squelched never make it to FormatDiagnostic and 2) those that are  
could be printed into one big SmallString.  This would also solve #2,  
because FormatDiagnostic would be given the QualType, so it could do  
any crazy AST-level groveling that it wants to format it based on what  
the DiagClient implementation wants.  This also is a big step to  
solving #3, because it would mean that we could write things like:

DIAG(err_operator_overload_must_be_binary, ERROR,
      "overloaded '%0' must be a binary operator (has %u1 parameter 
%up1)")

Where %u1 prints the unsigned, and %up1 (p=plural, or pick a better  
letter) prints a "s" if the unsigned is != 1, or nothing if it is.

Does this seem like a reasonable thing to do?  Does anyone volunteer  
to start tackling it?

-Chris