[cfe-dev] Diagnostic Improvements
Chris Lattner
clattner at apple.com
Sun Nov 16 21:25:10 PST 2008
I've been pondering on diagnostics a bit lately, and reviewing one of
Doug's patches got me to thinking about this more intently. Here are
three problems I've noticed with Clang's current diagnostics:
1) We create a ton of std::strings when emitting Diagnostics. We do
this any time a substitution happens (e.g. %1) like when printing
integers (utostr), types, identifiers, and soon DeclarationNames.
This generally doesn't matter, as the actual I/O and other work to
print out the diagnostic dwarfs the cost of doing the string
manipulations. However, there are plenty of diagnostics for warnings
and extensions that end up *not* being printed, and we end up doing a
ton of string manipulation (yay heap traffic) to create strings that
are ultimately never printed. I believe, but haven't proven, that
this also makes the code for "Diag" call sites large (in terms of code
size), as they have to make lots of temporary strings and manage their
lifetime.
2) There are cases when we want to format some things more richly than
we're already doing. One simple example of this is with types. For
example, I've seen diagnostics complaining that "there is some problem
with type 'foo'". The problem here is that 'foo' is a typedef, and I
have no idea what it is a typedef of - this is the amusing inverse of
the problem most compilers have where they don't track typedefs at
all. In an IDE, we'd like to be able to have more rich information
about the 'foo' in the diagnostic, potentially allowing the user to
click through it to dive into the structure of the type (e.g. click
once and 'foo' turns into 'bar*', click again and it turns into "void
(*)()"). In a CLI, there might be an -funravel-types=2 option or
something. There is obviously no way to do this if the type is turned
into a string before the DiagnosticClient gets it.
You could imagine doing this for things like template instantiation
issues etc as well of course.
3) Poor Doug :) is having to write code like this:
if (NumParams == 1)
DK = diag::err_operator_overload_must_be_binary;
else
DK = diag::err_operator_overload_must_be_binary_plural;
The difference is a single 's':
DIAG(err_operator_overload_must_be_binary, ERROR,
"overloaded operator '%0' must be a binary operator (has %1
parameter)")
DIAG(err_operator_overload_must_be_binary_plural, ERROR,
"overloaded operator '%0' must be a binary operator (has %1
parameters)")
and:
diag::kind DK;
if (Op == OO_PlusPlus)
DK = diag::err_operator_overload_post_inc_must_be_int;
else
DK = diag::err_operator_overload_post_dec_must_be_int;
The difference is increment vs decrement:
DIAG(err_operator_overload_post_inc_must_be_int_member, ERROR,
"parameter of overloaded post-increment operator must have type
'int' (not '%0')")
DIAG(err_operator_overload_post_dec_must_be_int_member, ERROR,
"parameter of overloaded post-decrement operator must have type
'int' (not '%0')")
Having tons of variants of diagnostics like this is currently
necessary for localization purposes, but it is ugly and painful to
write.
I think it is time to stop the madness. Currently, diagnostics are
passed down into DiagnosticClient with an array of strings that are
referenced by substitutions, and turned into the final string by
DiagnosticClient::FormatDiagnostic. I think it would be straight-
forward to change this to be an array of void*, where (at the
beginning) all of these are pointers to std::string. Once that was
done, we can start introducing some more rich formatting characters
like "%T0" instead of "%0" and making the corresponding pointers be to
richer thing than strings. For example, "%T0" format the 0th element
of the array as a QualType. We could also introduce things like %u0
(unsigned int) and %i0 (signed int).
Doing this would directly solve #1 above, because 1) diagnostics that
are squelched never make it to FormatDiagnostic and 2) those that are
could be printed into one big SmallString. This would also solve #2,
because FormatDiagnostic would be given the QualType, so it could do
any crazy AST-level groveling that it wants to format it based on what
the DiagClient implementation wants. This also is a big step to
solving #3, because it would mean that we could write things like:
DIAG(err_operator_overload_must_be_binary, ERROR,
"overloaded '%0' must be a binary operator (has %u1 parameter
%up1)")
Where %u1 prints the unsigned, and %up1 (p=plural, or pick a better
letter) prints a "s" if the unsigned is != 1, or nothing if it is.
Does this seem like a reasonable thing to do? Does anyone volunteer
to start tackling it?
-Chris
More information about the cfe-dev
mailing list