[cfe-commits] [PATCH] Fix crash printing diagnostic range spanning macros

Fri Nov 2 20:30:42 PDT 2012

On Thu, Nov 1, 2012 at 11:14 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Thanks for the ping. =] This is definitely the direction I was
> imagining, and I really like the approach, Feel free to commit this at
> any point and rotate this to post-commit discussion as needed. I think
> its a strict improvement and would be happy to do further refinements
> incrementally.
>
> One meta comment, I think you might want more test cases to check for
> other edge case patterns, or for ranges that *shouldn't* work in
> addition to the ones that should.

I think I'm particularly lacking coverage for macro arguments; I'll
continue working on this after the initial commit.

> Index: lib/Frontend/DiagnosticRenderer.cpp
> ===================================================================
> --- lib/Frontend/DiagnosticRenderer.cpp (revision 166711)
> +++ lib/Frontend/DiagnosticRenderer.cpp (working copy)
> @@ -18,6 +18,7 @@
>  #include "llvm/Support/MemoryBuffer.h"
>  #include "llvm/Support/raw_ostream.h"
>  #include "llvm/Support/ErrorHandling.h"
> +#include "llvm/ADT/SmallSet.h"
>  #include "llvm/ADT/SmallString.h"
>  #include <algorithm>
>  using namespace clang;
> @@ -218,6 +219,42 @@
>    emitIncludeLocation(Loc, PLoc, SM);
>  }
>
> +// Helper function to fix up the source ranges so the start and end
> +// have the same FileID, and that FileID is somewhere in the
> +// caller chain for Loc.
>
> I didn't really follow this comment. Let me see if I've understood the
> implementation itself correctly:
>
> Because FileIDs aren't actually identifying the *file*, but in fact
> distinguish one macro expansion from another, such that if we have two
> source locations both of which are macro source locations, their
> FileIDs are equal iff they are expanded from the same macro.

"iff they come from the same macro expansion step" would be clearer

> Thus we
> can walk down the macro expansion stack of the start and end locations
> of each of the input ranges, and if we find the same macro expansion
> as has been selected for the given caret location, use that macro
> expansion location's spelling location.
>
> Ok, if I've gotten this all correct, this makes *lots* of sense to me.
> I particularly like the way it will "resynchronize" even when their
> are intervening macros on one side or the other side of the range. If
> there is a way to make this a bit more clear in the comments, that
> would be awesome. I wish "FileID" weren't so misleading...

Yes, this is all correct.  I'll try to make the code more clear.

> +static void MapDiagnosticRanges(SourceLocation Loc,
> +                            SmallVectorImpl<CharSourceRange>& Ranges,
> +                            SmallVectorImpl<CharSourceRange>& SpellingRanges,
> +                            const SourceManager *SM) {
>
> Style nit-pick: 'mapDiagnosticRanges'? Also, please don't use
> 80-column justification to pick strange indentations. I'd much rather
> have a line break after the '(' and indent the argument list 4 spaces
> instead of 2.
>
> +  FileID LocFileID = SM->getFileID(Loc);
>
> Do you want to do anything other than map to spelling location for the
> ranges in the event that this is not a macro expansion location?
> Wonder if it would be better to short circuit.... probably not because
> you need to push stuff into the output range anyways.

Even if the Loc isn't a macro location, we still need remapping; the
ranges could still involve macro locations.

> +  for (SmallVectorImpl<CharSourceRange>::iterator I = Ranges.begin(),
> +       E = Ranges.end();
> +       I != E; ++I) {
> +    SourceLocation Start = I->getBegin(), End = I->getEnd();
> +    bool IsTokenRange = I->isTokenRange();
> +
> +    while (Start.isMacroID() && SM->getFileID(Start) != LocFileID)
> +      Start = SM->getImmediateMacroCallerLoc(Start);
> +
> +    while (End.isMacroID() && SM->getFileID(End) != LocFileID) {
> +      // The computation of the next End is an inlined version of
> +      // getImmediateMacroCallerLoc, except it chooses the end of an
> +      // expansion range.
> +      if (SM->isMacroArgExpansion(End)) {
> +        End = SM->getImmediateSpellingLoc(End);
> +      } else {
> +        End = SM->getImmediateExpansionRange(End).second;
> +      }
> +    }
>
> Should this do anything special if one end of the range finds a macro
> expansion location, and the other doesn't? That is, should we insist
> that *both* locations FileIDs match the LocFileID if one of them does
> at this point?

We already perform this check in TextDiagnostic::highlightRange.

> One interesting option here would be to find the start (or end)
> location of the this macro definition and if the start (or end) is not
> in the same macro expansion as Loc, but the end (or start) is in the
> same expansion, use the start (or end) of the expansion instead. This
> would essentially saturate the range at the macro definition boundary.

Hmm, interesting.  I'll have to think about that one.

> +
> +    Start = SM->getSpellingLoc(Start);
> +    End = SM->getSpellingLoc(End);
> +    SpellingRanges.push_back(CharSourceRange(SourceRange(Start, End),
> +                                             IsTokenRange));
> +  }
>
> One unfortunate aspect of this re-synchronization style mapping is
> that it is O(N^2). Do you think it would be worth adjusting the input
> ranges to be the immediate caller loc when we find one that matches
> the Loc's FileID? I don't think that would miss any cases as all we
> really need is the ability to skip over macro expansion frames that
> exist in the range stack but not in the Loc stack. Still, might be
> more complex, and I certainly hope that having an N^2 algorithm here
> really isn't a problem in practice...

I'll tweak it so it's O(N*MacroBacktraceLimit); that should be good
enough in practice.  The O(N) algorithm is substantially more
complicated.  It involves two mapping functions: one to find the
nearest common ancestor, and one to step to the caller loc for each
step of the trace.

-Eli