<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jan 6, 2016 at 4:22 PM, Bruno Cardoso Lopes <span dir="ltr"><<a href="mailto:bruno.cardoso@gmail.com" target="_blank">bruno.cardoso@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">bruno updated this revision to Diff 44180.<br>

bruno added a comment.<br>

<br>

Hi Richard,<br>

<br>

Thanks for the comments. Updated the patch!<br>

<span class=""><br>

In <a href="http://reviews.llvm.org/D15173#313235" rel="noreferrer" target="_blank">http://reviews.llvm.org/D15173#313235</a>, @rsmith wrote:<br>

<br>

> I think that this will leave us with a broken token stream. In your example, the cached token stream starts as<br>

><br>

>   `NSArray` `<` `id` `<` `PB` `>>` `*` [...]<br>

><br>

><br>

> ... and we try to annotate the `id<PB>` with our `CachedLexPos` pointing at the `*` token. That leaves `CachedTokens` containing:<br>

><br>

>   `NSArray` `<` `(type annotation)` `*` [...]<br>

><br>

><br>

> ... which is wrong. We need to actually convert the `tok::greatergreater` in `CachedTokens` into a `tok::greater` and update its location and length in order for the cached token stream to be correctly updated. Otherwise if the parser backtracks it will see the wrong token stream.<br>

<br>

<br>

</span>I don't think this happens, the type annotation starts at 7:11 and ends at 7:24:<br>

identifier 'NSArray'            Loc=<testcase.mm:7:11><br>

<span class="">greatergreater '>>'             Loc=<testcase.mm:7:24><br>

<br>

</span>The code that follows the assert then patches the CachedTokens the correct way, see the CachedTokens before and after:<br></blockquote><div><br></div><div>This example doesn't show the above problem, because it's annotating the NSArray<...> type, not the id<...> type. On reflection, the problem that I'm concerned about won't actually trigger the assertion here (except in weird and rare cases like annotating the middle > in a >>> token), but it's really caused by the same underlying bug. In practice, things will go wrong in two ways when we want to annotate half of a split token such as `>>`:</div><div><br></div><div>1) If the annotation ends in the middle of a split token, we'll currently annotate the entire token and lose the second half if we backtrack</div><div>2) If the annotation ends at the end of a split token, we'll assert because the start of the final token is not on a token boundary in the cached token stream</div><div><br></div><div>The underlying bug is that splitting a token is not properly updating the token cache. Assuming we never want to split a token, then backtrack over the token, then choose to not split it, the best fix would seem to be to update the cached token stream at the point when we perform the split (replace the tok::greatergreater with two tok::greaters in CachedTokens). And I think that's a correct assumption, as we only consider splitting when we know we're parsing a template argument list, and in that context the token is always split.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- Before:<br>

<br>

(clang::Preprocessor::CachedTokensTy) $32 = {<br>

<br>

  [0] = (Loc = 89, UintData = 1, PtrData = 0x0000000000000000, Kind = l_paren, Flags = 0)<br>

  [1] = (Loc = 90, UintData = 7, PtrData = 0x000000010d82fba0, Kind = identifier, Flags = 0)<br>

  [2] = (Loc = 97, UintData = 1, PtrData = 0x0000000000000000, Kind = less, Flags = 0)<br>

  [3] = (Loc = 98, UintData = 2, PtrData = 0x000000010d01da58, Kind = identifier, Flags = 0)<br>

  [4] = (Loc = 100, UintData = 1, PtrData = 0x0000000000000000, Kind = less, Flags = 0)<br>

  [5] = (Loc = 101, UintData = 2, PtrData = 0x000000010d82fb70, Kind = identifier, Flags = 0)<br>

  [6] = (Loc = 103, UintData = 2, PtrData = 0x0000000000000000, Kind = greatergreater, Flags = 0)<br>

  [7] = (Loc = 106, UintData = 1, PtrData = 0x0000000000000000, Kind = star, Flags = 2)<br>

<br>

}<br>

<br>

- After:<br>

<br>

(clang::Preprocessor::CachedTokensTy) $34 = {<br>

<br>

  [0] = (Loc = 89, UintData = 1, PtrData = 0x0000000000000000, Kind = l_paren, Flags = 0)<br>

  [1] = (Loc = 90, UintData = 104, PtrData = 0x000000010d820660, Kind = annot_typename, Flags = 0)<br>

  [2] = (Loc = 106, UintData = 1, PtrData = 0x0000000000000000, Kind = star, Flags = 2)<br>

<span class=""><br>

}<br>

<br>

<br>

<a href="http://reviews.llvm.org/D15173" rel="noreferrer" target="_blank">http://reviews.llvm.org/D15173</a><br>

<br>

Files:<br>

  lib/Lex/PPCaching.cpp<br>

  test/Parser/<a href="http://objcxx11-protocol-in-template.mm" rel="noreferrer" target="_blank">objcxx11-protocol-in-template.mm</a><br>

<br>

Index: test/Parser/<a href="http://objcxx11-protocol-in-template.mm" rel="noreferrer" target="_blank">objcxx11-protocol-in-template.mm</a><br>

===================================================================<br>

--- test/Parser/<a href="http://objcxx11-protocol-in-template.mm" rel="noreferrer" target="_blank">objcxx11-protocol-in-template.mm</a><br>

+++ test/Parser/<a href="http://objcxx11-protocol-in-template.mm" rel="noreferrer" target="_blank">objcxx11-protocol-in-template.mm</a><br>

@@ -8,3 +8,11 @@<br>

<br>

 vector<id<P>> v;<br>

 vector<vector<id<P>>> v2;<br>

+<br>

+@protocol PA;<br>

+@protocol PB;<br>

+<br>

+@class NSArray<ObjectType>;<br>

+typedef int some_t;<br>

+<br>

+id<PA> FA(NSArray<id<PB>> *h, some_t group);<br>

Index: lib/Lex/PPCaching.cpp<br>

===================================================================<br>

--- lib/Lex/PPCaching.cpp<br>

+++ lib/Lex/PPCaching.cpp<br>

</span>@@ -97,13 +97,33 @@<br>

<span class=""> void Preprocessor::AnnotatePreviousCachedTokens(const Token &Tok) {<br>

   assert(Tok.isAnnotation() && "Expected annotation token");<br>

   assert(CachedLexPos != 0 && "Expected to have some cached tokens");<br>

-  assert(CachedTokens[CachedLexPos-1].getLastLoc() == Tok.getAnnotationEndLoc()<br>

-         && "The annotation should be until the most recent cached token");<br>

+<br>

</span>+#ifndef NDEBUG<br>

+  {<br>

<span class="">+    Token CachedLastTok = CachedTokens[CachedLexPos - 1];<br>

</span><span class="">+    SourceLocation CachedLastTokLoc = CachedLastTok.getLastLoc();<br>

+    SourceLocation TokAnnEndLoc = Tok.getAnnotationEndLoc();<br>

</span><span class="">+<br>

+    // The annotation should be until the most recent cached token. Since<br>

+    // `Tok` length could be bigger than one (e.g. greatergreater '>>'), account<br>

</span>+    // for that case before checking the assertion.<br>

<span class="">+    if (CachedLastTokLoc != TokAnnEndLoc && !CachedLastTok.isAnnotation()) {<br>

</span><span class="">+      CachedLastTokLoc =<br>

+          CachedLastTokLoc.getLocWithOffset(CachedLastTok.getLength());<br>

+      unsigned TokAnnEndLocSize = Lexer::MeasureTokenLength(<br>

+          SourceMgr.getSpellingLoc(TokAnnEndLoc), SourceMgr, LangOpts);<br>

+      TokAnnEndLoc = TokAnnEndLoc.getLocWithOffset(TokAnnEndLocSize);<br>

+    }<br>

+<br>

</span><span class="">+    assert(CachedLastTokLoc == TokAnnEndLoc &&<br>

+           "The annotation should be until the most recent cached token");<br>

</span>+  }<br>

+#endif<br>

<span class=""><br>

   // Start from the end of the cached tokens list and look for the token<br>

   // that is the beginning of the annotation token.<br>

</span>   for (CachedTokensTy::size_type i = CachedLexPos; i != 0; --i) {<br>

-    CachedTokensTy::iterator AnnotBegin = CachedTokens.begin() + i-1;<br>

+    CachedTokensTy::iterator AnnotBegin = CachedTokens.begin() + i - 1;<br>

     if (AnnotBegin->getLocation() == Tok.getLocation()) {<br>

       assert((BacktrackPositions.empty() || BacktrackPositions.back() < i) &&<br>

              "The backtrack pos points inside the annotated tokens!");<br>

<br>

<br>

</blockquote></div><br></div></div>