[cfe-commits] [PATCH] Implementation quantity limits to prevent crashes

Fri Oct 7 07:33:34 PDT 2011

On Sep 28, 2011, at 6:46 PM, Aaron Ballman wrote:

> On Fri, Sep 23, 2011 at 3:23 PM, Douglas Gregor <dgregor at apple.com> wrote:
>> 
>> On Sep 23, 2011, at 1:22 PM, Aaron Ballman wrote:
>> 
>>> On Fri, Sep 23, 2011 at 1:35 PM, Douglas Gregor <dgregor at apple.com> wrote:
>>>> 
>>>> On Sep 21, 2011, at 8:45 PM, Aaron Ballman wrote:
>>>> 
>>>>> On Wed, Sep 21, 2011 at 8:11 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>>> Also, the diagnostic wording was taken from MSVC, but could likely be
>>>>>>> improved.  Suggestions welcome.
>>>>>> 
>>>>>> I would say something more along the lines of "parser recusion limit
>>>>>> reached"; using the term "stack overflow" makes it sound like a bug in
>>>>>> the compiler.
>>>>> 
>>>>> Here is a revised patch with the new diagnostic wording.
>>>> 
>>>> Minor stuff: there's a typo "recusion" in the diagnostic wording, and the new test case hasn't been updated for the new wording.
>>> 
>>> Good catches!
>>> 
>>>> I like this general direction, but I do wonder about adding another RAII object for this. You've added the RAII object in two places, but aren't there many other similar places? Can we be more methodical, so we can be sure that we don't lose the benefits of this checking with later maintenance?
>>> 
>>> The intention was to see if folks liked the direction, and then add it
>>> to more places.  Why add it everywhere under the sun if people think
>>> it is the wrong way to solve the problem?
>>> 
>>>> I have one idea: what if we expanded the mandate of this RAII object a bit, so that it was responsible for both balancing the delimiters (parentheses, square brackets, braces, etc.)? In other words, we introduce the RAII object to consume the opening paren/bracket/brace and also to find the matching ')':
>>> 
>>> When I started out, I had intended this to have a bit more to do with
>>> [implimits].  But what I ended up with is really is about depth of
>>> recursion in the parser.  Without further accessors, it's not useful
>>> for iterative limit checking (such as the number of constants in an
>>> enumeration).  But do we care?
>> 
>> Probably not. We should be able to handle 'lots' of enumerators. Recursion is really the killer for us, since we have limited stack space (and Clang uses a lot of it).
>> 
>>>>  BalancedDelimiterTracket Parens(*this, tok::l_paren); // tries to consume the opening '('
>>>>  if (!Parens) // checks whether an error occurred in consuming the opening ')'
>>>>    return error;
>>>> 
>>>>  // parse whatever is in the parens
>>>> 
>>>>  if (Parens.close()) { // tries to consume the closing ')'
>>>>    // couldn't find closing ')'
>>>>  }
>>>> 
>>>> The idea here would be to encapsulate all of the behavior that now uses MatchRHSPunctuation into an RAII object, and have that also check for implementation limits. There are three benefits here:
>>>> 
>>>>  (1) It becomes clearer where we're matching delimited sets, so the structure of the parser improves in general.
>>>>  (2) Implementation limits checking comes "free" when we use the right abstraction, so it won't be forgotten or lost.
>>>>  (3) These RAII objects, in aggregate, outline the source as we're parsing it. We should be able to use that information to improve parser recovery.
>>>> 
>>>> What do you think?
>>> 
>>> I think I'll play around with the idea and get back to you.  :-)  On
>>> its face, it sounds like a clean way to go about it, so I'll see if it
>>> works in practice as well.  Thank you for the idea!
>> 
>> Great! I'm looking forward to your next patch.
> 
> Going off Douglas' suggestions, I've created a new patch for parsing
> balanced delimiters.
> 
> There is a Parser::DelimiterTracker class which is responsible for
> tracking the "depth" any individual delimiter is at.  The parser owns
> an instance of this class.  There is a
> Parser::BalancedDelimiterTracker class which is used by the parsing
> methods to navigate the balanced delimiters.  It works in conjunction
> with the Parser::DelimiterTracker so that we can track recursion
> limits.  The class comes with three methods: consumeOpen,
> expectAndConsume and consumeClose.
> 
> I've refactored all occurrences of MatchRHSPunctuation to now use a
> BalancedDelimiterTracker object instead.  Due to this, I've removed
> the MatchRHSPunctuation function entirely.

Excellent.

> I've tested this patch against the test suite and all tests continue
> to behave as expected.  I've also checked to make sure this doesn't
> regress performance, and it hasn't.
> 
> This patch does still address bug 10332, and contains a test case for
> it as well.

A few comments below.

+  class DelimiterTracker {
+  private:
+    friend class Parser;
+
+    unsigned P, B, S, L, LLL, T;

Some more-descriptive variable names would be useful here :)

+    BalancedDelimiterTracker(Parser& p, tok::TokenKind k);
+    ~BalancedDelimiterTracker();
+
+    bool consumeOpen(SourceLocation *open) const;
+    bool expectAndConsume(SourceLocation *open, 
+                           unsigned DiagID, 
+                           const char *Msg = "", 
+                           tok::TokenKind SkipToTok = tok::unknown) const;
+    bool consumeClose(const SourceLocation& open, SourceLocation *close) const;
+  };

I'd prefer that we keep the open/close source locations within BalancedDelimiterTracker, which would simplify the interface a bit, since *every* user of BalancedDelimiterTracker needs to track this information anyway.

Otherwise, this looks great. Thanks for working on it!

	- Doug