[cfe-dev] Limits on AST depth and stack usage?

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 31 20:46:09 PDT 2020


Since you asked about more general issues:

In general terms, it would be nice if we could accept arbitrarily deep
ASTs. We make some effort to do so, with three major techniques (in
addition to expecting 8MB of stack space in each thread that might run
Clang):

1) Data recursion. RecursiveASTVisitor does this by default, and the
constant evaluator does it when evaluating integer-valued expressions. But
this is limited: you only get it where you apply it, and RAV's automatic
data recursion turns itself off when it thinks the difference might be
observable in subclasses.
2) Minimizing stack space during recursion. For example, during "regular"
function template instantiation at the end of the translation unit, we try
to keep the size of the recursive frame as small as possible.
3) The "runWithSufficientStackSpace" mechanism that tries to check that
"enough" stack space is available and pops out a diagnostic (and as a
slow-but-correct fallback, allocates a new stack) if we're getting close to
running out.
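
For reference, here's a minimal sketch of how that mechanism gets used,
based on the interface in clang/Basic/Stack.h (the surrounding function and
the diagnostic callback are made-up placeholders, not existing Clang code):

  #include "clang/Basic/Stack.h"

  // Hypothetical deeply-recursive entry point. Note the slow fallback only
  // works if clang::noteBottomOfStack() was called when the thread started.
  void checkSomethingVeryDeep() {
    auto WarnStackNearlyExhausted = [] {
      // In Clang proper this emits the warn_stack_exhausted diagnostic.
    };
    clang::runWithSufficientStackSpace(WarnStackNearlyExhausted, [&] {
      // The recursive work goes here. If the current stack is nearly
      // exhausted, this lambda is run on a freshly allocated stack
      // (or, in builds without threads, run in place after warning).
    });
  }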

Applying these techniques (largely in the above order) should be
encouraged, and if we can make memoizedMatchesAncestorOfRecursively do so,
that seems like a good directed approach for the local problem. But it's
unlikely that we'll ever cover enough ground here to reliably prevent
running out of stack in all cases -- not unless we fully move away from
recursion as an implementation technique, which seems implausible, or add
"runWithSufficientStackSpace" calls to every call graph SCC, which seems
expensive. So this is currently handled pragmatically: we fix stack usage
issues as we see them become a problem. And in the remaining cases, we
crash :-( That's not the end of the world for a compiler, but it's not
great for more or less any other user of Clang as a library.
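
To make that concrete for the local problem: a loop-based ancestor walk
could look roughly like the sketch below. This is illustrative only -- it
is not the existing memoizedMatchesAncestorOfRecursively code, and it
omits the memoization and bound-nodes plumbing the real matcher needs:

  #include "clang/AST/ASTContext.h"
  #include "clang/AST/ASTTypeTraits.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"

  // Returns true if any (transitive) ancestor of Start satisfies Matches.
  // An explicit worklist replaces one stack frame per ancestor level, so
  // AST height no longer translates into call-stack depth.
  static bool anyAncestorMatches(
      clang::ASTContext &Ctx, const clang::DynTypedNode &Start,
      llvm::function_ref<bool(const clang::DynTypedNode &)> Matches) {
    llvm::SmallVector<clang::DynTypedNode, 16> Worklist = {Start};
    llvm::SmallPtrSet<const void *, 16> Visited;
    while (!Worklist.empty()) {
      clang::DynTypedNode Node = Worklist.pop_back_val();
      for (const clang::DynTypedNode &Parent : Ctx.getParents(Node)) {
        // Nodes without pointer identity have no memoization data; don't
        // try to deduplicate those on a null key.
        if (const void *Key = Parent.getMemoizationData())
          if (!Visited.insert(Key).second)
            continue;
        if (Matches(Parent))
          return true;
        Worklist.push_back(Parent);
      }
    }
    return false;
  }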

For standards conformance, our backstop here is the set of so-called
"implementation limits" -- we're permitted to bail out if things get too
complex, and we
largely get to define what that means. C has some minimal requirements here
-- for example, that we can support at least one program with 63 levels of
nested parentheses; C++ has only suggestions and no hard requirements. But
in either case, if our limits are violated, there are no constraints on our
behavior; we're allowed to crash or whatever. So that puts us in
quality-of-implementation land.

Now, we do have some defined implementation limits (and there has been some
work done towards improving and formalizing them:
https://reviews.llvm.org/D72053). In the specific case of expanding a
fold-expression, the relevant implementation limit is
LangOptions::BracketDepth, which limits the total number of parentheses,
braces, or square brackets that can be open at once, but that limit is
currently only applied when parsing, not during template instantiation. So
at a bare minimum, I think we're missing a check for that language option
during fold-expression expansion (each level of a fold-expression
introduces a set of parentheses, so even a pedantic interpretation of the
rules has us covered in reusing the same limit here).
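
As a rough illustration of the asymmetry (a sketch, not a verified test
case): with something like -fbracket-depth=16, deep literal nesting is
rejected at parse time, while an expanded fold of any width is not:

  // Compile with: clang -std=c++17 -fbracket-depth=16 -fsyntax-only demo.cpp

  // Rejected at parse time once more than 16 brackets are open at once
  // (the parser's "bracket nesting level exceeded" error):
  //   int tooDeep = (((((((((((((((((0)))))))))))))))));

  // Accepted today: expanding the fold below conceptually introduces one
  // set of parentheses per pack element, but no bracket-depth check is
  // applied during instantiation.
  template <class T, T... V> struct seq {
    static constexpr T sum = (T(0) + ... + V);
  };
  using Big = __make_integer_seq<seq, int, 1000>;
  static_assert(Big::sum == 999 * 1000 / 2, "");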

As for limiting the size of packs in general: I think it might make sense
to impose a limit -- not to avoid excessive recursion; we should have other
mechanisms to deal with that -- but to avoid huge memory usage during
compilation due to forming overly large packs. E.g., passing a very large
integer to the __make_integer_seq builtin can easily cause Clang to fall
into a hole from which it will never escape, assuming you don't run out of
address space and just crash.

Setting an implementation limit on the overall depth of the AST could be
interesting, if there's a way we can practically do it. This seems like a
hard problem -- there are just too many different places and ways in which
AST nodes are created, many of which aren't equipped to cope with the
operation failing in a gracefully-recoverable way, and just
keeping track of the AST depth without paying a code complexity and memory
usage cost seems like a challenge -- and we'd need to be sure the cost of
solving it is worth the benefit.

On Mon, 31 Aug 2020 at 16:35, David Rector <davrecthreads at gmail.com> wrote:

> RecursiveASTVisitor::TraverseStmt has a `DataRecursionQueue *Queue` in its
> signature to prevent stack overflow by default (as Richard once helpfully
> explained to me).  ASTMatchers uses the DataRecursionQueue, so that
> searching for matches in Stmt children should never overflow the stack, I
> believe (though it does confuse users who expect a more orderly
> traversal…but that’s another matter).
>
> But `memoizedMatchesAncestorOfRecursively` does *not* use data recursion,
> as you’ve noted.  So in other words it is using data recursion when
> traversing descendants of deeply nested Exprs, but not when traversing the
> ancestors of those at the bottom.
>
> So, I think it could just be a limited problem with a simple solution:
> there should be a non-recursive, loop-based version of
> memoizedMatchesAncestorOfRecursively specific to traversing the ancestors
> of Stmts/Exprs, mirroring what RecursiveASTVisitor::TraverseStmt does.
>
> Whether there are more general issues with e.g. evaluating extremely deep
> ASTs, is beyond my own depth, though...
>
> Dave
>
>
> On Aug 31, 2020, at 11:33 AM, Sam McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> (doh, obviously that first "constexpr" shouldn't be there)
>
> On Mon, Aug 31, 2020 at 5:22 PM Sam McCall <sammccall at google.com> wrote:
>
>> This program creates an AST of depth 10000:
>>
>> constexpr
>> template <class T, T...V> struct seq {
>>   constexpr bool zero() { return (true && ... && (V == 0)); };
>> };
>> constexpr unsigned N = 10000;
>> auto x = __make_integer_seq<seq, int, N>{};
>> static_assert(!x.zero(), "");
>>
>> On my machine, clang-10 parses this successfully (slowly!) but crashes if
>> I increase to 100000.
>> I suppose -ftemplate-depth doesn't apply as there's no actual recursion
>> here (just fold-expressions), but I feel like some implementation limit
>> *should* apply: clang relies heavily on being able to traverse the AST
>> recursively, and will inevitably crash if the tree is too tall.
>>
>> Should there be a limit here? It seems tempting to say one of:
>>  -ftemplate-depth should constrain the number of arguments a pack can
>> represent
>>  -ftemplate-depth should constrain the size of a pack that can be unfolded
>>  -some limit should constrain the overall AST height
>>
>> ---
>>
>> The immediate problem we're dealing with is that
>> https://reviews.llvm.org/D85621 increased the size of DynTypedNode from
>> 32 to 56 bytes, and we're seeing clang-tidy crashes due to stack overflow
>> in the
>> recursive traversal MatchASTVisitor::memoizedMatchesAncestorOfRecursively.
>>
>> However, this is on a unit test very similar to the above (with N=2000), so
>> before trying to squeeze stack usage I'd like to understand what we should
>> be aiming to support.
>>
>> (Tangentially related: Richard, if you think there are good ways to
>> squeeze TemplateArgumentLoc, that'd be interesting to know. Currently it's
>> by-value and holds TemplateArgument and TemplateArgumentLocInfo which are
>> 24 bytes each)
>>
> I think it should be possible to squeeze TemplateArgument down to 16
bytes, but it'll be a little fiddly. All representations are already 16
bytes (including the kind field) except for the Declaration representation,
and that one is only two pointers plus a kind. We have too many kinds to
fit into the bottom bits of a pointer (we have 9 kinds currently), but we
can handle that: all the other representations have a full 4 bytes for the
"kind" field, so we can say (for example) that bottom bit == 0 means it's
the Declaration kind, and bottom bit == 1 means the remaining 31 bits are
the actual kind. (Or we can get down to 8 kinds by representing "null" as --
for example -- a null type template argument, and then pack the kind in the
first 3 bits.)
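
A toy layout to show the tagging idea (illustrative only -- not the real
TemplateArgument, and the member names are made up):

  #include <cstdint>

  // The first word is either a suitably-aligned declaration pointer (low
  // bit 0 => Declaration kind) or the kind shifted up by one with the low
  // bit set; the second word holds the rest of the payload either way.
  struct PackedTemplateArgument {
    uintptr_t Word0;
    uintptr_t Word1;

    bool isDeclaration() const { return (Word0 & 1) == 0; }
    unsigned getNonDeclarationKind() const { return unsigned(Word0 >> 1); }
  };
  static_assert(sizeof(PackedTemplateArgument) == 2 * sizeof(void *),
                "two pointer-sized words");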

The lower-hanging fruit is TemplateArgumentLocInfo, for which we could
heap-allocate the information for the template name case and reduce the
class to 8 bytes. Given that template template arguments are the rarest
kind, that's very likely to be a net memory usage reduction.
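
Roughly, the shape would be something like this (again illustrative, not
the existing class):

  // The rare template-template-argument location data moves behind a
  // separate (ASTContext-allocated) object, leaving one pointer in place.
  struct TemplateTemplateArgLocInfo {
    void *QualifierLocData;    // NestedNameSpecifierLoc's opaque data
    unsigned TemplateNameLoc;  // raw SourceLocation encodings
    unsigned EllipsisLoc;
  };

  class SmallTemplateArgumentLocInfo {
    union {
      void *DeclaratorInfo;                  // TypeSourceInfo*, type arguments
      void *Expression;                      // Expr*, expression arguments
      TemplateTemplateArgLocInfo *Template;  // template template arguments
    };
  };
  static_assert(sizeof(SmallTemplateArgumentLocInfo) == sizeof(void *),
                "one pointer in place");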