[cfe-dev] [RFC] Clearing Clang AST before running backend optimizations/codegen to save memory

Fri Sep 24 14:13:22 PDT 2021

Adding a map of mangled names to source locations (only those emitted in
the IR since those are the only ones the backend can see) doesn't
noticeably impact compile time
<https://llvm-compile-time-tracker.com/compare.php?from=ac51ad24a75c02152f8ece943d65de9a1c4e947a&to=fd8cfa1e33bfce6421b3a088b859a240b056be5a&stat=instructions>,
but does noticeably increase memory usage
<https://llvm-compile-time-tracker.com/compare.php?from=ac51ad24a75c02152f8ece943d65de9a1c4e947a&to=fd8cfa1e33bfce6421b3a088b859a240b056be5a&stat=max-rss>.
Of course, we'll win that back by clearing the Clang AST.
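
To make the idea concrete, here is a minimal sketch of such a side table. It is not the code from the patch: SrcLoc and EmittedDeclLocs are hypothetical names, and a real implementation would store clang::SourceLocation values rather than a file/line/column struct. It only illustrates recording locations for symbols that were emitted into the IR and looking them up by mangled name once the AST is gone.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a resolved source location.
struct SrcLoc {
  std::string File;
  unsigned Line = 0;
  unsigned Column = 0;
};

// Hypothetical side table, populated before the AST is freed.
class EmittedDeclLocs {
  std::unordered_map<std::string, SrcLoc> Locs;

public:
  // Record the location of a symbol that was emitted into the IR.
  // If the name is already present, the first recorded location is kept.
  void record(const std::string &MangledName, SrcLoc Loc) {
    Locs.emplace(MangledName, std::move(Loc));
  }

  // Look up a location by mangled name, as a backend diagnostic handler
  // would; returns std::nullopt for names that were never recorded.
  std::optional<SrcLoc> lookup(const std::string &MangledName) const {
    auto It = Locs.find(MangledName);
    if (It == Locs.end())
      return std::nullopt;
    return It->second;
  }

  std::size_t size() const { return Locs.size(); }
};
```

Keying on the full mangled name is the simplest choice but also the most memory-hungry one; storing a hash of the name instead would trade a small collision risk for a smaller table.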

I found that the recently implemented "dontcall" attribute relies on the
Clang AST. I have a draft patch to make it not rely on the Clang AST:
https://reviews.llvm.org/D110364.
But that and the changes in https://reviews.llvm.org/D109781 highlight an
issue: we don't use the Clang AST only to find source locations, we
might also use it for other things.

For example, when printing the name of a function/lambda/etc. in a
diagnostic, we use custom naming derived from the Clang AST node. I've
worked around this by calling LLVM's demangler, but it doesn't produce the
same naming.
And for the "dontcall" attribute, the initial implementation looked into
the Clang AST to determine whether to emit a warning or an error. I've
worked around this by splitting it into two attributes, one for warning and
one for error.
But overall, if we go through with this, we'll end up not having access to
the Clang AST for future backend diagnostics. That's a tradeoff we'll have
to decide on. On the other hand, forcing this makes backend diagnostics
more likely to be consistent with the cases where there is no Clang AST at
all (e.g. ThinLTO post-link compiles, IR as input to Clang).

On Tue, Sep 21, 2021 at 4:28 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Tue, Sep 21, 2021 at 4:25 PM Arthur Eubanks <aeubanks at google.com>
> wrote:
>
>>> But yeah - we do have the https://reviews.llvm.org/D4234
>>> "LocTrackingOnly" mode which looks like it could be used for Rpass
>>> diagnostics, for instance. (& removing use of the AST from the LLVM
>>> diagnostic system might help make it more consistent behavior even when
>>> doing LTO or other separations between AST parsing and IR transformations)
>>>
>>
>> Thanks for the pointer to that. I was getting confused because
>> "-Rpass=foo" was triggering LocTrackingOnly but "-Rpass" wasn't, which I've
>> fixed. So -Rpass is no longer a concern.
>>
>> So we just have to worry about non-Rpass diagnostics, e.g.
>> -fwarn-stack-size. All of these only use the function name for source
>> locations. We can either just give up and have worse diagnostics for these,
>> meaning no demangling and no source location per diagnostic, or create a
>> side table of these right before codegen as you suggested. For now I'll go
>> with creating a side table to preserve the status quo, hopefully its
>> construction runtime is not measurable.
>>
>> Inline asm is special in that it carries around a source location token
>> even in the IR, so we don't need to go through the AST to find a source
>> location which is nice
>>
>
> That all sounds pretty good to me - thanks for looking into all these
> nooks and crannies!
>