[cfe-dev] [RFC] Clearing Clang AST before running backend optimizations/codegen to save memory

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Fri Sep 24 14:27:42 PDT 2021


On Fri, Sep 24, 2021 at 2:13 PM Arthur Eubanks <aeubanks at google.com> wrote:

> Adding a map of mangled names to source locations (only those emitted in
> the IR since those are the only ones the backend can see) doesn't
> noticeably impact compile time
> <https://llvm-compile-time-tracker.com/compare.php?from=ac51ad24a75c02152f8ece943d65de9a1c4e947a&to=fd8cfa1e33bfce6421b3a088b859a240b056be5a&stat=instructions>,
> but does noticeably increase memory usage
> <https://llvm-compile-time-tracker.com/compare.php?from=ac51ad24a75c02152f8ece943d65de9a1c4e947a&to=fd8cfa1e33bfce6421b3a088b859a240b056be5a&stat=max-rss>.
> Of course, we'll win that back with clearing the Clang AST.
>

I guess most of that is from the string data itself? Any chance that could
be shared, by referencing (using StringRef) string data in the
llvm::Module or otherwise? (It's a bit more nuanced than that, because
presumably it'd be a problem if the string data were to go away, e.g. if a
function is optimized out of an llvm::Module. Even if that meant the string
value would never be queried for in the map anyway, it'd still violate some
map invariants.)
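
To make the lifetime concern concrete, here is a rough sketch in plain
standard C++, not the actual patch. std::string_view stands in for
llvm::StringRef, and RawLoc, OwningLocTable and BorrowingLocTable are
hypothetical names invented for illustration. The owning table pays for a copy
of every mangled name; the borrowing table avoids the copy but has to drop an
entry before the name's owner goes away, otherwise the map is left hashing and
comparing a dangling key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for an encoded source location.
using RawLoc = std::uint32_t;

// Keys are owned copies: costs memory for the string data, but the table
// stays valid even if the named function is later deleted from the module.
class OwningLocTable {
  std::unordered_map<std::string, RawLoc> Map;

public:
  void record(std::string_view Mangled, RawLoc Loc) {
    Map.emplace(std::string(Mangled), Loc);
  }
  std::optional<RawLoc> lookup(std::string_view Mangled) const {
    auto It = Map.find(std::string(Mangled));
    if (It == Map.end())
      return std::nullopt;
    return It->second;
  }
  std::size_t size() const { return Map.size(); }
};

// Keys are views into string data owned elsewhere (assumed here to be the
// module's own name storage). No copies, but every key must outlive its entry.
class BorrowingLocTable {
  std::unordered_map<std::string_view, RawLoc> Map;

public:
  void record(std::string_view Mangled, RawLoc Loc) {
    Map.emplace(Mangled, Loc);
  }
  // Must be called before the owner of the name frees it; a stale key would
  // break the map's invariants even if it is never looked up again.
  void erase(std::string_view Mangled) { Map.erase(Mangled); }
  std::optional<RawLoc> lookup(std::string_view Mangled) const {
    auto It = Map.find(Mangled);
    if (It == Map.end())
      return std::nullopt;
    return It->second;
  }
  std::size_t size() const { return Map.size(); }
};
```

The borrowing variant only works if something reliably calls erase() whenever
a function is removed from the module, which is the nuance I mean above.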


> I found the recently implemented "dontcall" attribute which relies on the
> Clang AST. I have a draft patch to make it not rely on the Clang AST:
> https://reviews.llvm.org/D110364.
> But that and the changes in https://reviews.llvm.org/D109781 highlight an
> issue in that we don't only use the Clang AST to find source locations, we
> might also use it for other things.
>

All the more reason to make the AST-deleting mode the default (maybe the
only mode, if possible) and usable everywhere, to avoid building new features
that rely on the AST. (Generally this makes situations like LTO better
anyway, since they also break the idea that the AST is available during
transformations.)


> For example, we have custom naming of functions/lambdas/etc when printing
> out their name in a diagnostic with the Clang AST node. I've worked around
> this by calling LLVM's demangler, but it doesn't produce the same naming.
> And for the "dontcall" attribute, the initial implementation looked into
> the Clang AST to determine whether to emit a warning or an error. I've
> worked around this by splitting it into two attributes, one for warning and
> one for error.
> But overall, if we go through with this, we'll end up not having access to
> the Clang AST for future backend diagnostics. That's a tradeoff we'll have
> to decide on. Forcing this does make backend diagnostics more likely to be
> consistent when they don't have a Clang AST (e.g. ThinLTO post-link
> compiles, IR as input to Clang).
>

yep!
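
For what it's worth, the attribute split can be sketched like this, again in
plain standard C++ rather than the real LLVM classes. IRFunction is a
hypothetical stand-in for an IR function, and the two attribute spellings are
assumptions for illustration. The point is that the severity is readable from
the IR alone, so the same code path works with or without a Clang AST.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

enum class Severity { Warning, Error };

// Hypothetical stand-in for an IR function: a mangled name plus a set of
// string attributes. Not the real llvm::Function.
struct IRFunction {
  std::string MangledName;
  std::set<std::string> Attrs;
};

// Attribute spellings assumed for illustration only.
constexpr const char *DontCallError = "dontcall-error";
constexpr const char *DontCallWarn = "dontcall-warn";

// Decide how to diagnose a call to Callee using only what is in the IR.
// Returns nullopt when the callee carries neither attribute.
std::optional<Severity> dontCallSeverity(const IRFunction &Callee) {
  if (Callee.Attrs.count(DontCallError))
    return Severity::Error;
  if (Callee.Attrs.count(DontCallWarn))
    return Severity::Warning;
  return std::nullopt;
}
```

In this sketch a function carrying both attributes is treated as an error;
that ordering is my choice here, not something taken from the patch.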


>
> On Tue, Sep 21, 2021 at 4:28 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> On Tue, Sep 21, 2021 at 4:25 PM Arthur Eubanks <aeubanks at google.com>
>> wrote:
>>
>>>> But yeah - we do have the https://reviews.llvm.org/D4234
>>>> "LocTrackingOnly" mode which looks like it could be used for Rpass
>>>> diagnostics, for instance. (& removing use of the AST from the LLVM
>>>> diagnostic system might help make it more consistent behavior even when
>>>> doing LTO or other separations between AST parsing and IR transformations)
>>>>
>>>
>>> Thanks for the pointer to that. I was getting confused because
>>> "-Rpass=foo" was triggering LocTrackingOnly but "-Rpass" wasn't, which I've
>>> fixed. So -Rpass is no longer a concern.
>>>
>>> So we just have to worry about non-Rpass diagnostics, e.g.
>>> -fwarn-stack-size. All of these only use the function name for source
>>> locations. We can either just give up and have worse diagnostics for these,
>>> meaning no demangling and no source location per diagnostic, or create a
>>> side table of these right before codegen as you suggested. For now I'll go
>>> with creating a side table to preserve the status quo; hopefully its
>>> construction time is not measurable.
>>>
>>> Inline asm is special in that it carries around a source location token
>>> even in the IR, so we don't need to go through the AST to find a source
>>> location, which is nice.
>>>
>>
>> That all sounds pretty good to me - thanks for looking into all these
>> nooks and crannies!
>>
>