[cfe-dev] [RFC] Clearing Clang AST before running backend optimizations/codegen to save memory

Arthur Eubanks via cfe-dev cfe-dev at lists.llvm.org
Mon Sep 27 11:49:08 PDT 2021

On Fri, Sep 24, 2021 at 2:27 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Fri, Sep 24, 2021 at 2:13 PM Arthur Eubanks <aeubanks at google.com>
> wrote:
>> Adding a map of mangled names to source locations (only those emitted in
>> the IR since those are the only ones the backend can see) doesn't
>> noticeably impact compile time
>> <https://llvm-compile-time-tracker.com/compare.php?from=ac51ad24a75c02152f8ece943d65de9a1c4e947a&to=fd8cfa1e33bfce6421b3a088b859a240b056be5a&stat=instructions>,
>> but does noticeably increase memory usage
>> <https://llvm-compile-time-tracker.com/compare.php?from=ac51ad24a75c02152f8ece943d65de9a1c4e947a&to=fd8cfa1e33bfce6421b3a088b859a240b056be5a&stat=max-rss>.
>> Of course, we'll win that back by clearing the Clang AST.
> I guess most of that is from the string data itself? Any chance that could
> be shared - by referencing (using StringRef) string data in the
> llvm::Module or otherwise? (bit more nuanced than that because presumably
> it'd be a problem if the string data were to go away - if a function is
> optimized out of an llvm::Module, etc - even if that meant the string value
> would never be queried for in the map anyway, it'd still violate some map
> invariants/etc)
Yeah, reusing strings could have issues. However, if we use a hash of the
strings as the keys instead, the memory usage looks better for the most
part.
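A hash-keyed map like the one described above could look roughly like this. It is a minimal, hypothetical sketch: `SrcLoc`, `MangledNameLocMap`, and `fnv1a64` are illustrative names, not Clang APIs. FNV-1a is swapped in only as a short, dependency-free 64-bit hash, and a real implementation would store something like a `clang::SourceLocation` rather than this stand-in struct. Keying on an 8-byte hash means the map never holds its own copy of each mangled name, at the cost of a small risk of collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a source location (a real implementation would
// likely store a clang::SourceLocation instead).
struct SrcLoc {
  unsigned fileID;
  unsigned line;
  unsigned column;
};

// 64-bit FNV-1a hash, used here only because it is short and self-contained.
constexpr std::uint64_t fnv1a64(std::string_view s) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical map from mangled names (only those emitted into the IR) to
// source locations, keyed by a hash of the name so no strings are copied.
class MangledNameLocMap {
public:
  // Record where a function emitted into the IR came from. If the same name
  // is recorded twice, the first location is kept (emplace does not overwrite).
  void record(std::string_view mangledName, SrcLoc loc) {
    assert(!mangledName.empty() && "expected a mangled name");
    locs_.emplace(fnv1a64(mangledName), loc);
  }

  // Look up a location by mangled name, e.g. when a backend diagnostic names
  // a function. Two distinct names with the same hash would alias; this
  // sketch accepts that risk instead of storing the names to compare.
  std::optional<SrcLoc> lookup(std::string_view mangledName) const {
    auto it = locs_.find(fnv1a64(mangledName));
    if (it == locs_.end()) return std::nullopt;
    return it->second;
  }

  std::size_t size() const { return locs_.size(); }

private:
  std::unordered_map<std::uint64_t, SrcLoc> locs_;
};
```

In this sketch the map would be filled while functions are emitted into the module and queried only after the AST has been cleared.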

>> I found the recently implemented "dontcall" attribute which relies on the
>> Clang AST. I have a draft patch to make it not rely on the Clang AST:
>> https://reviews.llvm.org/D110364.
>> But that and the changes in https://reviews.llvm.org/D109781 highlight
>> an issue: we don't only use the Clang AST to find source locations;
>> we might also use it for other things.
> All the more reason to make the AST deleting mode the default (maybe the
> only mode, if possible)/usable everywhere - to avoid building new features
> that rely on the AST. (generally this makes situations like LTO better
> anyway - since they also break the idea that the AST is available during
> transformations)
>> For example, we have custom naming of functions/lambdas/etc when printing
>> out their name in a diagnostic with the Clang AST node. I've worked around
>> this by calling LLVM's demangler, but it doesn't produce the same naming.
>> And for the "dontcall" attribute, the initial implementation looked into
>> the Clang AST to determine whether to emit a warning or an error. I've
>> worked around this by splitting it into two attributes, one for warning and
>> one for error.
>> But overall, if we go through with this, we'll end up not having access
>> to the Clang AST for future backend diagnostics. That's a tradeoff we'll
>> have to decide on. Forcing this does make backend diagnostics more likely
>> to be consistent when they don't have a Clang AST (e.g. ThinLTO post-link
>> compiles, IR as input to Clang).
> yep!
The tradeoff is a real one, though: we may want to use the Clang AST to
provide more specific diagnostics when we do have access to the Clang
AST/sources. I'm waiting for somebody to object, but if nobody does, then
perhaps we're good.
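Splitting "dontcall" into a warning attribute and an error attribute, as quoted above, lets the backend choose the diagnostic severity from the IR alone, with no AST lookup. A minimal sketch of that decision, assuming function attributes are modeled as a plain name-to-message map: `StringAttrs`, `DontCallDiag`, and `classifyDontCall` are hypothetical names, the attribute names mirror LLVM's `dontcall-error`/`dontcall-warn` string attributes, and letting an error win when both are present is an assumption of this sketch.

```cpp
#include <map>
#include <optional>
#include <string>

enum class Severity { Warning, Error };

// Hypothetical stand-in for a function's string attributes
// (attribute name -> user-supplied message).
using StringAttrs = std::map<std::string, std::string>;

struct DontCallDiag {
  Severity severity;
  std::string message;
};

// Decide how to diagnose a call using only the callee's IR attributes.
// Assumption: if both attributes are present, the error takes precedence.
std::optional<DontCallDiag> classifyDontCall(const StringAttrs &attrs) {
  if (auto it = attrs.find("dontcall-error"); it != attrs.end())
    return DontCallDiag{Severity::Error, it->second};
  if (auto it = attrs.find("dontcall-warn"); it != attrs.end())
    return DontCallDiag{Severity::Warning, it->second};
  return std::nullopt;  // not a "dontcall" function
}
```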
