[Lldb-commits] [lldb] [WIP] [lldb][TypeSystemClang] Create clang::SourceLocation from DWARF and attach to AST (PR #127829)

Tue Feb 25 02:06:21 PST 2025

labath wrote:

Yeah, I expect this could be a bit of a problem on several fronts:
- even if the memory usage is low enough to not matter, the fact that we have to hit the filesystem to load the file is most likely going to slow things down, particularly when you're using a remote filesystem (like we are). Normally, all one needs to parse debug info is the DWARF data, but now we would be fetching potentially thousands of files without actually using most of them.
- this setup also makes it hard to display the source files in the same way that the rest of lldb does. In particular, the `target.source-map` setting is inaccessible to the debug info code, and if I'm not mistaken the source manager in lldb always takes the current value of the setting (whereas the mapping in the clang AST is fixed at construction time).
- mmapping the files (I suspect this is the reason this doesn't have a big impact on memory footprint) also has some unfortunate side effects on various systems. On Windows, it prevents the file from being deleted (some editors implement "editing" as deleting and recreating a file), and on most Unix systems (with the exception of Darwin, which has `MAP_RESILIENT_MEDIA`) accessing the file after it has been truncated (which is the other way to implement "editing a file") can cause a SIGBUS.
- I wonder whether we can exhaust the SourceLocation space in this way (and what happens if we do). 4GB sounds like a lot, but I know that some people manage to do that with just a single compile unit (I think it involves `#include`ing the same file repeatedly), so I wouldn't be surprised if we hit it after placing the entire large binary into a single AST context. (Maybe it's not as acute as we're not modelling `#include`s, but still...)
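
To make the last point concrete, here is a rough back-of-the-envelope model of a flat offset space in which every registered file reserves a contiguous range. This is only a sketch under stated assumptions: `OffsetSpace` and `filesUntilExhaustion` are hypothetical names, the budget is a parameter rather than clang's real limit, and none of this is clang's actual `SourceManager` code. It requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified, hypothetical model of a flat source-location space. Each file
// reserves its size plus one extra slot for the end-of-file position.
// This is NOT clang's SourceManager; the budget is supplied by the caller.
class OffsetSpace {
public:
  explicit OffsetSpace(uint64_t budget) : budget_(budget) {}

  // Returns the start offset assigned to the file, or std::nullopt when the
  // remaining space is too small to hold it.
  std::optional<uint64_t> reserve(uint64_t file_size) {
    assert(file_size < UINT64_MAX && "file_size + 1 must not overflow");
    uint64_t needed = file_size + 1;
    if (needed > budget_ - next_)
      return std::nullopt;
    uint64_t start = next_;
    next_ += needed;
    return start;
  }

  // Number of offsets handed out so far.
  uint64_t used() const { return next_; }

private:
  uint64_t budget_;
  uint64_t next_ = 0;
};

// Counts how many files of the given size can be registered before a
// reservation fails.
uint64_t filesUntilExhaustion(uint64_t budget, uint64_t file_size) {
  OffsetSpace space(budget);
  uint64_t count = 0;
  while (space.reserve(file_size))
    ++count;
  return count;
}
```

Since the DWARF-based approach does not model `#include`s, each file would be reserved once per AST context rather than once per inclusion, which is why the pressure is probably lower than in a real compile. The question of what should happen when `reserve` fails remains open.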

With the current implementation, I expect that we would need this to be controlled by some sort of a setting, although I'd really rather not do that. I wonder if we could cheat by pointing all files to some fake memory buffer (containing a bunch of newlines or something), and then hijacking the error printing logic to look up the "real" contents of the file (taking into account the current source map, file checksums and everything).
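
A minimal sketch of that "cheat", with the caveat that everything in it is hypothetical: `makePlaceholderBuffer`, `remapPath`, `resolveLine`, the `SourceMap` and `FakeFS` types are illustrative stand-ins, not lldb or clang APIs, and the checksum validation mentioned above is left out. The idea is that the AST only ever sees a buffer of newlines, and the real text is looked up at diagnostic time using whatever the source map is at that moment. It requires C++17.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholder: one newline per line, so every line number still
// maps to a distinct offset without reading the real file.
std::string makePlaceholderBuffer(unsigned num_lines) {
  return std::string(num_lines, '\n');
}

// In a buffer of bare newlines, 1-based line N starts at offset N - 1.
unsigned offsetForLine(unsigned line) {
  assert(line >= 1 && "lines are 1-based");
  return line - 1;
}
unsigned lineForOffset(unsigned offset) { return offset + 1; }

// Hypothetical stand-in for a prefix-remapping setting such as
// target.source-map: (original prefix, replacement prefix) pairs.
using SourceMap = std::vector<std::pair<std::string, std::string>>;

// Applies the first matching prefix rule; returns the path unchanged if
// no rule matches.
std::string remapPath(const SourceMap &map, const std::string &path) {
  for (const auto &[from, to] : map)
    if (path.compare(0, from.size(), from) == 0)
      return to + path.substr(from.size());
  return path;
}

// Stand-in for the filesystem: path -> lines of the file.
using FakeFS = std::map<std::string, std::vector<std::string>>;

// Called only when a diagnostic is printed: remap the path with the current
// source map, then fetch the real text for the line the offset refers to.
std::optional<std::string> resolveLine(const FakeFS &fs, const SourceMap &map,
                                       const std::string &dwarf_path,
                                       unsigned offset) {
  auto it = fs.find(remapPath(map, dwarf_path));
  if (it == fs.end())
    return std::nullopt;
  unsigned line = lineForOffset(offset);
  if (line > it->second.size())
    return std::nullopt;
  return it->second[line - 1];
}
```

Because the remapping happens inside `resolveLine` rather than when the buffer is created, a later change to the map is picked up automatically, and no file is opened unless a diagnostic actually points into it.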

https://github.com/llvm/llvm-project/pull/127829