[Lldb-commits] [lldb] [WIP] [lldb][TypeSystemClang] Create clang::SourceLocation from DWARF and attach to AST (PR #127829)

Tue Feb 25 10:15:31 PST 2025

jimingham wrote:

Why do we need to touch source files to add a source file attribution from debug info to a declaration?  We don't require the actual presence of source files in order to make source file attributions anywhere else in the debug info handling.

Jim

> On Feb 25, 2025, at 2:06 AM, Pavel Labath ***@***.***> wrote:
> 
> labath left a comment
> (llvm/llvm-project#127829)
> Yeah, I expect this could be a bit of a problem on several fronts:
> 
> - Even if the memory usage is low enough not to matter, the fact that we have to hit the filesystem to load each file is most likely going to slow things down, particularly when you're using a remote filesystem (like we are). Normally, all one needs to parse debug info is the DWARF data, but now we would be fetching potentially thousands of files without actually using most of them.
> - This setup also makes it hard to display the source files the same way the rest of lldb does. In particular, the target.source-map setting is inaccessible to the debug info code, and if I'm not mistaken, the source manager in lldb always uses the current value of the setting, whereas the mapping in the clang AST is fixed at construction time.
> - mmapping the files (I suspect this is why this doesn't have a big impact on memory footprint) also has some unfortunate side effects on various systems. On Windows, it prevents the file from being deleted (some editors implement "editing" as deleting and recreating a file), and on most Unix systems (with the exception of Darwin, which has MAP_RESILIENT_MEDIA), accessing the file after it has been truncated (the other way to implement "editing a file") can cause a SIGBUS.
> - I wonder whether we can exhaust the SourceLocation space this way (and what happens if we do). 4GB sounds like a lot, but I know that some people manage to do that with just a single compile unit (I think it involves #include-ing the same file repeatedly), so I wouldn't be surprised if we hit it after placing an entire large binary into a single AST context. (Maybe it's not as acute since we're not modelling #includes, but still...)
> 
> With the current implementation, I expect that this would need to be controlled by some sort of setting, although I'd really rather not do that. I wonder if we could cheat by pointing all files to some fake memory buffer (containing a bunch of newlines or something), and then hijacking the error-printing logic to look up the "real" contents of the file (taking into account the current source map, file checksums and everything).
> 
> <https://github.com/llvm/llvm-project/pull/127829#issuecomment-2681426134>
> 
> labath
>  left a comment 
> (llvm/llvm-project#127829)
>  <https://github.com/llvm/llvm-project/pull/127829#issuecomment-2681426134>
> Yeah, I expect this could be a bit of a problem on several fronts:
> 
> even if the memory usage is low enough to not matter, the fact that we have to hit the filesystem to load the file is most likely going to slow things down, particularly when you're using a remote filesystem (like we are). Normally, all one needs to parse debug info is the DWARF data, but now we would be fetching potentially thousands of files without actually using most most of them
> this setup also makes it hard to display the source files in the same way that the rest of lldb does. In particular the target.source-map setting is inaccessible to the debug info code, and if I'm not mistaken the source manager in lldb always takes the current value of the setting (whereas the mapping in the clang ast is fixed at construction time)
> mmapping the files (I suspect this is the reason this doesn't have a big impact on memory footprint) also has some unfortunate side effects on various systems. On windows, it prevents the file from being deleted (some editors implement "editing" as deleting and recreating a file), and on most unix systems (with the exception of Darwin which has MAP_RESILIENT_MEDIA) accessing the file after it has been truncated (which is the other way to implement "editing a file") can cause a SIGBUS.
> I wonder whether we can exhaust the SourceLocation space in this way (and what happens if we do). 4GB sounds like a lot, but I know that some people manage to do that with just a single compile unit (I think it involves #includeing the same file repeatedly), so I wouldn't be surprised if we hit it after placing the entire large binary into a single AST context. (Maybe it's not as acute as we're not modelling #includes, but still..)
> With the current implementation, I expect that we would need this to be controlled by some sort of a setting, although I'd really rather not do that. I wonder if we could cheat by pointing all files to some fake memory buffer (containing a bunch of newlines or something), and then hijacking the error printing logic to look up the "real" contents of the file (taking into account the current source map, file checksums and everything).
> 
> —
> Reply to this email directly, view it on GitHub <https://github.com/llvm/llvm-project/pull/127829#issuecomment-2681426134>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW4LRQST7324NFPME6D2RQ6C7AVCNFSM6AAAAABXOWNWGGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBRGQZDMMJTGQ>.
> You are receiving this because you are on a team that was mentioned.
> 
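The fake-buffer idea from the end of labath's comment can be sketched in isolation. This is a minimal, self-contained C++ model under stated assumptions, not lldb or clang code: the placeholder buffer holds one newline per source line, so an offset into it encodes only a line number, and the real text is fetched when a diagnostic is printed, using whatever path remapping is current at that moment. `MakePlaceholderBuffer`, `OffsetForLine`, `LineForOffset`, `RemapPath`, `LookupLine` and the in-memory `FileTable` are hypothetical names; `RemapPath` only mimics the prefix-replacement behaviour of target.source-map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholder buffer: one empty line per line of the real file,
// so 1-based line N of the real file starts at byte offset N - 1.
std::string MakePlaceholderBuffer(unsigned num_lines) {
  return std::string(num_lines, '\n');
}

// Offset of the start of a 1-based line inside the placeholder buffer.
std::size_t OffsetForLine(unsigned line) {
  assert(line >= 1 && "line numbers are 1-based");
  return line - 1;
}

// Recovers the 1-based line number by counting the newlines before `offset`.
unsigned LineForOffset(const std::string &buffer, std::size_t offset) {
  unsigned line = 1;
  for (std::size_t i = 0; i < offset && i < buffer.size(); ++i)
    if (buffer[i] == '\n')
      ++line;
  return line;
}

// Hypothetical prefix remapping in the spirit of target.source-map: the first
// entry whose "from" prefix matches the path has that prefix replaced by "to".
using SourceMap = std::vector<std::pair<std::string, std::string>>;

std::string RemapPath(const SourceMap &map, const std::string &path) {
  for (const auto &[from, to] : map)
    if (path.compare(0, from.size(), from) == 0)
      return to + path.substr(from.size());
  return path;
}

// Print-time lookup: applies the map that is current at the time of the call,
// then fetches the real line text. `files` stands in for the filesystem here.
using FileTable = std::map<std::string, std::vector<std::string>>;

std::string LookupLine(const FileTable &files, const SourceMap &map,
                       const std::string &dwarf_path, unsigned line) {
  auto it = files.find(RemapPath(map, dwarf_path));
  if (it == files.end() || line == 0 || line > it->second.size())
    return "";  // File or line unavailable: print the location without text.
  return it->second[line - 1];
}
```

Because the remapping happens in `LookupLine` rather than when the buffer is created, changing the map changes what gets printed without rebuilding anything, which is the property the comment contrasts with a mapping fixed at AST construction time. One limitation of this model: with newline-only lines every location sits at column 1, so column information would need padding or a side table.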

https://github.com/llvm/llvm-project/pull/127829
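The SourceLocation exhaustion concern raised in the thread can be illustrated with a simplified offset allocator. This assumes, as a simplification, that every file entered into the source manager consumes a contiguous range of offsets equal to its buffer size plus one, drawn from a fixed budget (4 GiB here, matching the figure in the comment). `OffsetAllocator` is a hypothetical name, and clang's real SourceManager bookkeeping is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified model of source-location offset allocation: each file entered
// consumes a contiguous range of (buffer size + 1) offsets from a fixed budget.
// Hypothetical class; clang's SourceManager bookkeeping is more involved.
class OffsetAllocator {
public:
  explicit OffsetAllocator(std::uint64_t budget) : budget_(budget) {}

  // Returns the first offset assigned to the new file, or std::nullopt if the
  // remaining budget cannot hold it.
  std::optional<std::uint64_t> Allocate(std::uint64_t buffer_size) {
    // +1 so that a location one past the end of the buffer stays representable.
    const std::uint64_t needed = buffer_size + 1;
    if (needed > budget_ - next_)
      return std::nullopt;
    const std::uint64_t start = next_;
    next_ += needed;
    assert(next_ <= budget_);
    return start;
  }

  std::uint64_t used() const { return next_; }

private:
  std::uint64_t budget_;
  std::uint64_t next_ = 0;
};
```

In this model a 4 GiB budget fits 4095 entries of a 1 MiB buffer, and the 4096th fails. Since consumption scales with buffer size, a newline-only placeholder would cost one offset per line instead of one per byte.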