[PATCH] D152708: [RFC][Draft] Enable primitive support for Two-Level Line Tables in LLVM

Tue Jun 13 02:45:45 PDT 2023

StephenTozer added a comment.

In D152708#4414402 <https://reviews.llvm.org/D152708#4414402>, @dblaikie wrote:

> It is unfortunate to hear that TLLT are a significant size increase, though not entirely surprising - it's a bunch of extra info to encode. I'll be glad to have this example to experiment with.

For what it's worth, we haven't tested this with any larger programs, so this is only a rough estimate of size, but the .debug_line section increased in size on the order of 50% for our small test cases. On the other hand, I think that in larger programs the .debug_line section is likely to be significantly smaller than the .debug_info section anyway, so if you are interested in producing inline frames without using .debug_info it is probably an improvement in most respects. As an example with a single-source input, `flops.c` (taken from the LLVM test suite repo), the .debug_line section increased from 3338 to 4758 bytes, and the overall size of the DWARF output from 9467 to 10893 bytes.
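
To put the `flops.c` figures above in relative terms, here is a small Python sketch of the arithmetic. It is not part of the patch, and the helper name `pct_increase` is made up for illustration; the byte counts are the ones quoted above.

```python
def pct_increase(old_bytes: int, new_bytes: int) -> float:
    """Return the relative growth from old_bytes to new_bytes, in percent."""
    return (new_bytes - old_bytes) / old_bytes * 100.0


# Sizes quoted above for flops.c, in bytes.
debug_line_before, debug_line_after = 3338, 4758
dwarf_before, dwarf_after = 9467, 10893

line_growth = pct_increase(debug_line_before, debug_line_after)
dwarf_growth = pct_increase(dwarf_before, dwarf_after)

print(f".debug_line: +{debug_line_after - debug_line_before} bytes ({line_growth:.1f}%)")
print(f"all DWARF:   +{dwarf_after - dwarf_before} bytes ({dwarf_growth:.1f}%)")
```

For this one file that works out to roughly a 42.5% increase in .debug_line and roughly a 15.1% increase in the total DWARF output, which is consistent with the "on the order of 50%" estimate for the line table alone.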

> Are there any known (or vague/unknown) limitations on the implementation with respect to the actual output/on-disk representation?

Possibly, but not outside of the proposal itself - this patch was written over the span of 4 days, so doubtless there are inefficiencies in the code itself and there may be bugs in the output (especially if you try it on a larger codebase), but in theory the output follows the spec faithfully and shouldn't be wasting space. With that said, the main cause of the size increase isn't really the additional information, but the fact that the TLLT repeats itself a lot by design. Generally speaking, most instructions are not inlined callsites, so the additional information (the context and subprogram fields) has a much smaller impact on size than the fact that we are emitting the `(InstructionAddress, SourceLocation)` pair twice for every instruction.
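
As a toy illustration of that point, the following Python sketch counts table entries only. It does not model the actual DWARF encoding or byte sizes, and the function name, the tuple layout, and the assumption of exactly two extra fields per inlined-callsite row are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

# (address, source line, is_inlined_callsite) - a simplified stand-in for one instruction.
Instruction = Tuple[int, int, bool]


def count_entries(instructions: List[Instruction]) -> Dict[str, int]:
    """Count entries under the simplified model described in the comment above."""
    total = len(instructions)
    inlined = sum(1 for _, _, is_inlined in instructions if is_inlined)
    return {
        # A conventional line table records each (address, location) pair once.
        "single_table_pairs": total,
        # The two-level form records the pair once per table, i.e. twice.
        "two_level_pairs": 2 * total,
        # Assumed: one context and one subprogram field per inlined-callsite row.
        "context_subprogram_fields": 2 * inlined,
    }


# Ten instructions, only one of which is an inlined callsite.
example = [(0x1000 + 4 * i, 10 + i, i == 3) for i in range(10)]
print(count_entries(example))
```

In this model the duplicated pairs grow with every instruction, while the context and subprogram fields grow only with the number of inlined callsites, which is why the duplication dominates when inlining is rare.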

We didn't dig deeper into why the representation needs to be split into two separate line number programs, rather than either using a single line table with an additional column to convey the Instruction->SourceLocation attribution, or keeping two line tables but using a single line number program that can emit Logical and Actual rows simultaneously. Either of these methods would reduce the repetition.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152708/new/

https://reviews.llvm.org/D152708