[PATCH] D148042: [MSP430] Get the DWARF pointer size from MCAsmInfo instead of DataLayout.

Thu Apr 20 09:21:22 PDT 2023

kuilpd added a comment.

In D148042#4281618 <https://reviews.llvm.org/D148042#4281618>, @dblaikie wrote:

>> Next, LLDB indeed assumes that pointer size coincides with ELF class and I would not consider it as a bug :)
>
> I would - DWARF has a field for pointer size & it doesn't have to be/isn't tied to the ELF class.
>
>> And it seems there is no way to specify that inside ELF file that ELF32 file could contain 16-bit pointers.
>
> There is at the DWARF level.

DWARF has a pointer size field, but ELF doesn't. When parsing ELF header, LLDB just assumes the pointer size to be the same as ELF type, 32 or 64 bit, no other options.
https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/ObjectFile/ELF/ELFHeader.cpp#L120
16-bit pointers can be put in ELF32, but LLDB fails to read them while trying to extract them as 32-bit values. Wouldn't call it a bug per se, it's just how it's made, seeing as right now only ELF32 and ELF64 exist.
LLDB does use the DWARF's pointer size when extracting DWARF information, but if DWARF says it has 16-bit pointers when LLDB already assumed it's 32-bit because of ELF32, it breaks things. Can't remember which exact things, but a lot of debug info stops working. Sometimes LLDB uses ELF pointer width, sometimes DWARF, everything is interleaved there as it assumes the sizes in both are the same.
And if there is no DWARF information in the file, there seems to be no way of specifying in an ELF32 header that it has 16-bit pointers.

> But I guess you mean for the actual code in an ELF32 file you have to have 32 bit pointers for relocations to be applied, etc? So what does it mean for for the program to have 16 bit pointers but be using 32 bit relocations?

I'm not sure, but I haven't encountered any problems. Pointers are just treated as 32-bit wide in ELF/DWARF information only, but once all the information has been correctly loaded by LLDB, it uses the pointer size that is specified for the target architecture.
https://github.com/llvm/llvm-project/blob/main/lldb/source/Utility/ArchSpec.cpp#L158

>> So, it seems we need a way to distinguish between "platform address size" that it used for code generation, optimizations, etc. and "address size used for DWARF emission" (both for data and code!).
>
> I still don't know if we should/do need to distinguish those things, but maybe we do. I'm still not sure how the 16 bit pointers work (disregarding DWARF for a minute) with a 32 bit ELF file...

Right now as I understand the pointer size in DataLayout is used for code generation, and CodePointerSize in MCAsmInfo if used for DWARF purposes, namely in AsmPrinter, so there is already a distinction. But since for all other targets these values seem to be the same it might have been used interchangeably, so this patch suggest to actually use CodePointerSize in DWARF-related information.

> If this is a regression, it'd be good to have context for where these things were changed to use DataLayout (a link to the commit(s)) and an explanation of why changing them back won't harm whatever motivated them to be changed to use DataLayout in the first place.

>From what I've found, these were initially written using DataLayout, because CodePointerSize didn't exist yet. But when it was introduced, most DWARF-related code was rewritten using it, but I assume that not in all places.
For example, DwarfDebug.cpp used to use DataLayout as well:
https://github.com/llvm/llvm-project/blob/release/4.x/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1694
but it got changed to CodePointerSize here and in all other places where DataLayout was used:
https://github.com/llvm/llvm-project/blob/release/5.x/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1747
So I think the changes this patch suggests just fix what was overlooked. It just never broke any code because the values are the same for other targets.
This might be the commit that changed things, just not all of them:
https://github.com/llvm/llvm-project/commit/dc77b2e960b6b6d4622c89ee8d324848ac6a0609

> Then, yes, as mentioned in that thread, adding a single entry point for these things would be important (just fixing the 5 sites directly seems like a bad idea if these should all be kept in sync and there are easy ways to mistakenly use the wrong value as has been the case in the past, perhaps).

If the patch just fixes what was overlooked, then changing directly is the way to go. But maybe CodePointerSize should be renamed to DWARFCodePointerSize or something else to avoid confusion from now on.

> But I'd like to understand the problem better too/first - it doesn't seem right to me that DWARF would need a different size from the system/target itself. (& no, I don't especially know what the "codepointer" size is as distinct from the address size either)

The need is to simplify implementation of 16-bit targets for use in LLDB. 
This way:

1. There is no need to change how LLDB handles ELF headers
2. Allows compatibility with GCC binaries (for MSP430, specifically)
3. Doesn't change anything for other targets

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148042/new/

https://reviews.llvm.org/D148042