[llvm] [DebugInfo] Emit negative DW_AT_bit_offset in explicit signed form (PR #87994)

Mon Apr 8 07:08:41 PDT 2024

llvmbot wrote:




@llvm/pr-subscribers-debuginfo

Author: Victor Campos (vhscampos)

<details>
<summary>Changes</summary>

Before this patch, the value of DW_AT_bit_offset, used for bitfields before DWARF version 4, was always emitted as an unsigned integer using the form DW_FORM_data<n>. If the value was originally a signed integer, for instance in the case of negative offsets, it was up to debug information consumers to re-cast it to a signed integer.

This is problematic since the burden of deciding if the value should be read as signed or unsigned was put onto the debug info consumers: the DWARF specification doesn't define DW_AT_bit_offset's underlying type. If a debugger decided to interpret this attribute in the form data<n> as unsigned, then negative offsets would be completely broken.

The DWARF specification version 3 mentions in the Data Representation section, page 127:

> If one of the DW_FORM_data<n> forms is used to represent a signed or
unsigned integer, it can be hard for a consumer to discover the context necessary to determine which interpretation is intended. Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data<n>.

Therefore, the proposal is to use DW_FORM_sdata, which is explicitly signed. This is an indication to consumers that the offset must be parsed unambiguously as a signed integer.

Finally, gcc already uses DW_FORM_sdata for negative offsets, fixing the potential ambiguity altogether.

This patch mimics gcc's behaviour by emitting negative values of DW_AT_bit_offset using the DW_FORM_sdata form. This eliminates any potential misinterpretation.

One could argue that all values should use DW_FORM_sdata, but for the sake of parity with gcc, it is safe to restrict the change to negative values.

---
Full diff: https://github.com/llvm/llvm-project/pull/87994.diff


4 Files Affected:

- (modified) llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp (+9-2) 
- (modified) llvm/test/DebugInfo/ARM/bitfield.ll (+1-1) 
- (modified) llvm/test/DebugInfo/NVPTX/packed_bitfields.ll (+1-1) 
- (modified) llvm/test/DebugInfo/X86/packed_bitfields.ll (+1-1) 


``````````diff

diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
index c40beeeb925e0e..a21c7d80cbf25f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
@@ -30,6 +30,7 @@
 #include "llvm/Target/TargetLoweringObjectFile.h"
 #include <cassert>
 #include <cstdint>
+#include <limits>
 #include <string>
 #include <utility>
 
@@ -1644,7 +1645,8 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
         addUInt(MemberDie, dwarf::DW_AT_byte_size, std::nullopt, FieldSize / 8);
       addUInt(MemberDie, dwarf::DW_AT_bit_size, std::nullopt, Size);
 
-      uint64_t Offset = DT->getOffsetInBits();
+      assert(DT->getOffsetInBits() <= std::numeric_limits<int64_t>::max());
+      int64_t Offset = DT->getOffsetInBits();
       // We can't use DT->getAlignInBits() here: AlignInBits for member type
       // is non-zero if and only if alignment was forced (e.g. _Alignas()),
       // which can't be done with bitfields. Thus we use FieldSize here.
@@ -1664,7 +1666,12 @@ DIE &DwarfUnit::constructMemberDIE(DIE &Buffer, const DIDerivedType *DT) {
         if (Asm->getDataLayout().isLittleEndian())
           Offset = FieldSize - (Offset + Size);
 
-        addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt, Offset);
+        if (Offset < 0)
+          addSInt(MemberDie, dwarf::DW_AT_bit_offset, dwarf::DW_FORM_sdata,
+                  Offset);
+        else
+          addUInt(MemberDie, dwarf::DW_AT_bit_offset, std::nullopt,
+                  (uint64_t)Offset);
         OffsetInBytes = FieldOffset >> 3;
       } else {
         addUInt(MemberDie, dwarf::DW_AT_data_bit_offset, std::nullopt, Offset);
diff --git a/llvm/test/DebugInfo/ARM/bitfield.ll b/llvm/test/DebugInfo/ARM/bitfield.ll
index 5bd06b785b159c..672c61db6f4912 100644
--- a/llvm/test/DebugInfo/ARM/bitfield.ll
+++ b/llvm/test/DebugInfo/ARM/bitfield.ll
@@ -12,7 +12,7 @@
 ; CHECK:          DW_AT_name {{.*}} "reserved"
 ; CHECK:          DW_AT_byte_size  {{.*}} (0x04)
 ; CHECK:          DW_AT_bit_size   {{.*}} (0x1c)
-; CHECK:          DW_AT_bit_offset {{.*}} (0xfffffffffffffff8)
+; CHECK:          DW_AT_bit_offset {{.*}} (-8)
 ; CHECK:          DW_AT_data_member_location {{.*}} (DW_OP_plus_uconst 0x0)
 
 %struct.anon = type { i8, [5 x i8] }
diff --git a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
index e2097d7f49b48f..62ffa0a4001f18 100644
--- a/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/NVPTX/packed_bitfields.ll
@@ -16,7 +16,7 @@
 ; CHECK-NEXT: .b8 1    // DW_AT_byte_size
 ; CHECK-NEXT: .b8 6    // DW_AT_bit_size
 ; Negative offset must be encoded as an unsigned integer.
-; CHECK-NEXT: .b64 0xffffffffffffffff // DW_AT_bit_offset
+; CHECK-NEXT: .b8 127  // DW_AT_bit_offset
 ; CHECK-NEXT: .b8 2    // DW_AT_data_member_location
 
 %struct.anon = type { i16 }
diff --git a/llvm/test/DebugInfo/X86/packed_bitfields.ll b/llvm/test/DebugInfo/X86/packed_bitfields.ll
index 0e541f09d22703..614fa59c367848 100644
--- a/llvm/test/DebugInfo/X86/packed_bitfields.ll
+++ b/llvm/test/DebugInfo/X86/packed_bitfields.ll
@@ -15,7 +15,7 @@
 ; CHECK-NOT: DW_TAG_member
 ; CHECK:      DW_AT_byte_size  {{.*}} (0x01)
 ; CHECK-NEXT: DW_AT_bit_size   {{.*}} (0x06)
-; CHECK-NEXT: DW_AT_bit_offset {{.*}} (0xffffffffffffffff)
+; CHECK-NEXT: DW_AT_bit_offset {{.*}} (-1)
 ; CHECK-NEXT: DW_AT_data_member_location {{.*}} ({{.*}}0x0{{0*}})
 
 ; ModuleID = 'repro.c'

``````````

</details>


https://github.com/llvm/llvm-project/pull/87994