[Lldb-commits] [PATCH] D70840: [LLDB] [DWARF] Strip out the thumb bit from addresses on ARM

Mon Dec 2 09:02:05 PST 2019

clayborg added a comment.

So some background on how address masks are handled in LLDB:

Typically we have tried to take care of the extra Thumb bit for ARM using:

  lldb::addr_t Address::GetCallableLoadAddress(Target *target, bool is_indirect = false) const;
  lldb::addr_t Address::GetOpcodeLoadAddress(Target *target, AddressClass addr_class = AddressClass::eInvalid) const;

The first adds the extra bit to an address if needed; the second strips it if needed. Both require a target, though, and the target uses the "Architecture" class for ARM to do the actual masking. I am not sure whether we want to get an architecture class here and use it to strip the bit, instead of using an address mask.
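To make the two directions concrete, here is a minimal standalone sketch of what "callable" versus "opcode" addresses mean for Thumb. It is not the LLDB implementation: the real functions go through a Target and its Architecture plug-in, and the names ToCallableAddress and ToOpcodeAddress, as well as the local addr_t and AddressClass definitions, are hypothetical stand-ins.

```cpp
#include <cstdint>

// Hypothetical stand-ins for lldb::addr_t and lldb_private::AddressClass;
// only the enumerators mentioned in this thread are modelled.
using addr_t = std::uint64_t;
enum class AddressClass { eInvalid, eCode, eCodeAlternateISA, eData, eDebug };

// Callable form: for Thumb code (eCodeAlternateISA), set bit zero so that a
// branch to this address switches the processor into Thumb mode.
constexpr addr_t ToCallableAddress(addr_t addr, AddressClass addr_class) {
  return addr_class == AddressClass::eCodeAlternateISA ? (addr | 1ull) : addr;
}

// Opcode form: clear bit zero to get the address where the instruction
// bytes are actually located.
constexpr addr_t ToOpcodeAddress(addr_t addr) { return addr & ~1ull; }
```

Under these assumptions, a Thumb function whose code starts at 0x8000 is called through 0x8001, and stripping that value again yields 0x8000.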

Also, any lldb_private::Address can be asked for its address class:

  AddressClass Address::GetAddressClass() const;

This will return "eCode" for ARM code and "eCodeAlternateISA" for Thumb code. This is resolved by the ObjectFile::GetAddressClass:

  /// Get the address type given a file address in an object file.
  ///
  /// Many binary file formats know what kinds This is primarily for ARM
  /// binaries, though it can be applied to any executable file format that
  /// supports different opcode types within the same binary. ARM binaries
  /// support having both ARM and Thumb within the same executable container.
  /// We need to be able to get \return
  ///     The size of an address in bytes for the currently selected
  ///     architecture (and object for archives). Returns zero if no
  ///     architecture or object has been selected.
  virtual AddressClass GetAddressClass(lldb::addr_t file_addr);

So currently no code in LLDB tries to undo the address masks that might be in the object file or debug info; we take care of it after the fact. Early in the ARM days, extra symbols were added to the symbol table with names like "$a" for ARM, "$t" for Thumb and "$d" for data. Several of these symbols, kept in an ordered vector, formed a CPU map. This was before the branch instruction started watching bit zero. Later ARM architectures started using bit zero to indicate which mode to put the processor in, instead of using explicit "branch to arm" or "branch to thumb" instructions. When this change arrived, bit zero started showing up in symbol tables. So the current code allows for either the old style (CPU map in the symbol table with $a, $t and $d symbols) or the new style (bit zero set and no CPU map).
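As a rough illustration of the two styles, the sketch below classifies an address either from an ordered CPU map or, when no map entry covers it, from bit zero. This is an assumption-laden toy, not LLDB code: MapEntry, ClassifyAddress and the local type definitions are hypothetical, and the map is assumed to be sorted by start address (one entry per $a, $t or $d symbol).

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for the LLDB types discussed above.
using addr_t = std::uint64_t;
enum class AddressClass { eInvalid, eCode, eCodeAlternateISA, eData };

// One entry per mapping symbol: "$a" -> eCode, "$t" -> eCodeAlternateISA,
// "$d" -> eData. Entries must be sorted by ascending start address.
struct MapEntry {
  addr_t start;
  AddressClass addr_class;
};

AddressClass ClassifyAddress(const std::vector<MapEntry> &cpu_map,
                             addr_t addr) {
  // Old style: the last mapping symbol at or before addr decides the class.
  auto it = std::upper_bound(
      cpu_map.begin(), cpu_map.end(), addr,
      [](addr_t a, const MapEntry &e) { return a < e.start; });
  if (it != cpu_map.begin())
    return std::prev(it)->addr_class;
  // New style: no covering map entry, so bit zero marks Thumb code.
  return (addr & 1ull) ? AddressClass::eCodeAlternateISA
                       : AddressClass::eCode;
}
```

With a map of {0x1000: $a, 0x2000: $t, 0x3000: $d}, address 0x2004 resolves to Thumb through the map, while with an empty map 0x4001 resolves to Thumb purely from bit zero.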

================
Comment at: lldb/include/lldb/Symbol/LineTable.h:333
+
+  bool m_clear_address_zeroth_bit = false;
 };
----------------
Might be nice to let the line table parse itself first, and then clean up all the addresses in a post-processing step? Maybe:

```
void LineTable::Finalize(Architecture *arch);
```

Then we let the architecture plug-in handle any stripping using:

```
lldb::addr_t Architecture::GetOpcodeLoadAddress(lldb::addr_t addr, AddressClass addr_class) const;
```

The ArchitectureArm plugin does this:

```
addr_t ArchitectureArm::GetOpcodeLoadAddress(addr_t opcode_addr,
                                             AddressClass addr_class) const {
  switch (addr_class) {
  case AddressClass::eData:
  case AddressClass::eDebug:
    return LLDB_INVALID_ADDRESS;
  default: break;
  }
  return opcode_addr & ~(1ull);
}
```
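To show how the suggested post-processing step could fit together, here is a self-contained sketch. It only mirrors the shape of the proposal: the Architecture, ArchitectureArm, LineEntry and LineTable types below are simplified, hypothetical stand-ins (the hook takes just an address, with no AddressClass), not the actual LLDB classes.

```cpp
#include <cstdint>
#include <vector>

using addr_t = std::uint64_t;

// Hypothetical, simplified architecture hook: the default leaves addresses
// untouched, so non-ARM targets pay nothing beyond the virtual call.
struct Architecture {
  virtual ~Architecture() = default;
  virtual addr_t GetOpcodeLoadAddress(addr_t addr) const { return addr; }
};

// ARM/Thumb variant: strip bit zero, as the plug-in quoted above does.
struct ArchitectureArm : Architecture {
  addr_t GetOpcodeLoadAddress(addr_t addr) const override {
    return addr & ~1ull;
  }
};

struct LineEntry {
  addr_t file_addr;
  std::uint32_t line;
};

class LineTable {
public:
  // Parsing appends raw addresses exactly as found in the debug info.
  void AppendEntry(addr_t file_addr, std::uint32_t line) {
    m_entries.push_back({file_addr, line});
  }

  // Post-processing step: let the architecture fix up every address once,
  // after parsing, instead of masking inside the parser.
  void Finalize(const Architecture *arch) {
    if (!arch)
      return;
    for (LineEntry &entry : m_entries)
      entry.file_addr = arch->GetOpcodeLoadAddress(entry.file_addr);
  }

  const std::vector<LineEntry> &Entries() const { return m_entries; }

private:
  std::vector<LineEntry> m_entries;
};
```

The point of this shape is that the parser stays architecture-agnostic: all knowledge about bit zero lives in the plug-in, and a null architecture simply leaves the table as parsed.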

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.cpp:19-20
 // Constructor
-DWARFDebugAranges::DWARFDebugAranges() : m_aranges() {}
+DWARFDebugAranges::DWARFDebugAranges(dw_addr_t addr_mask)
+    : m_aranges(), m_addr_mask(addr_mask) {}

----------------
Use Architecture plug-in instead of hard coded mask.

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugAranges.h:53
   RangeToDIE m_aranges;
+  dw_addr_t m_addr_mask;
 };
----------------
See comment in LineTable.h above.

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp:41-42

-  m_cu_aranges_up = std::make_unique<DWARFDebugAranges>();
+  m_cu_aranges_up =
+      std::make_unique<DWARFDebugAranges>(m_dwarf.GetAddressMask());
   const DWARFDataExtractor &debug_aranges_data =
----------------
Use Architecture plug-in instead of hard coded mask.

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:258
         case DW_AT_low_pc:
-          lo_pc = form_value.Address();
+          lo_pc = form_value.Address() & m_addr_mask;

----------------
Use Architecture plug-in instead of hard coded mask.

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:273
               form_value.Form() == DW_FORM_GNU_addr_index) {
-            hi_pc = form_value.Address();
+            hi_pc = form_value.Address() & m_addr_mask;
           } else {
----------------
ditto...

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:44-45

+  void SetAddrMask(dw_addr_t addr_mask) { m_addr_mask = addr_mask; }
+
   void BuildAddressRangeTable(const DWARFUnit *cu,
----------------
remove

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:190
   dw_tag_t m_tag = llvm::dwarf::DW_TAG_null;
+  dw_addr_t m_addr_mask = ~0ull;
 };
----------------
Use Architecture plug-in instead of hard coded mask.

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp:62
   // parse
+  m_first_die.SetAddrMask(m_dwarf.GetAddressMask());
   const DWARFDataExtractor &data = GetData();
----------------
remove

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp:159
   DWARFDebugInfoEntry die;
+  die.SetAddrMask(m_dwarf.GetAddressMask());

----------------
remove

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp:726
   if (m_func_aranges_up == nullptr) {
-    m_func_aranges_up.reset(new DWARFDebugAranges());
+    m_func_aranges_up.reset(new DWARFDebugAranges(m_dwarf.GetAddressMask()));
     const DWARFDebugInfoEntry *die = DIEPtr();
----------------
Post-process with the Architecture plug-in, as mentioned in the LineTable.h comment?

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:4006-4014
+
+dw_addr_t SymbolFileDWARF::GetAddressMask() const {
+  if (ArchSpec arch = m_objfile_sp->GetArchitecture()) {
+    if (arch.GetTriple().getArch() == llvm::Triple::arm ||
+        arch.GetTriple().getArch() == llvm::Triple::thumb)
+      return ~1ull;
+  }
----------------
remove

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h:321

+  dw_addr_t GetAddressMask() const;
+
----------------
Use Architecture plug-in instead of hard coded mask.

================
Comment at: lldb/source/Symbol/LineTable.cpp:23-27
+  if (ArchSpec arch = m_comp_unit->GetModule()->GetArchitecture()) {
+    if (arch.GetTriple().getArch() == llvm::Triple::arm ||
+        arch.GetTriple().getArch() == llvm::Triple::thumb)
+      m_clear_address_zeroth_bit = true;
+  }
----------------
Use the Architecture plug-in instead of a hard-coded mask, or post-process using the Architecture plug-in.

================
Comment at: lldb/source/Symbol/LineTable.cpp:39-40
                                 bool is_terminal_entry) {
+  if (m_clear_address_zeroth_bit)
+    file_addr &= ~1ull;
   Entry entry(file_addr, line, column, file_idx, is_start_of_statement,
----------------
Use the Architecture plug-in instead of a hard-coded mask, or post-process using the Architecture plug-in.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70840/new/

https://reviews.llvm.org/D70840