[Lldb-commits] [PATCH] D108233: WIP: Add minidump save-core functionality to ELF object files

Mon Aug 30 03:46:41 PDT 2021

Aj0SK marked 8 inline comments as done and an inline comment as not done.
Aj0SK added inline comments.

================
Comment at: lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp:519
+lldb_private::Status
+MinidumpFileBuilder::AddMemoryList(const lldb::ProcessSP &process_sp) {
+  Status error;
----------------
clayborg wrote:
> Aj0SK wrote:
> > Aj0SK wrote:
> > > clayborg wrote:
> > > > This should take in the CoreStyle and only emit what was asked for.
> > > > 
> > > > If style is set to "eSaveCoreStackOnly", then grab only the memory regions for each thread stack pointer and save only those. 
> > > > 
> > > > We can't tell from LLDB APIs if a page is dirty, so if we get "eSaveCoreDirtyOnly", then we will need to save all memory regions that have write permissions.
> > > > 
> > > > If it is either "eSaveCoreFull"  or "eSaveCoreUnspecified" then save everything.
> > > I agree that this code should take care of a CoreStyle but it poses a significant problem to this implementation as it was mainly designed to save smaller Minidumps. This solution stores all of the information in memory before dumping them (this eases an implementation quite a bit because of the way how Minidump pointers (offsets) are working). This implementation, on my machine, exhausted memory before the minidump could be written in case of a full memory dump. At this time, we don't plan to reimplement this solution in the way it would allow to store all the data on disc at the time of minidump creation so there are two possible solutions that I see:
> > > 
> > > 1. If the type of the CoreStyle is full or dirty, this plugin will return false from the SaveCore function. Then maybe the default CoreStyle for this plugin should be changed to "eSaveCoreStackOnly".
> > > 
> > > 2. To the best of my knowledge, it should be theoretically possible to reimplement Dump() method to sort of take special care of a MemoryListStream, dumping also the memory information at the end of the minidump file as I think the memory dump is the most stressful for the memory and otherwise there is no problem with this.
> > To state my preference, I would rather stick to the first option and landed a minimum viable product. The only reason for this is that I have this version way better tested and also I am not entirely sure about how minidump deals with the whole memory dumps as it can be a little problematic for a big memory chunks... Then, I would rather add the necessary changes in the next patch...
> That is fine. I was just looking at the implementation below and I think I might know why this was taking up too much space. I will comment below, and maybe this will fix the issues of storing way too much info in the miniump
In fact, we are mostly interested in the client-server setup where every spared MB counts. In case it would be also benefiting for LLDB, I can also come up with an implementation that doesn't save full 8MB per stack but also limits this... Most of the solutions, that were used for implementing this functionality(breakpad, crashpad) limit stack information etc. [[ https://chromium.googlesource.com/breakpad/breakpad/+/refs/heads/main/src/google_breakpad/common/minidump_format.h#276 | breakpad ]].

================
Comment at: lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp:522
+  MemoryRegionInfo range_info;
+  error = process_sp->GetMemoryRegionInfo(0, range_info);
+
----------------
clayborg wrote:
> FYI: memory region for address zero often has no permissions, so there is no need to actually try and save this memory. For any memory region be sure the check if it has permissions using:
> 
> ```
> if (region_info.GetLLDBPermissions() != 0)
> ```
> before trying to read or save this off. I will comment below
Thanks, good to know.

================
Comment at: lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp:530
+  // Get interesting addresses
+  std::vector<size_t> interesting_addresses;
+  auto thread_list = process_sp->GetThreadList();
----------------
clayborg wrote:
> Should we have a std::set<lldb::addr_t> to store the base of any regions that have already been saved? Many PC values could be in the same regions (like when many threads are waiting in syscalls).
In the proposed approach, I believe so. In the old approach this was taken care of by the fact that we only went through the every segment only once.

================
Comment at: lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp:549
+    bool is_interesting = false;
+    for (size_t interesting_address : interesting_addresses)
+      if (interesting_address >= addr && interesting_address < addr + size) {
----------------
Aj0SK wrote:
> clayborg wrote:
> > Shouldn't this be the main loop that would replace the "while (range_info.GetRange().GetRangeBase() != LLDB_INVALID_ADDRESS)" loop? I would think that the main loop wiould be something like:
> > 
> > ```
> > std::set<addr_t> visited_region_base_addresses;
> > for (size_t interesting_address : interesting_addresses) {
> >   error = process_sp->GetMemoryRegionInfo(interesting_address, range_info);
> >   // Skip failed memory region requests or any regions with no permissions.
> >   if (error.Fail() || range_info.GetPermissions() == 0)
> >     continue;
> >   const addr_t addr = range_info.GetRange().GetRangeBase();
> >   // Skip any regions we have already saved out.
> >   if (visited_region_base_addresses.insert(addr).second == false)
> >     continue;
> >   const addr_t size = range_info.GetRange().GetByteSize();
> >   if (size == 0)
> >     continue;
> >   auto data_up = std::make_unique<DataBufferHeap>(size, 0);
> >   const size_t bytes_read = process_sp->ReadMemory(addr, data_up->GetBytes(), size, error);
> >   if (bytes_read == 0)
> >     continue;
> >   // We have a good memory region with valid bytes to store.
> >   LocationDescriptor memory_dump;
> >   memory_dump.DataSize = static_cast<llvm::support::ulittle32_t>(size);
> >   memory_dump.RVA =static_cast<llvm::support::ulittle32_t>(GetCurrentDataEndOffset());
> >   MemoryDescriptor memory_desc;
> >   memory_desc.StartOfMemoryRange = static_cast<llvm::support::ulittle64_t>(addr);
> >   memory_desc.Memory = memory_dump;
> >   mem_descriptors.push_back(memory_desc);
> >   m_data.AppendData(data_up->GetBytes(), bytes_read);
> > }
> > ```
> > 
> > 
> This kind of an approach was something I also thought about and in fact, I was mainly inspired by this [[ https://github.com/llvm-mirror/lldb/blob/d01083a850f577b85501a0902b52fd0930de72c7/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp#L6051 | approach ]]. Also, the previous comment about the need of saving the loaded modules addresses in a set doesn't apply to my solution, am I correct? Just to clear it a little bit, I like your solution though and it's better when I think about it. Thanks for the suggestion!
One thing, that could be a little problematic with the new approach is, if the possibility to save full/dirty pages dump would be requested... In this case, we still need to go through all of the sections...

================
Comment at: lldb/source/Plugins/ObjectFile/Minidump/MinidumpFileBuilder.cpp:568
+      auto data_up = std::make_unique<DataBufferHeap>(size, 0);
+      process_sp->ReadMemory(addr, data_up->GetBytes(), size, error);
+
----------------
clayborg wrote:
> You should listen to the return value from this ReadMemory call and only write this many bytes to the memory region. See my above proposed full loop source code abiove, as it reads the memory first before creating the LocationDescriptor and MemoryDescriptor and it skips to the next address in the loop if things go bad during memory read. It also stores the number if bytes that were actually read in case the process memory read comes back with fewer or no bytes.
Also done this on other places where I use ReadMemory...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108233/new/

https://reviews.llvm.org/D108233