[Lldb-commits] [PATCH] D106584: [lldb] Improve checking of file cache read eligibility for mach-O

Sat Jul 24 01:23:56 PDT 2021

jasonmolenda added a comment.

Sorry for the delay in replying, I needed to think & look at this a bit.

In D106584#2900239 <https://reviews.llvm.org/D106584#2900239>, @augusto2112 wrote:

> Hi Jason,
>
> I could place this check on the caller, but I think here is the right place for it, maybe I can convince you.
>
>> the perf impact of forcing all reads to be out of memory would be dramatically bad for a remote system debug session, this can't be done.
>
> On macOS, this would only affect images that have the LC_SEGMENT_SPLIT_INFO load command.

I tried expanding some shared caches and played around a bit to make sure I understood what we're dealing with.  You know all of this, but just to make sure everyone is following along, we have three possible states:

1. Binary on disk, before it gets included in a shared cache.  Will have LC_SEGMENT_SPLIT_INFO.
2. Binary embedded in the shared cache binary blog (either on disk or in memory)
3. Binary on disk, extracted from the shared cache.  LC_SEGMENT_SPLIT_INFO is absent, cannot be reconstructed from the shared cache binary blob.

With simulator process debugging, the debugger can find the pre-shared-cache distinct binaries on the local system, with the LC_SEGMENT_SPLIT_INFOs.

With iOS debugging, the debugger will have post-shared-cache distinct binaries on the local system, without LC_SEGMENT_SPLIT_INFOs.

I'm not sure what watch debugging looks like -- instead of copying the shared cache off of a watch and expanding it, Xcode downloads a disk image from Apple with the shared cache contents.  I don't know if that disk image has pre-shared-cache or post-shared-cache binaries; it's likely that pre-shared-cache binaries are included in the disk image.

In your bug, you've got a table which has relative offsets in it.  The offsets are to things in the binary image.  They may cross segment boundaries. The binary-on-disk will have TEXT, DATA, and LINKEDIT next to one another.  When it's incorporated into the shared cache, the segments will be separated - and these offsets will be updated in the process of creating the shared cache.  (are the offsets in the pre-shared-cache binary even valid?  Not important)  You have a bug where you're reading these offsets and getting them from the pre-shared-cache on-disk-binary, which are incorrect, and want to force reading them from the shared cache memory.

(post-shared-cache on-disk-binaries may have the same offsets as it was in the original shared cache -- but it's actually going to be a MORE subtle bug in that case because people plug in different iPhones and Xcode only extracts the shared cache for one of them if they're all running the same build.  But different model iphones will have different shared caches, so using the offsets from the post-shared-cache on-disk-binary-image from one device for another device will be wrong.)

The simulator version of this bug will be fixed, with less perf hit but still probably too much, by ignoring the pre-shared-cache on-disk-binary -- we'll have to read all of the symbol names out of memory via debugserver.  If this was over a USB cable, you'd be looking at a minute hit - but for a local debug session, it's probably closer to the order of 5-6 seconds.  I'm just guessing, I haven't measured it.  But it's a lot of packet traffic.

You'll still have the bug in the scenario outlined above, where I have two different iOS devices running the same iOS build.  Xcode will expand the shared cache for the first one I plug in, and lldb will use the same binary images (without LC_SEGMENT_SPLIT_INFOs) for both debug sessions.  And it's a questionmark about watch debugging - I suspect those will have LC_SEGMENT_SPLIT_INFOs and so app launch perf would take a gigantic hit to read symbol names all out of memory at debug time.

I think we need to concentrate on special casing the code that reads these offsets, to force it to use live memory every time.  I know in theory this problem could happen anywhere -- we could have these cross-segments offsets anywhere in the binary -- but in practice, we know it currently happens in one spot that the debugger looks at.

I admit I'd be a lot happier if we could always use the file cache for read-only Sections and know that the data in the file matches the data in the memory.  But if we only have one type of data where this isn't safe (so far?! :), I'm not willing to toss the optimization.  At the very least, we'd need to find ways to retain the same perf characteristics as we have right now, and I think that might be tricky.

>> I do like the debug-lldb check that the file cache read matches the live memory part. :)
>
> At least that we can keep then :) Do you think it's should be a crash, or maybe a log? Wasn't sure what was best.

I think an assert is good, there's no scenario where this should get hit -- if it does, then we really have a bug in using the filecache data and we need to halt immediately in CI or on an lldb developer's desktop.  I'm a little less sure about thing this on for DEBUG builds all the time because it does change lldb's perf behavior a lot, specifically adding lots of packet traffic, but I think we should give it a try and see if it turns out to be too much of a change.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106584/new/

https://reviews.llvm.org/D106584