[llvm-dev] [RFC] Lazy-loading of debug info metadata

Mon Apr 18 13:59:41 PDT 2016

>>>>>> 
>>>>>> On Fri, Apr 15, 2016 at 9:50 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>>>> Since I haven't heard any objections to the direction, I'm planning to
>>>>>> commit this (step 4) when I find some cycles; likely over the weekend.
>>>>>> 
>>>>>> To make this more concrete (in case someone is silently concerned) I've
>>>>>> posted my WIP patches below.  They apply cleanly to r266414.  There are
>>>>>> a few mechanical changes missing that are tracked in the commit
>>>>>> messages (such as LangRef, or new tests in some cases).  To get
>>>>>> `ninja check` to pass after the final commit you'll need to run the
>>>>>> attached upgrade script.  The clang changes are truly uninteresting so
>>>>>> I've left them out.
>>>>>> 
>>>>>>  - 0001/2: Prep commits I need to flush out (sorry for the noise).
>>>>>>  - 0003: Add an explicit type map for ODR type uniquing.
>>>>>>  - 0004: Prep.
>>>>>>  - 0005: Add ODR type uniquing of members.
>>>>>> 
> 
>> FYI, I hit some major problems with all of this.
>> 
>> The short version is that I was testing -flto memory usage with the wrong
>> command-line options (-gline-tables-only instead of -g), and now that
>> I've done it correctly I've discovered a major memory blowup.  I'm
>> still investigating.
>> 
>> I also have a question at the end about a semantic problem I uncovered.
>> 
>> --
>> 
>> The attached patches were supposed to finish this off -- they've been
>> mostly ready since Saturday -- but I had a big surprise when I finally
>> configured correctly (apparently the Apple CMake caches default to
>> line-tables only for RelWithDebInfo): 30GB peak for linking clang.
>> 
>> Then I went back and ran a fresh bootstrap of ToT, and got 27GB.  This is
>> pretty terrible (we were down to around 17GB back in October).  I'd been
>> doing spot checks of memory usage (glancing at top) and it was
>> surprisingly low; I just assumed someone had made improvements when I
>> wasn't looking.  Shame on me :(.
>> 
>> (I do have a bot set up that is supposed to catch major regressions like
>> this (it doesn't exactly track memory usage directly, but it does a half-
>> bootstrap of -flto=full -g).  Except at some point it switched to using
>> a CMake cache, and it has been using -gline-tables-only as well:
>>  http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/
>>  http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8599/consoleFull
> 

> Scratch this, that's the wrong bot.  This is the right one, and it still
> seems to be doing the right thing.  It just doesn't seem to be blowing up
> as badly:
>  http://lab.llvm.org:8080/green/job/llvm-stage2-cmake-RgLTO_build/buildTimeTrend

(Not blowing up at all.)

>> )
>> 
>> I hope the blowup is from the work I've done over the last couple of
>> weeks; I'll continue investigating today and revert whatever I have to.
>> Sorry for the noise and the breakage.  (It's also possible this is not a
>> regression from my recent work, but that would be harder to recover from;
>> let's hope it was from me.)
> 

What a waste of time.  It looks like the command-line tool I use to
collect memory usage doesn't give correct numbers anymore.  I spent
the morning trying to find a revision that *didn't* use a ton of
memory.  "Activity Monitor" and "top" give completely different
answers than my tool.  "Instruments.app" isn't scaling to linking
clang with `-flto -g`, but on smaller cases (like linking
verify-uselistorder) it agrees with "Activity Monitor" and "top".
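
One independent way to cross-check a memory tool is to ask the kernel
for the peak resident set size of a child process.  The following is a
minimal sketch, assuming Python 3 on a Unix-like system; it is not the
command-line tool mentioned above, and the helper names
(`maxrss_to_mib`, `peak_rss_mib`) are hypothetical.

```python
import resource
import subprocess
import sys


def maxrss_to_mib(ru_maxrss, platform=sys.platform):
    """Convert ru_maxrss to MiB: macOS reports bytes, Linux reports KiB."""
    bytes_used = ru_maxrss if platform == "darwin" else ru_maxrss * 1024
    return bytes_used / (1024 * 1024)


def peak_rss_mib(argv):
    """Run argv to completion and return the peak RSS of children in MiB.

    RUSAGE_CHILDREN reports the maximum over every child this process has
    waited for, so measure one command per Python process.
    """
    subprocess.run(argv, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return maxrss_to_mib(usage.ru_maxrss)
```

A call such as `peak_rss_mib(["llc", "verify-uselistorder.lto.opt.bc"])`
would then give a number to compare against "top"; the exact `llc`
arguments here are an assumption.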

I guess there hasn't been a major regression after all?

On the upside, I took the opportunity to collect new numbers for
running `llc` on `verify-uselistorder.lto.opt.bc`, the benchmark tracked
by the series of commits starting with (and referencing) r236629.  It looks
like we've made improvements since I last checked (not entirely sure,
since I used a different verify-uselistorder.lto.opt.bc as input).

  - r236629 mentions a starting point of 1160 MB.
  - ...
  - r240736 mentions 718 MB.
  - ...
  - r266588 gave me 665 MB (with the new input).
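
For a rough sense of scale, the relative reductions implied by those
three figures can be worked out as below.  This is only a sketch: the
helper name `reduction_percent` is hypothetical, and because the last
measurement used a different input the comparison is approximate.

```python
def reduction_percent(before_mb, after_mb):
    """Percentage by which memory use dropped from before_mb to after_mb."""
    return 100.0 * (before_mb - after_mb) / before_mb


# Figures quoted above, in MB.
print(round(reduction_percent(1160, 718), 1))  # 38.1 (r236629 -> r240736)
print(round(reduction_percent(718, 665), 1))   # 7.4  (r240736 -> r266588)
print(round(reduction_percent(1160, 665), 1))  # 42.7 (r236629 -> r266588)
```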

Actual LTO is higher (~750 MB) since it runs the module linker and
the optimizer (and ld64 uses a little bit of memory itself).

(LTO for clang is still in the neighbourhood of 16 GB.)