[PATCH] D96033: [clang-repl] Land initial infrastructure for incremental parsing

Raphael Isemann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 19 09:31:52 PDT 2021


teemperor added a comment.

In D96033#2768940 <https://reviews.llvm.org/D96033#2768940>, @phosek wrote:

> In D96033#2767884 <https://reviews.llvm.org/D96033#2767884>, @teemperor wrote:
>
>> In D96033#2766502 <https://reviews.llvm.org/D96033#2766502>, @phosek wrote:
>>
>>> In D96033#2766372 <https://reviews.llvm.org/D96033#2766372>, @v.g.vassilev wrote:
>>>
>>>> In D96033#2766332 <https://reviews.llvm.org/D96033#2766332>, @phosek wrote:
>>>>
>>>>> We've started seeing `LLVM ERROR: out of memory` on our 2-stage LTO Linux builders after this change landed. It looks like linking `clang-repl` always fails on our bot, but I've also seen OOM when linking `ClangCodeGenTests` and `FrontendTests`. Do you have any idea why this could be happening? We'd appreciate any help since our bots have been broken for several days now.
>>>>
>>>> Ouch. Are the bot logs public? If not, maybe a stack trace could be useful. `clang-repl` combines a lot of libraries across LLVM and Clang that are usually compiled separately. For instance, we put in memory most of the Clang frontend, the backend, and the JIT. Could it be that we are hitting some real limit?
>>>
>>> Yes, they are, see https://luci-milo.appspot.com/p/fuchsia/builders/prod/clang-linux-x64, but unfortunately there isn't much information in there. It's possible that we're hitting some limit, but these bots use 32-core instances with 128GB RAM, which I'd hope is enough even for the LTO build.
>>
>> I think the specs are fine for just building with LTO, but I am not sure that's enough for the worst case when running `ninja -j320` with an LTO build (which is what your job is doing). Can you try limiting your link jobs to something like 16 or 32 (e.g., `-DLLVM_PARALLEL_LINK_JOBS=32`)?
>>
>> (FWIW, your go build script also crashes with OOM errors, so you really are running low on memory on that node.)
>
> `-j320` is only used for the first stage compiler which uses distributed compilation and no LTO, the second stage which uses LTO and where we see this issue uses Ninja default, so `-j32` in this case.

I admit I don't really know the CI system on your node, but I assumed you were using `-j320` based on this output, which I got by clicking on "execution details" on the aborted stage of this build <https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8846868883354028928/overview>:

  Executing command [
    '/b/s/w/ir/x/w/cipd/ninja',
    '-j320',
    'stage2-check-clang',
    'stage2-check-lld',
    'stage2-check-llvm',
    'stage2-check-polly',
  ]
  escaped for shell: /b/s/w/ir/x/w/cipd/ninja -j320 stage2-check-clang stage2-check-lld stage2-check-llvm stage2-check-polly
  in dir /b/s/w/ir/x/w/staging/llvm_build
  at time 2021-05-18T20:53:37.215574
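For reference, capping link parallelism as suggested above is done at CMake configure time. A minimal sketch (the exact values and the use of the `BOOTSTRAP_` prefix to forward the setting to the second stage are assumptions about this particular setup, not taken from the bot's actual configuration):

```shell
# Cap concurrent link jobs while leaving compile jobs at ninja's default.
# LLVM_PARALLEL_LINK_JOBS applies to the stage being configured; the
# BOOTSTRAP_ prefix forwards a variable to the next bootstrap stage.
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_LTO=Thin \
  -DLLVM_PARALLEL_LINK_JOBS=32 \
  -DBOOTSTRAP_LLVM_PARALLEL_LINK_JOBS=32 \
  ../llvm
```

This matters for LTO builds in particular, since most of the memory cost is paid at link time rather than compile time, so a handful of concurrent LTO links can exhaust RAM even when `-j32` compilation is fine.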


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033
