>> Is this a run with debug info, i.e. are you passing -g to the per-TU
>> pipeline? I'm inclined to think this is mostly an additive effect:
>> adding matchers here and there doesn't really hurt small testcases,
>> but we pay the debt over time (in particular for LTO). Side note, I
>> noticed (and others did as well) that instcombine is way slower with
>> `-g` on (one of the reasons could be that we're walking much longer use
>> lists, due to the dbg uses). Do you have numbers for instcombine run on
>> IR with and without debug info?
> I do have the numbers for the same app with and without debug info. The results above are for the no-debug version.
> With debug info, total -O3 execution time is 34% slower. The debug IR is 162M vs. 39M for the no-debug version. Both profiles look relatively similar, except that the bitcode writer and the verifier take a larger share in the -g case.
> Looking at InstCombine, it's 23% slower. One notable thing is that CallInst takes a significantly larger share with -g: 5s without vs. 13s with, which accounts for about half of the InstCombine slowdown. I need to understand why.

Ah, it's all those calls to @llvm.dbg.* functions. I'll explore whether InstCombine can safely ignore them.

> ComputeKnownBits takes about the same time, and other visitors have elevated times, I would guess due to the need to propagate debug info.
>>> I wanted to see what transformations InstCombine actually performs. Using the -debug option turned out not to be very scalable. Never mind the large output size of the trace, running "opt -debug -instcombine" on anything other than a small IR is excruciatingly slow. Out of curiosity, I profiled it too: 96% of the time is spent decoding and printing instructions. Is this a known problem? If so, what are the alternatives for debugging large-scale problems? If not, it's possibly another item to add to the to-do list.
>> You might consider adding statistics (those should be much more
>> scalable, although coarser).
>> Thanks!
