[LLVMdev] [lld] Current performance issues

Shankar Easwaran shankare at codeaurora.org
Fri Dec 6 18:07:57 PST 2013


On 12/6/2013 7:30 PM, Michael Spencer wrote:
> So I started doing performance analysis again, and we've slowed down
> quite a bit. My current test is statically linking clang for Linux on
> Windows. I currently care mostly about Windows performance as that's
> where we run it.
While we are at it, I think memory usage also needs to be measured.
> Here's a rough breakdown of time usage (doesn't add up to %100 because
> of rounding):
In addition to this, I think it would be awesome if we had a way to run 
non-dependent passes concurrently (a concurrent pass manager).
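Something along these lines, just a rough sketch (the Pass interface and 
the grouping are made up here, not lld's real pass manager, and it assumes 
the passes within one group touch disjoint state so they can safely run at 
the same time):

  #include <future>
  #include <memory>
  #include <vector>

  struct MutableFile;                 // stand-in for the mutable file of atoms
  struct Pass {
    virtual ~Pass() = default;
    virtual void perform(MutableFile &file) = 0;
  };

  // Run every pass in a group concurrently, then join before the next group.
  void runPassGroup(const std::vector<std::unique_ptr<Pass>> &group,
                    MutableFile &file) {
    std::vector<std::future<void>> tasks;
    tasks.reserve(group.size());
    for (const auto &pass : group)
      tasks.push_back(std::async(std::launch::async,
                                 [&pass, &file] { pass->perform(file); }));
    for (auto &t : tasks)
      t.get();                        // propagate exceptions, wait for all
  }

Dependent passes would go into later groups, so ordering constraints are 
still honored between groups.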

> %8.8 - fs::get_magic from the driver.
Wow!
> %0.8 - Reading the files on the command line.
> %29 - Resolver. This ~%90 of this is reading objects out of archives.
> This can be parallelized, and I have an outdated patch which does
> this.
How do you plan to read them in parallel? An archive member is needed 
only when a symbol is undefined and is defined in that member. If you 
read the objects in advance, it might increase the memory footprint.
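One way I could imagine it working is to parse members speculatively on 
worker threads and have the resolver block on a member only when an 
undefined symbol actually maps to it. A rough sketch (ArchiveMember and 
ParsedObject are hypothetical names, not lld's reader API, and a real 
implementation would use a thread pool rather than one task per member):

  #include <future>
  #include <map>
  #include <memory>
  #include <string>
  #include <vector>

  struct ParsedObject {};                         // hypothetical parsed object
  struct ArchiveMember {
    std::string name;
    std::shared_ptr<ParsedObject> parse() const { // stand-in for the real,
      return std::make_shared<ParsedObject>();    // expensive reader work
    }
  };

  class SpeculativeArchive {
    std::map<std::string, std::shared_future<std::shared_ptr<ParsedObject>>>
        members_;

  public:
    explicit SpeculativeArchive(const std::vector<ArchiveMember> &members) {
      // Kick off parsing of every member up front.
      for (const ArchiveMember &m : members)
        members_.emplace(
            m.name,
            std::async(std::launch::async, [m] { return m.parse(); }).share());
    }

    // Called from the resolver loop; returns nullptr if no such member.
    std::shared_ptr<ParsedObject> get(const std::string &name) const {
      auto it = members_.find(name);
      return it == members_.end() ? nullptr : it->second.get();
    }
  };

But as noted above, this still parses every member whether or not it ends 
up being needed, so the memory footprint concern remains.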
> %51 - Passes. Mostly the layout pass. And in the layout pass it's
> mostly due to cache misses. I've already tried parallelizing the sort,
> it doesn't help much.
This is because the ordering pass is done serially: it walks the follow-on 
references and builds the preceded-by table and the in-group reference 
table. I have been thinking about this for a while; if we could build the 
tables (follow-on, preceded-by, in-group) in parallel and merge them 
serially, it might be faster.
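Roughly something like this (a toy sketch with made-up Atom/Reference 
types, not the actual LayoutPass code):

  #include <future>
  #include <unordered_map>
  #include <vector>

  struct Atom {};
  struct Reference { int kind; const Atom *from; const Atom *to; };
  using OrderTable = std::unordered_map<const Atom *, const Atom *>;

  enum RefKind { FollowOn, PrecededBy, InGroup };

  // One scan over the references per table, keeping only the requested kind.
  static OrderTable buildTable(const std::vector<Reference> &refs, int kind) {
    OrderTable table;
    for (const Reference &r : refs)
      if (r.kind == kind)
        table[r.from] = r.to;
    return table;
  }

  void buildLayoutTables(const std::vector<Reference> &refs) {
    auto followOn = std::async(std::launch::async,
                               [&refs] { return buildTable(refs, FollowOn); });
    auto preceded = std::async(std::launch::async,
                               [&refs] { return buildTable(refs, PrecededBy); });
    auto inGroup  = std::async(std::launch::async,
                               [&refs] { return buildTable(refs, InGroup); });

    // The merge stays serial: stitching the chains together and assigning
    // the final ordinals has to happen deterministically on one thread.
    OrderTable f = followOn.get(), p = preceded.get(), g = inGroup.get();
    (void)f; (void)p; (void)g;   // final ordinal assignment omitted here
  }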

Thoughts ?

> %9   - Writer. Most of this is in prep work.
With linker scripts I think this might be even higher, depending on how we 
implement all the complex semantics.

> The actual writing to
> disk part and applying relocations is very small.
> %1   - Unaccounted for.
>
> I'm going to do some work to solve the get_magic and resolver issue
> with threads. I think we really need to look into how the layout pass
> is handled. If the cache effects are bad enough, we may actually need
> to change to a non-virtual POD based interface for atoms. Meaning that
> readers fill in atom data at the start, instead of figuring it out at
> runtime.
I couldn't follow the non-virtual POD based interface part; can you give more info?

Thanks

Shankar Easwaran

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation