[PATCH] D51089: [LLD] [COFF] Add support for creating range extension thunks for ARM

Wed Aug 29 02:19:54 PDT 2018

ruiu added inline comments.

================
Comment at: COFF/Chunks.cpp:50
+// indicates.
+void SectionChunk::finalizeContents() {
+  if (!RelocTargets.empty())
----------------
mstorsjo wrote:
> ruiu wrote:
> > We generally don't micro-optimize code, but for relocations, we do, because the number of relocations can be an order of tens of millions for large programs. Spending one more microsecond for each relocation adds up to one second if your program has one million relocations. This function is a bit concerning in that regard. Could you measure the performance impact?
> > 
> > Also, it looks odd that you do this in `finalizeContents`, as it doesn't correspond to finalizing contents. Perhaps this function should be given a new name.
> I tried measuring it, and I think it's making things slower, but it's mostly within measurement noise.
> 
> My testcase was linking a 66 MB clang.exe (for x86_64). Before this change, the fastest link was in 480 ms, after the change the fastest link was 520 ms. But in both cases the runtimes occasionally go up to over 600 ms. So it's not huge but I think it's consistently measurable.
> 
> Do you have any other suggestions on how to achieve this without affecting performance? Only use the RelocTargets vector if `Machine==ARMNT`? Or make it a `DenseMap` for overridden targets, which is empty for all other cases than when we have added thunks?
> 
> Yes, it's a bit odd with `finalizeContents`, maybe a new method `initRelocTargets` which just gets called once before we start doing `assignAddresses`?
Of 520ms, it'd be interesting to know how much time this function is spending. gprof might help, but I'm not sure if it works on Windows. 480ms to 520ms isn't I think a marginal difference; it's 10% slowdown if the measurement is accurate.

One idea to make it faster (and could potentially be faster than it is now) is to parallelize it. I believe you can make it a separate function, say `readRelocTargets`, and call on all input sections in parallel. It should be safe to do because filling this new vector doesn't affect other threads.

https://reviews.llvm.org/D51089