[clang] [lld] [llvm] [Windows] Add support for emitting PGO/LTO magic strings in the Windows PE debug directory (PR #114260)

Wed Nov 6 07:01:55 PST 2024

mikolaj-pirog wrote:

> > I would like to benchmark `lld` after this change, since I have added a loop that goes through every section of every object file. Could someone point in the direction of a good benchmark for that? I was thinking I can benchmark on the linking of `clang` or any other big project as a reference
> 
> I think benchmarking `clang.exe` itself would be a good testbed. Build the toolchain once in Release (first stage) then in another build folder, build it a second time (second stage), but using `clang-cl.exe` and `lld-link.exe` from the first build folder. Use ninja not MSBuild. Once the second stage has completed, delete `clang.exe` from output folder and pass `ninja clang -v -d keeprsp` on the command-line. That will show the LLD command line which you can re-run and profile. You can also use `lld-link ... --time-trace` and add a more specific `llvm::TimeTraceScope` to enclose the code that parses all the sections.
> 
> If you have trouble building all this I can provide more detailed instructions, please let me know.

Thanks for the suggestions. I have roughly benchmarked this, and the change has essentially no impact on lld performance. The overall runtime is essentially the same (about 2s for a RelWithDebInfo build). I also benchmarked under VTune, and the functions in question take too little time for VTune to record any valuable data (it reports 0.0s for them in both cases). The function that calls `createMiscChunks` (where my change resides), `Writer::run()`, doesn't appear among the ~70 most expensive functions, and it also calls a bunch of other stuff on top of `createMiscChunks`. To measure more accurately I used the `--time-trace` functionality: a script runs the link repeatedly, saves the trace file each time, and greps the timer for the whole `Writer::run()` function. Here are the results (main first, my change second):
![image](https://github.com/user-attachments/assets/c82822e1-4d89-49cd-896f-59ebc7e54cfd)
![Screenshot 2024-11-06 154022](https://github.com/user-attachments/assets/ae498d52-1409-44d9-b810-a1630604f25b)

So, the whole `Writer::run()` function is slower by about 10ms (comparing best vs best, worst vs worst). I think this change introduces a minuscule slowdown compared to the whole linker machinery. I don't think it's necessary to measure under `hyperfine` or any other benchmarking tool, given the results presented (even if my results are off by 4x, the difference is still minuscule).
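
For anyone who wants to repeat the measurement, here is a rough sketch of the kind of script described above; it is not the exact script I used. It assumes the `--time-trace` output is a Chrome trace-event JSON file with a top-level `traceEvents` list whose complete events (`"ph": "X"`) carry a `dur` field in microseconds. The helper names `event_ms` and `summarize` are made up for illustration, and the event name to pass in is an assumption as well: check a trace file for the label lld actually uses for the scope you care about.

```python
import json
import statistics
from pathlib import Path


def event_ms(trace: dict, name: str) -> float:
    """Sum the durations, in milliseconds, of all complete events called `name`."""
    total_us = sum(
        ev.get("dur", 0)
        for ev in trace.get("traceEvents", [])
        # "X" marks a complete event (begin + duration) in the trace-event format.
        if ev.get("ph") == "X" and ev.get("name") == name
    )
    return total_us / 1000.0  # assumed: `dur` is in microseconds


def summarize(trace_files, name: str) -> dict:
    """Read one trace file per link run and report best/median/worst times."""
    samples = [
        event_ms(json.loads(Path(path).read_text()), name)
        for path in trace_files
    ]
    return {
        "best": min(samples),
        "median": statistics.median(samples),
        "worst": max(samples),
    }
```

Running the saved link command several times, each with its own trace file, and passing the resulting paths to `summarize` gives the best vs best and worst vs worst numbers to compare between main and the patched build.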

https://github.com/llvm/llvm-project/pull/114260