[PATCH] D59780: Support Intel Control-flow Enforcement Technology
Rui Ueyama via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 18 21:13:11 PDT 2019
ruiu added a comment.
Xiang,
I believe that your benchmark is inappropriate and does not accurately capture the performance characteristics of PLT entries of different sizes.
First of all, your microbenchmark is too artificial. You created lots of PLT entries for functions that do nothing but return immediately, and your program does nothing but hammer those PLT entries. That usage pattern is very different from real programs.
Second, and more importantly, I couldn't reproduce your result with 32-byte PLT entries. One possibility is that the slowdown you observed occurs because half of the 24-byte PLT entries are not aligned to 16-byte boundaries. The Intel x86-64 Optimization Manual recommends aligning jump targets to 16 bytes, so a slowdown for misaligned entries would not be a surprise.
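To make the alignment argument concrete, here is a minimal sketch in Python that counts how many entry start offsets fall on a 16-byte boundary for each entry size. It assumes the PLT section itself starts at a 16-byte-aligned address and begins with a 16-byte header followed by equal-size entries; the function name and the `header_size` and `count` parameters are made up for illustration and are not taken from lld.

```python
def aligned_fraction(entry_size, count=1024, header_size=16, alignment=16):
    """Return the fraction of PLT entries whose start offset is aligned."""
    # Assumption: the section base is itself aligned, so only the offset
    # within the section matters. header_size is a hypothetical parameter.
    offsets = (header_size + i * entry_size for i in range(count))
    aligned = sum(1 for off in offsets if off % alignment == 0)
    return aligned / count


if __name__ == "__main__":
    for size in (16, 24, 32):
        frac = aligned_fraction(size)
        print(f"PLT{size}: {frac:.0%} of entries are 16-byte aligned")
```

Under these assumptions, every 16-byte and 32-byte entry starts on a 16-byte boundary, while only every other 24-byte entry does.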
My machine is an AMD Ryzen Threadripper 2990WX, so your benchmark may behave differently on my machine than on yours, but I still observed that your benchmark slows down only with 24-byte PLT entries, as you can see below.
$ for i in `seq 1 10`; do echo -n "PLT16 "; ./main.16; echo -n "PLT24 "; ./main.24; echo -n "PLT32 "; ./main.32; echo; done
PLT16 Time cost 2667549
PLT24 Time cost 2891321
PLT32 Time cost 2656223
PLT16 Time cost 2682222
PLT24 Time cost 3245131
PLT32 Time cost 2682468
PLT16 Time cost 2659353
PLT24 Time cost 2895083
PLT32 Time cost 2657525
PLT16 Time cost 2680843
PLT24 Time cost 2867365
PLT32 Time cost 2676404
PLT16 Time cost 2661513
PLT24 Time cost 2931767
PLT32 Time cost 2653236
PLT16 Time cost 2680713
PLT24 Time cost 2872741
PLT32 Time cost 2678326
PLT16 Time cost 2661513
PLT24 Time cost 2892384
PLT32 Time cost 2656584
PLT16 Time cost 2683864
PLT24 Time cost 2887594
PLT32 Time cost 2658880
PLT16 Time cost 2682057
PLT24 Time cost 2911524
PLT32 Time cost 2679250
PLT16 Time cost 2654767
PLT24 Time cost 2915021
PLT32 Time cost 2659874
main.16 is linked with the regular lld. main.24 is linked with a modified lld that emits 24-byte PLT entries. main.32 is linked with a modified lld that emits 32-byte PLT entries.
As you can see, even with your synthetic microbenchmark, I couldn't see any difference between 16-byte and 32-byte PLT entries. Given that result, I have to say that I'm even more skeptical of the 2-PLT scheme than before.
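For anyone repeating the measurement, a small helper along these lines can summarize the per-variant timings instead of comparing the raw lines by eye. This is only a sketch: it assumes the benchmark prints lines of the form "PLTnn Time cost N" exactly as above, and the `summarize` function and the script name used below are hypothetical.

```python
import re
import statistics
import sys
from collections import defaultdict

# Matches lines such as "PLT24 Time cost 2891321" (format assumed from the log above).
LINE = re.compile(r"^(PLT\d+) Time cost (\d+)$")


def summarize(text):
    """Group timings by PLT variant and return {name: (mean, min, max)}."""
    samples = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # silently skip blank or unrelated lines
            samples[match.group(1)].append(int(match.group(2)))
    return {
        name: (statistics.mean(values), min(values), max(values))
        for name, values in samples.items()
    }


if __name__ == "__main__":
    for name, (mean, lo, hi) in sorted(summarize(sys.stdin.read()).items()):
        print(f"{name}: mean={mean:.0f} min={lo} max={hi}")
```

If the script were saved as summarize.py (a made-up name), the output of the shell loop above could be piped into `python3 summarize.py`.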
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59780/new/
https://reviews.llvm.org/D59780