[PATCH] D59780: Support Intel Control-flow Enforcement Technology

Rui Ueyama via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 18 21:13:11 PDT 2019


ruiu added a comment.

Xiang,

I believe your benchmark is inappropriate and does not correctly capture the performance characteristics of PLT entries of different sizes.

First of all, your microbenchmark is too artificial. You created lots of PLT entries for functions that do nothing but return immediately, and your program does nothing but hammer those PLT entries all the time. That usage pattern is very different from that of real programs.

Second of all, and more importantly, I couldn't reproduce your result with 32-byte PLT entries. One possible explanation is that the performance difference you noticed was caused by the fact that half of the 24-byte PLT entries are not aligned to 16-byte boundaries. The Intel x86-64 Optimization Manual recommends aligning jump targets to 16 bytes, so that would not be surprising.
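To make the alignment argument concrete, here is a minimal Python sketch (not lld code) that checks which PLT entry start offsets are multiples of 16 for each entry size. It assumes the entries are laid out back to back from a 16-byte-aligned base and ignores the PLT header; `aligned_fraction` is a hypothetical helper written only for this illustration.

```python
# Sketch: which PLT entry start offsets fall on a 16-byte boundary?
# Assumption: entries are laid out back to back from a 16-byte-aligned
# base (e.g. the start of .plt); the PLT header is ignored here.
ALIGN = 16


def aligned_fraction(entry_size: int, num_entries: int = 1000) -> float:
    """Fraction of entries whose start offset is a multiple of ALIGN."""
    aligned = sum(1 for i in range(num_entries) if (i * entry_size) % ALIGN == 0)
    return aligned / num_entries


for size in (16, 24, 32):
    offsets = [i * size for i in range(4)]
    print(f"{size}-byte entries: offsets {offsets}, "
          f"{aligned_fraction(size):.0%} start on a {ALIGN}-byte boundary")

# Output:
# 16-byte entries: offsets [0, 16, 32, 48], 100% start on a 16-byte boundary
# 24-byte entries: offsets [0, 24, 48, 72], 50% start on a 16-byte boundary
# 32-byte entries: offsets [0, 32, 64, 96], 100% start on a 16-byte boundary
```

With 24-byte entries, every other entry starts at an offset that is 8 mod 16 (24, 72, 120, ...), whereas 16- and 32-byte entries always keep their starts on a 16-byte boundary.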

My machine is an AMD Ryzen Threadripper 2990WX, so your benchmark may behave differently on my machine than on yours, but I still observed that it slows down only when I use 24-byte PLT entries, as you can see below.

  $ for i in `seq 1 10`; do echo -n "PLT16 "; ./main.16; echo -n "PLT24 "; ./main.24; echo -n "PLT32 "; ./main.32; echo; done
  PLT16 Time cost 2667549
  PLT24 Time cost 2891321
  PLT32 Time cost 2656223
  
  PLT16 Time cost 2682222
  PLT24 Time cost 3245131
  PLT32 Time cost 2682468
  
  PLT16 Time cost 2659353
  PLT24 Time cost 2895083
  PLT32 Time cost 2657525
  
  PLT16 Time cost 2680843
  PLT24 Time cost 2867365
  PLT32 Time cost 2676404
  
  PLT16 Time cost 2661513
  PLT24 Time cost 2931767
  PLT32 Time cost 2653236
  
  PLT16 Time cost 2680713
  PLT24 Time cost 2872741
  PLT32 Time cost 2678326
  
  PLT16 Time cost 2661513
  PLT24 Time cost 2892384
  PLT32 Time cost 2656584
  
  PLT16 Time cost 2683864
  PLT24 Time cost 2887594
  PLT32 Time cost 2658880
  
  PLT16 Time cost 2682057
  PLT24 Time cost 2911524
  PLT32 Time cost 2679250
  
  PLT16 Time cost 2654767
  PLT24 Time cost 2915021
  PLT32 Time cost 2659874

main.16 is linked with the regular lld (16-byte PLT entries). main.24 is linked with a modified lld that emits 24-byte PLT entries. main.32 is linked with a modified lld that emits 32-byte PLT entries.
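To summarize the ten runs above, here is a small Python sketch that averages the timings per binary and reports each mean relative to the 16-byte baseline. The numbers are copied verbatim from the output above; the units are whatever the benchmark prints as "Time cost".

```python
from statistics import mean

# "Time cost" values copied from the ten runs above, keyed by PLT entry size.
runs = {
    16: [2667549, 2682222, 2659353, 2680843, 2661513,
         2680713, 2661513, 2683864, 2682057, 2654767],
    24: [2891321, 3245131, 2895083, 2867365, 2931767,
         2872741, 2892384, 2887594, 2911524, 2915021],
    32: [2656223, 2682468, 2657525, 2676404, 2653236,
         2678326, 2656584, 2658880, 2679250, 2659874],
}

baseline = mean(runs[16])
for size, times in runs.items():
    m = mean(times)
    # Relative difference of this binary's mean against the PLT16 mean.
    print(f"PLT{size}: mean {m:.0f}, {100 * (m / baseline - 1):+.1f}% vs PLT16")

# Output:
# PLT16: mean 2671439, +0.0% vs PLT16
# PLT24: mean 2930993, +9.7% vs PLT16
# PLT32: mean 2665877, -0.2% vs PLT16
```

A mean is sensitive to outliers such as the 3245131 in the second PLT24 run; swapping in `statistics.median` would be a reasonable alternative, but the ranking of the three binaries is the same either way.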

As you can see, even with your synthesized microbenchmark, I couldn't see any difference between 16-byte and 32-byte PLT entries. Given that result, I have to say that I'm even more skeptical of the 2-PLT scheme than before.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D59780