[PATCH] D59780: Support Intel Control-flow Enforcement Technology

Thu Apr 18 21:13:11 PDT 2019

ruiu added a comment.

Xiang,

I believe that your benchmark is inappropriate and does not accurately capture the performance characteristics of PLT entries of different sizes.

First of all, your microbenchmark is too artificial. You created lots of PLT entries for functions that don't do anything but immediately return, and your program doesn't do anything but hammer PLT entries all the time. That usage pattern is very different from real programs.

Second of all, and more importantly, I couldn't reproduce your result with 32-byte PLT entries. One possible explanation is that the slowdown you observed comes from the fact that half of the 24-byte PLT entries are not aligned to 16-byte boundaries. The Intel x86-64 Optimization Manual recommends aligning jump targets to 16 bytes, so a slowdown for misaligned entries would not be surprising.
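
The alignment point can be checked with a few lines of arithmetic. The sketch below is not part of the benchmark; it assumes that the first PLT entry starts at a 16-byte-aligned address, and the helper name misaligned_fraction is made up for illustration.

```python
# Minimal sketch, assuming the first PLT entry starts on a 16-byte boundary.
# misaligned_fraction is a hypothetical helper, not part of lld or the benchmark.

def misaligned_fraction(entry_size, count=1024, alignment=16):
    """Fraction of `count` consecutive entries whose start offset
    is not a multiple of `alignment`."""
    offsets = [i * entry_size for i in range(count)]
    misaligned = sum(1 for off in offsets if off % alignment != 0)
    return misaligned / count

for size in (16, 24, 32):
    print(size, misaligned_fraction(size))
# prints:
# 16 0.0
# 24 0.5
# 32 0.0
```

With 24-byte entries, every odd-numbered entry starts 8 bytes past a 16-byte boundary, while 16-byte and 32-byte entries always start on one.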

My machine is an AMD Ryzen Threadripper 2990WX, so your benchmark may behave differently on my machine than on yours, but I could still observe that it slows down only with 24-byte PLT entries, as you can see below.

  $ for i in `seq 1 10`; do echo -n "PLT16 "; ./main.16; echo -n "PLT24 "; ./main.24; echo -n "PLT32 "; ./main.32; echo; done
  PLT16 Time cost 2667549
  PLT24 Time cost 2891321
  PLT32 Time cost 2656223

  PLT16 Time cost 2682222
  PLT24 Time cost 3245131
  PLT32 Time cost 2682468

  PLT16 Time cost 2659353
  PLT24 Time cost 2895083
  PLT32 Time cost 2657525

  PLT16 Time cost 2680843
  PLT24 Time cost 2867365
  PLT32 Time cost 2676404

  PLT16 Time cost 2661513
  PLT24 Time cost 2931767
  PLT32 Time cost 2653236

  PLT16 Time cost 2680713
  PLT24 Time cost 2872741
  PLT32 Time cost 2678326

  PLT16 Time cost 2661513
  PLT24 Time cost 2892384
  PLT32 Time cost 2656584

  PLT16 Time cost 2683864
  PLT24 Time cost 2887594
  PLT32 Time cost 2658880

  PLT16 Time cost 2682057
  PLT24 Time cost 2911524
  PLT32 Time cost 2679250

  PLT16 Time cost 2654767
  PLT24 Time cost 2915021
  PLT32 Time cost 2659874

main.16 is linked with the regular lld. main.24 and main.32 are linked with modified versions of lld that emit 24-byte and 32-byte PLT entries, respectively.

As you can see, even with your synthetic microbenchmark, I couldn't see a difference between 16-byte and 32-byte PLT entries. Given that result, I have to say that I'm even more skeptical of the 2-PLT scheme than before.
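
To make the comparison easier to read, the ten runs can be averaged per configuration. The sketch below copies the "Time cost" values from the output above. The summarize helper is made up for illustration, and treating the mean as the summary statistic is an assumption, not something the benchmark itself prescribes.

```python
from statistics import mean

# "Time cost" values copied from the ten runs listed above.
TIMES = {
    "PLT16": [2667549, 2682222, 2659353, 2680843, 2661513,
              2680713, 2661513, 2683864, 2682057, 2654767],
    "PLT24": [2891321, 3245131, 2895083, 2867365, 2931767,
              2872741, 2892384, 2887594, 2911524, 2915021],
    "PLT32": [2656223, 2682468, 2657525, 2676404, 2653236,
              2678326, 2656584, 2658880, 2679250, 2659874],
}

def summarize(times, baseline="PLT16"):
    """Return {label: (mean, percent difference from the baseline mean)}.
    Hypothetical helper for illustration only."""
    base = mean(times[baseline])
    return {
        label: (mean(values), 100.0 * (mean(values) - base) / base)
        for label, values in times.items()
    }

for label, (avg, pct) in summarize(TIMES).items():
    print(f"{label}: mean={avg:.1f} ({pct:+.2f}% vs PLT16)")
```

On these numbers, PLT24 comes out roughly 10% slower than PLT16 on average, while PLT16 and PLT32 differ by well under 1%.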

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D59780