[PATCH] D59780: Support Intel Control-flow Enforcement Technology
Xiang Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 18 20:20:45 PDT 2019
xiangzhangllvm added a comment.
Hi friends, I wrote a simple test to measure the performance impact.
To keep the test simple, I changed the PLT entry size from 16 bytes to 24 bytes in the LLD from LLVM 8.0.0:
[xiangzh1 at scels74 /export/iusers/xiangzh1/LLVM/LLVMORG]$diff llvm800/tools/lld/ELF/Arch/X86_64.cpp llvm800-PLT24/tools/lld/ELF/Arch/X86_64.cpp
67c67
< PltEntrySize = 16;
---
> PltEntrySize = 24; //16-->24
157a158,159
> 0x66, 0x90, //nop
> 0x66, 0x0f, 0x1f, 0x44, 0, 0, // nop
[xiangzh1 at scels74 /export/iusers/xiangzh1/LLVM/LLVMORG]$
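To give a rough sense of what the larger entry size means for code footprint, the arithmetic can be sketched in C as below. This is only an illustration under assumptions: the 64-byte cache line is assumed (it is not stated in this mail), the PLT header is ignored, the first entry is assumed to start on a cache-line boundary, and the helper names plt_bytes and plt_cache_lines are hypothetical, not part of LLD.

```c
#include <stddef.h>

/* Assumed cache-line size in bytes; 64 is typical for x86-64 but is an
   assumption of this sketch, not a figure taken from the mail. */
#define CACHE_LINE_BYTES 64

/* Total bytes occupied by `entries` PLT entries of `entry_size` bytes each.
   The PLT header is ignored in this sketch. */
size_t plt_bytes(size_t entries, size_t entry_size) {
    return entries * entry_size;
}

/* Number of cache lines needed to hold those bytes, rounded up, assuming
   the first entry starts on a cache-line boundary. */
size_t plt_cache_lines(size_t entries, size_t entry_size) {
    return (plt_bytes(entries, entry_size) + CACHE_LINE_BYTES - 1)
           / CACHE_LINE_BYTES;
}
```

For the 729 functions used in this test (fplt0 through fplt728), plt_bytes gives 11664 bytes with 16-byte entries and 17496 bytes with 24-byte entries, which is 183 versus 274 cache lines under these assumptions.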
Then I wrote a simple test case like the following:
Main.c
#include<stdio.h>
#include<time.h>
#define FUNCTION_PLT( ID ) fplt##ID();
int main(){
  int i;
  clock_t t1, t2;
  t1 = clock();
  for(i = 0; i < 1000000; i++){
    FUNCTION_PLT( 0 )
    FUNCTION_PLT( 1 )
    FUNCTION_PLT( 2 )
    ...
    FUNCTION_PLT( n )
  }
  t2 = clock();
  printf("Time cost %ld\n", t2-t1 );
  return 0;
}
Fplt.c
#define FUNCTION_PLT( ID ) int fplt##ID(){return ID;}
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
.....
FUNCTION_PLT( n )
After that, I built them with the 16-byte-PLT LLD and the 24-byte-PLT LLD:
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/clang -c Main.c Fplt.c -w;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/ld.lld -shared Fplt.o -o Fplt.so;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800-plt24/bin/clang -fuse-ld=lld Main.o Fplt.so -o a24.out;
/export/iusers/xiangzh1/LLVM/LLVMORG/build800/bin/clang -fuse-ld=lld Main.o Fplt.so -o a16.out;
Then I copied them to my personal PC, which only I use.
The following text is copied verbatim from that PC (I did not change any data):
xiangzh1 at xiangzh1:~/test/PLT/test$ export LD_LIBRARY_PATH=./
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2924013
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2721932
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2855578
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2330382
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2958377
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2361670
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2509809
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2451191
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2779484
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2551673
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2752218
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2752107
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2584999
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2630829
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 3148183
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2342366
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2678235
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2518696
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2767221
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2712448
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2667558
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2818396
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2740905
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2420335
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a24.out
Time cost 2586176
xiangzh1 at xiangzh1:~/test/PLT/test$ ./a16.out
Time cost 2556211
xiangzh1 at xiangzh1:~/test/PLT/test$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Stepping: 9
CPU MHz: 845.906
CPU max MHz: 4200.0000
CPU min MHz: 800.0000
BogoMIPS: 7200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
xiangzh1 at xiangzh1:~/test/PLT/test$ ls
a16.out a24.d a24.dump a24.out cmd.txt Fplt-bk.c Fplt-bk.o Fplt.c Fplt.o Fplt.so Main-bk.c Main-bk.o Main.c Main.o perf.data perf.data.a24 perf.data.old pp tt
xiangzh1 at xiangzh1:~/test/PLT/test$ head -n 15 Main.c
#include<stdio.h>
#include<time.h>
#define FUNCTION_PLT( ID ) fplt##ID();
int main(){
int i;
clock_t t1, t2;
t1 = clock();
for(i = 0; i < 1000000; i++){
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
FUNCTION_PLT( 3 )
xiangzh1 at xiangzh1:~/test/PLT/test$ tail -n 10 Main.c
FUNCTION_PLT( 724 )
FUNCTION_PLT( 725 )
FUNCTION_PLT( 726 )
FUNCTION_PLT( 727 )
FUNCTION_PLT( 728 )
}
t2 = clock();
printf("Time cost %ld\n", t2-t1 );
return 0;
}
xiangzh1 at xiangzh1:~/test/PLT/test$ head -n 10 Fplt.c
#define FUNCTION_PLT( ID ) int fplt##ID(){return ID;}
FUNCTION_PLT( 0 )
FUNCTION_PLT( 1 )
FUNCTION_PLT( 2 )
FUNCTION_PLT( 3 )
FUNCTION_PLT( 4 )
FUNCTION_PLT( 5 )
FUNCTION_PLT( 6 )
FUNCTION_PLT( 7 )
FUNCTION_PLT( 8 )
xiangzh1 at xiangzh1:~/test/PLT/test$ tail -n 10 Fplt.c
FUNCTION_PLT( 719 )
FUNCTION_PLT( 720 )
FUNCTION_PLT( 721 )
FUNCTION_PLT( 722 )
FUNCTION_PLT( 723 )
FUNCTION_PLT( 724 )
FUNCTION_PLT( 725 )
FUNCTION_PLT( 726 )
FUNCTION_PLT( 727 )
FUNCTION_PLT( 728 )
xiangzh1 at xiangzh1:~/test/PLT/test$
In most runs (11 of the 13 pairs above), the binary with 24-byte PLT entries took longer than the one with 16-byte PLT entries.
This change is based only on the LLD from LLVM 8.0.0, but the principle is the same.
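The run-to-run noise makes the raw timings hard to compare by eye, so a small summary helps. The C sketch below holds the 13 pairs of "Time cost" values copied from the transcript above and computes their means and the number of pairs in which a24.out was slower. The helper names mean_ticks and count_slower are hypothetical, and this is only a simple summary, not a statistical significance test.

```c
#include <stddef.h>

/* "Time cost" values (clock() ticks) copied from the transcript, in run order. */
const long a24_ticks[] = {
    2924013, 2855578, 2958377, 2509809, 2779484, 2752218, 2584999,
    3148183, 2678235, 2767221, 2667558, 2740905, 2586176
};
const long a16_ticks[] = {
    2721932, 2330382, 2361670, 2451191, 2551673, 2752107, 2630829,
    2342366, 2518696, 2712448, 2818396, 2420335, 2556211
};
enum { RUNS = sizeof a24_ticks / sizeof a24_ticks[0] };

/* Arithmetic mean of n tick counts. */
double mean_ticks(const long *ticks, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)ticks[i];
    return sum / (double)n;
}

/* Number of indices i for which a[i] is strictly greater than b[i]. */
size_t count_slower(const long *a, const long *b, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > b[i])
            count++;
    return count;
}
```

On this data, count_slower(a24_ticks, a16_ticks, RUNS) reports 11 of 13 pairs, and mean_ticks gives a mean for a24.out that is roughly 8% higher than for a16.out. With only 13 noisy samples per binary, this should be read as an indication rather than a precise measurement.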
I also captured the cache misses in the main function: F8720325: cache-miss1.png <https://reviews.llvm.org/F8720325>
F8720324: time-cmp.png <https://reviews.llvm.org/F8720324>
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59780/new/
https://reviews.llvm.org/D59780