[llvm] [TailDuplicator] Add a limit on the size of predecessors (PR #78582)
Quentin Dian via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 25 06:28:18 PST 2024
DianQK wrote:
> We should not implement profile-guided optimizations based on hypotheticals. Did you _actually_ run benchmarks with this patch and saw regressions? (How large?) If not, we should do the straightforward thing until there is evidence that something more complex is justified.
I've recorded the compilation times in the issue. For runtime, I ran a simple benchmark.
`oom_manual.c` is the variant in which the default branch is replaced with `__builtin_unreachable()`.
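For readers unfamiliar with the pattern, here is a minimal sketch of what replacing a default branch with `__builtin_unreachable()` looks like. The function `dispatch` and its cases are hypothetical and are not taken from `oom_manual.c`; the sketch only assumes a GCC-compatible compiler such as clang.

```c
/* Hypothetical example: `dispatch` is illustrative only and is not the
   function from oom_manual.c. The caller must guarantee op is in 0..3. */
int dispatch(unsigned int op, int acc) {
    switch (op) {
    case 0: return acc + 1;
    case 1: return acc - 1;
    case 2: return acc * 2;
    case 3: return acc ^ 1;
    default:
        /* Promise to the optimizer that control never reaches here, so it
           may drop the range check on op. Reaching this point with any
           other value is undefined behavior. */
        __builtin_unreachable();
    }
}
```

With the default marked unreachable, the compiler no longer has to emit a fallback path for the switch, which can change how later passes (tail duplication among them) treat the resulting blocks.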
`clang -v`:
```
clang version 17.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/j7mhyvnbbm4pk7vriilrc5wvh2kims5p-clang-17.0.6/bin
```
`main.c`:
```c
int src(void) {
    return -1;
}
extern int f1(unsigned int *b);
int main(int argc, char **argv) {
    int r = argc;
    unsigned int b[] = { -1, -2, -3 };
    for (int i = 0; i < 1000000; i++) {
        r += f1(b);
    }
    return r;
}
```
`build.sh`:
```sh
clang -O1 oom_manual.c main.c -o oom_manual
clang -O1 oom_manual2.c main.c -o oom_manual2
ls -lh oom_manual oom_manual2
```
output:
```
-rwxr-xr-x 1 dianqk users 56K Jan 25 22:15 oom_manual
-rwxr-xr-x 1 dianqk users 192K Jan 25 22:15 oom_manual2
```
`hyperfine.sh`:
```sh
hyperfine -i -N --runs 200 --warmup 50 ./oom_manual ./oom_manual2
```
output:
```
Benchmark 1: ./oom_manual
Time (mean ± σ): 12.4 ms ± 0.0 ms [User: 12.2 ms, System: 0.2 ms]
Range (min … max): 12.4 ms … 12.6 ms 200 runs
Benchmark 2: ./oom_manual2
Time (mean ± σ): 14.9 ms ± 0.3 ms [User: 14.7 ms, System: 0.2 ms]
Range (min … max): 14.8 ms … 18.8 ms 200 runs
Summary
./oom_manual ran
1.20 ± 0.02 times faster than ./oom_manual2
```
`perf.sh`:
```bash
function run_perf() {
    echo "perf stat $1"
    perf stat -x \; \
        -e instructions \
        -e instructions:u \
        -e cycles \
        -e task-clock \
        -e branches \
        -e branch-misses \
        "$1"
}
run_perf ./oom_manual
run_perf ./oom_manual2
```
output:
```
perf stat ./oom_manual
251193019;;instructions:u;12839070;100.00;3.63;insn per cycle
251193019;;instructions:u;12839070;100.00;3.63;insn per cycle
69202217;;cycles:u;12839070;100.00;5.390;GHz
12.84;msec;task-clock:u;12839070;100.00;0.981;CPUs utilized
56047969;;branches:u;12839070;100.00;4.365;G/sec
2876;;branch-misses:u;12839070;100.00;0.01;of all branches
perf stat ./oom_manual2
424193026;;instructions:u;15665683;100.00;5.12;insn per cycle
424193026;;instructions:u;15665683;100.00;5.12;insn per cycle
82892122;;cycles:u;15665683;100.00;5.291;GHz
15.67;msec;task-clock:u;15665683;100.00;0.981;CPUs utilized
32047975;;branches:u;15665683;100.00;2.046;G/sec
2827;;branch-misses:u;15665683;100.00;0.01;of all branches
```
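As a rough reading of the counters above: `main.c` calls `f1` 1,000,000 times, so dividing the difference between the two runs by that trip count approximates the per-call cost, with process startup overhead mostly cancelling out. The helpers `per_iteration_delta` and `ratio` below are hypothetical names introduced only for this arithmetic.

```c
/* Hypothetical helpers for interpreting the perf stat output above. */

/* Difference in an event count between run b and run a, per loop iteration. */
double per_iteration_delta(long long a, long long b, long long iterations) {
    return (double)(b - a) / (double)iterations;
}

/* Relative size of an event count in run b compared with run a. */
double ratio(long long a, long long b) {
    return (double)b / (double)a;
}
```

Applied to the numbers above, `per_iteration_delta(251193019, 424193026, 1000000)` is about 173 extra instructions per call for `oom_manual2`, `per_iteration_delta(56047969, 32047975, 1000000)` is about 24 fewer branches per call, and `ratio(69202217, 82892122)` is about 1.20 in cycles, which matches the hyperfine summary.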
I am now modifying the code to see how the results change in different scenarios.
https://github.com/llvm/llvm-project/pull/78582