<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 12 May 2017, at 1:53 PM, Michael Clark &lt;<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class=""><br class=""><blockquote type="cite" class="">On 12 May 2017, at 1:48 PM, Tim Northover &lt;<a href="mailto:t.p.northover@gmail.com" class="">t.p.northover@gmail.com</a>&gt; wrote:<br class=""><br class="">On 11 May 2017 at 18:30, Michael Clark via llvm-dev<br class="">&lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:<br class=""><blockquote type="cite" class="">I note that on your bug that you have stated that the branch is faster than<br class="">the conditional move. Faster code is a side effect of the fix in this<br class="">particular case.<br class=""></blockquote><br class="">On the contrary: the faster code is pretty much the only reason this<br class="">can happen before the rest of the FENV support lands.<br class=""><br class="">It's been said before, but I'll reiterate: LLVM IR does not model the<br class="">FENV on its instructions. CodeGen and other passes are free to<br class="">de-conditionalize exceptions, remove them, or add spurious ones just<br class="">for the giggles. What LLVM does now is not incorrect.<br class=""></blockquote><br class="">OK. So we are in fact lucky that the correct case is actually faster, and it’s a bug in the predicate lowering i.e. 
speculative execution and conditional move being slower than a branch.<br class=""><br class="">I’m curious how the select lowering models the cost, when I figure out where to look in the codebase…<br class=""></div></div></blockquote></div><div class=""><br class=""></div><div class="">Just a few data points on the x86 branch predictor.</div><div class=""><br class=""></div><div class="">I have six small integer benchmarks that I am using to test a RISC-V to x86 binary translator, and I was using perf last night to read the performance counters. I had these stats in my command-line history as I was curious about branch predictor accuracy. Branch prediction accuracy is above 99% in all of my experiments. Note that the test programs are compiled by RISC-V GCC. RISC-V has no conditional moves and the branch misprediction penalty is only 3 cycles on Rocket, so it’s also an architecture that prefers branches over predication. We are translating RISC-V branches to x86 branches. We don’t use conditional moves in any of our translations. I believe a correctly predicted branch has just 1 cycle of latency on x86. 
Here is the translator: <a href="http://rv8.io/" class="">http://rv8.io/</a> (BTW - the RISC-V interpreter rv-sim seems to be a pathological test case for the Clang/LLVM optimiser: the Clang/LLVM-compiled code runs at just over half the speed of the GCC-generated code. Of course, the translator is not really affected by the speed of Clang, as we spend most of the time in the JIT code.)</div><div class=""><br class=""></div><br class=""><div class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512 <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512':<br class=""><br class=""> 2,386,668,826 cycles <br class=""> 8,226,368,806 instructions # 3.45 insn per cycle <br class=""> 556,426,385 branches <br class=""> 1,120,630 branch-misses # 0.20% of all branches <br class=""><br class=""> 0.766480608 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes':<br class=""><br class=""> 3,390,012,091 cycles <br class=""> 8,165,055,539 instructions # 2.41 insn per cycle <br class=""> 166,612,327 branches <br class=""> 393,687 branch-misses # 0.24% of all branches <br class=""><br class=""> 0.999783799 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes':<br class=""><br class=""> 585,513,229 cycles <br class=""> 1,570,274,312 instructions # 2.68 insn per cycle <br class=""> 199,550,674 branches <br class=""> 1,373,005 branch-misses 
# 0.69% of all branches <br class=""><br class=""> 0.180905897 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz':<br class=""><br class=""> 7,181,383,837 cycles <br class=""> 12,171,106,005 instructions # 1.69 insn per cycle <br class=""> 1,309,704,230 branches <br class=""> 10,246,710 branch-misses # 0.78% of all branches <br class=""><br class=""> 2.120649526 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone':<br class=""><br class=""> 1,705,866,284 cycles <br class=""> 5,902,622,960 instructions # 3.46 insn per cycle <br class=""> 852,430,738 branches <br class=""> 65,576 branch-misses # 0.01% of all branches <br class=""><br class=""> 0.530201822 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort':<br class=""><br class=""> 951,218,523 cycles <br class=""> 2,060,457,742 instructions # 2.17 insn per cycle <br class=""> 432,171,433 branches <br class=""> 3,844,290 branch-misses # 0.89% of all branches <br class=""><br class=""> 0.288089656 seconds time elapsed<br class=""><br class=""></div></body></html>