<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 12 May 2017, at 1:53 PM, Michael Clark <<a href="mailto:michaeljclark@mac.com" class="">michaeljclark@mac.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class=""><br class=""><blockquote type="cite" class="">On 12 May 2017, at 1:48 PM, Tim Northover <<a href="mailto:t.p.northover@gmail.com" class="">t.p.northover@gmail.com</a>> wrote:<br class=""><br class="">On 11 May 2017 at 18:30, Michael Clark via llvm-dev<br class=""><<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""><blockquote type="cite" class="">I note that on your bug that you have stated that the branch is faster than<br class="">the conditional move. Faster code is a side effect of the fix in this<br class="">particular case.<br class=""></blockquote><br class="">On the contrary: the faster code is pretty much the only reason this<br class="">can happen before the rest of the FENV support lands.<br class=""><br class="">It's been said before, but I'll reiterate: LLVM IR does not model the<br class="">FENV on its instructions. CodeGen and other passes are free to<br class="">de-conditionalize exceptions, remove them, or add spurious ones just<br class="">for the giggles. What LLVM does now is not incorrect.<br class=""></blockquote><br class="">OK. So we are in fact lucky that the correct case is actually faster, and it’s a bug in the predicate lowering i.e. 
speculative execution and conditional move being slower than a branch.<br class=""><br class="">I’m curious how the select lowering models the cost, when I figure out where to look in the codebase…<br class=""></div></div></blockquote></div><div class=""><br class=""></div><div class="">Just a few data points on the x86 branch predictor.</div><div class=""><br class=""></div><div class="">I have 6 small integer benchmarks that I am using to test a RISC-V to x86 binary translator, and I was using perf last night to read the performance counters. I had these stats in my command line history as I was curious about branch predictor accuracy. Branch prediction accuracy in all my experiments is > 99%. Note the test programs are compiled by RISC-V GCC. RISC-V has no conditional moves and branch mispredict latency is only 3 cycles on Rocket, so it’s also an architecture that prefers branches over predication. We are translating RISC-V branches to x86 branches. We don’t use conditional moves in any of our translations. I believe a correctly predicted branch has just 1 cycle of latency on x86. 
Here is the translator: <a href="http://rv8.io/" class="">http://rv8.io/</a> (BTW  - the RISC-V interpreter rv-sim seems to be a pathological test case for the Clang/LLVM optimiser, with the Clang/LLVM code running at just over half the speed of the GCC generated code, of course the translator is not really affected by the speed of Clang, as we spend most time in the JIT code).</div><div class=""><br class=""></div><br class=""><div class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512 <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512':<br class=""><br class="">     2,386,668,826      cycles                                                     <br class="">     8,226,368,806      instructions              #    3.45  insn per cycle        <br class="">       556,426,385      branches                                                   <br class="">         1,120,630      branch-misses             #    0.20% of all branches       <br class=""><br class="">       0.766480608 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-aes':<br class=""><br class="">     3,390,012,091      cycles                                                     <br class="">     8,165,055,539      instructions              #    2.41  insn per cycle        <br class="">       166,612,327      branches                                                   <br class="">           393,687      branch-misses             #    0.24% of all branches       <br class=""><br class="">       0.999783799 seconds time elapsed<br class=""><br class="">$ perf stat -e 
cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-primes':<br class=""><br class="">       585,513,229      cycles                                                     <br class="">     1,570,274,312      instructions              #    2.68  insn per cycle        <br class="">       199,550,674      branches                                                   <br class="">         1,373,005      branch-misses             #    0.69% of all branches       <br class=""><br class="">       0.180905897 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-miniz':<br class=""><br class="">     7,181,383,837      cycles                                                     <br class="">    12,171,106,005      instructions              #    1.69  insn per cycle        <br class="">     1,309,704,230      branches                                                   <br class="">        10,246,710      branch-misses             #    0.78% of all branches       <br class=""><br class="">       2.120649526 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-dhrystone':<br class=""><br class="">     1,705,866,284      cycles                                                     <br class="">     5,902,622,960      instructions              #    3.46  insn per cycle        <br class="">       852,430,738    
  branches                                                   <br class="">            65,576      branch-misses             #    0.01% of all branches       <br class=""><br class="">       0.530201822 seconds time elapsed<br class=""><br class="">$ perf stat -e cycles,instructions,branches,branch-misses ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort <br class=""><br class=""> Performance counter stats for './build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-qsort':<br class=""><br class="">       951,218,523      cycles                                                     <br class="">     2,060,457,742      instructions              #    2.17  insn per cycle        <br class="">       432,171,433      branches                                                   <br class="">         3,844,290      branch-misses             #    0.89% of all branches       <br class=""><br class="">       0.288089656 seconds time elapsed<br class=""><br class=""></div></body></html>