[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
Michael Clark via llvm-dev
llvm-dev at lists.llvm.org
Thu Apr 20 19:23:08 PDT 2017
> On 21 Apr 2017, at 12:30 PM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
>
> I think it’s generally true that whenever branches can reliably be predicted branching is faster than a cmov that involves speculative execution, and I would guess that your assessment regarding looping on input values is probably correct.
>
> I believe the code that actually creates most of the transformation you’re interested in here is in SelectionDAGLegalize::ExpandNode() in LegalizeDAG.cpp. The X86 backend sets a table entry indicating that FP_TO_UINT should be expanded for these value types, but the actual expansion is in target-independent code. This is what it looks like in the version I last fetched:
>
>   case ISD::FP_TO_UINT: {
>     SDValue True, False;
>     EVT VT = Node->getOperand(0).getValueType();
>     EVT NVT = Node->getValueType(0);
>     APFloat apf(DAG.EVTToAPFloatSemantics(VT),
>                 APInt::getNullValue(VT.getSizeInBits()));
>     APInt x = APInt::getSignBit(NVT.getSizeInBits());
>     (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
>     Tmp1 = DAG.getConstantFP(apf, dl, VT);
>     Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
>                         Node->getOperand(0),
>                         Tmp1, ISD::SETLT);
>     True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT, Node->getOperand(0));
>     // TODO: Should any fast-math-flags be set for the FSUB?
>     False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
>                         DAG.getNode(ISD::FSUB, dl, VT,
>                                     Node->getOperand(0), Tmp1));
>     False = DAG.getNode(ISD::XOR, dl, NVT, False,
>                         DAG.getConstant(x, dl, NVT));
>     Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
>     Results.push_back(Tmp1);
>     break;
>   }
>
> The tricky bit here is that this code is asking for a Select and then something else will decide whether that select should be implemented as a branch or a cmov.
Good. I had found ISD::FP_TO_UINT but had not found the target-independent code, as I was digging in llvm/lib/Target/X86. I had in fact just started looking at the target-independent code after realising the expansion was likely not target-specific. This issue could potentially affect any hard-float target with IEEE-754 accrued exceptions and conditional moves, as the unconditional FSUB will set INEXACT even when the input is small and the FSUB result is discarded by the select.
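To make the effect concrete, here is a minimal C++ sketch of the two lowerings. It is an illustration under stated assumptions, not the code LLVM emits: the names fp_to_uint_select, fp_to_uint_branch and Converted are hypothetical, volatile is used only to stop the compiler folding or moving the floating-point operations, and only the subtraction is hoisted in the "select" variant because an out-of-range float-to-integer cast is undefined behaviour in C++. It also assumes the compiler leaves the floating-point environment alone (strictly, the standard asks for #pragma STDC FENV_ACCESS ON, which not every compiler honours).

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// 2^63 is exactly representable as a float.
static const float kTwo63 = 9223372036854775808.0f;
static const std::uint64_t kSignBit = 1ULL << 63;

struct Converted {
    std::uint64_t value;  // converted integer
    bool inexact;         // true if FE_INEXACT was raised during conversion
};

// Select-style: the subtraction always runs, as the FSUB feeding the select does.
Converted fp_to_uint_select(float input) {
    assert(input >= 0.0f && input < 2.0f * kTwo63);  // must fit in 64 bits
    volatile float x = input;          // volatile blocks constant folding
    std::feclearexcept(FE_ALL_EXCEPT);
    const float v = x;
    volatile float diff = v - kTwo63;  // executed unconditionally
    volatile std::uint64_t r =
        (v < kTwo63) ? (std::uint64_t)(std::int64_t)v
                     : ((std::uint64_t)(std::int64_t)diff ^ kSignBit);
    const bool inexact = std::fetestexcept(FE_INEXACT) != 0;
    return Converted{r, inexact};
}

// Branch-style: the subtraction runs only for inputs >= 2^63.
Converted fp_to_uint_branch(float input) {
    assert(input >= 0.0f && input < 2.0f * kTwo63);  // must fit in 64 bits
    volatile float x = input;
    std::feclearexcept(FE_ALL_EXCEPT);
    const float v = x;
    volatile std::uint64_t r;
    if (v < kTwo63) {
        r = (std::uint64_t)(std::int64_t)v;
    } else {
        volatile float diff = v - kTwo63;  // only on the large-input path
        r = (std::uint64_t)(std::int64_t)diff ^ kSignBit;
    }
    const bool inexact = std::fetestexcept(FE_INEXACT) != 0;
    return Converted{r, inexact};
}
```

For an input of 1.0f both variants return 1, but the select-style variant should report INEXACT, because 1.0f - 2^63 has to be rounded to fit in a float, while the branch-style variant never performs that subtraction.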
I can see comments in LowerSELECT in lib/Target/X86/X86ISelLowering.cpp regarding the selection of branch or cmov, and wonder if the DAG can be matched there or whether the fix belongs in target-independent code.
It seems like a SELECT node with any sufficiently large number of child nodes should use a branch instead of a conditional move. I wonder about the cost model for predicate logic and cmov. Modern branch predictors are actually pretty good, so if LLVM X86 is using predication when the cost of a branch is lower, it could result in a loss of performance. I’m now curious about the more general possibility of controlling whether SELECT is lowered to branches or to predication using cmov. Can this be controlled? Anecdotally, the RISC-V CPU architects recommend branches over predicate logic, as in their case (Rocket) a branch mis-predict costs only 3 cycles.
BTW - semi off-topic. The RISC-V interpreter I am working on seems to be a pathological test case for the LLVM/Clang optimiser (-O3) compared with GCC (-O3), with LLVM/Clang producing code that takes nearly twice as long to run as GCC's. I don’t know exactly what I’ve done for this to happen; too many switch statements, I suspect. Branchy code versus predication perhaps? Branchiness might also explain GCC’s lead on SciMark Monte Carlo, assuming Monte Carlo is branchy. Now I am guessing, although after some googling I see that clang tends to generate x86_64 asm that uses predication where gcc uses branches. Note that this CPU simulator test requires the RISC-V GCC toolchain to be installed.
Here is a step by step for anyone interested in a pathological optimiser test case for Clang:
- https://github.com/riscv/riscv-gnu-toolchain/
- https://github.com/michaeljclark/riscv-meta/
$ git clone https://github.com/riscv/riscv-gnu-toolchain.git
$ git clone https://github.com/michaeljclark/riscv-meta.git
$ cd riscv-gnu-toolchain
$ export RISCV=/opt/riscv-gnu-toolchain
$ ./configure --prefix=$RISCV
$ make
$ cd ..
$ cd riscv-meta
$ git submodule update --init --recursive
$ export RISCV=/opt/riscv-gnu-toolchain
$ make -j4 CXX=g++ V=1
$ make test-build
$ time ./build/linux_x86_64/bin/rv-sim build/riscv64-unknown-elf/bin/test-sha512
ebdd6f20865ff41e3613b633b93c9b89c15d58fd9d64497f5b22554a7fe33757357cfa622f6fb4f40beadc02d18539ecd79e2da126b662839d296c41acbc2
real 0m28.280s
user 0m28.280s
sys 0m0.000s
$ make clean
$ make -j4 CXX=clang++-3.9 V=1
$ make test-build
$ time ./build/linux_x86_64/bin/rv-sim build/riscv64-unknown-elf/bin/test-sha512
ebdd6f20865ff41e3613b633b93c9b89c15d58fd9d64497f5b22554a7fe33757357cfa622f6fb4f40beadc02d18539ecd79e2da126b662839d296c41acbc2
real 0m52.533s
user 0m52.532s
sys 0m0.000s
$ g++ --version
g++ (Debian 6.3.0-6) 6.3.0 20170205
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang++-3.9 --version
clang version 3.9.0-6 (tags/RELEASE_390/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
There is also a RISC-V -> x86_64 JIT engine (currently covering only the RISC-V integer ISA; hard float coming soon…):
$ time ./build/linux_x86_64/bin/rv-jit build/riscv64-unknown-elf/bin/test-sha512
ebdd6f20865ff41e3613b633b93c9b89c15d58fd9d64497f5b22554a7fe33757357cfa622f6fb4f40beadc02d18539ecd79e2da126b662839d296c41acbc2
real 0m0.838s
user 0m0.840s
sys 0m0.000s
When the same test is compiled natively, Clang and GCC produce code that performs the same:
$ clang -O3 src/test/test-sha512.c -o test-sha512
$ time ./test-sha512
ebdd6f20865ff41e3613b633b93c9b89c15d58fd9d64497f5b22554a7fe33757357cfa622f6fb4f40beadc02d18539ecd79e2da126b662839d296c41acbc2
real 0m0.285s
user 0m0.280s
sys 0m0.004s
$ gcc -O3 src/test/test-sha512.c -o test-sha512
$ time ./test-sha512
ebdd6f20865ff41e3613b633b93c9b89c15d58fd9d64497f5b22554a7fe33757357cfa622f6fb4f40beadc02d18539ecd79e2da126b662839d296c41acbc2
real 0m0.285s
user 0m0.284s
sys 0m0.000s
Michael.