[PATCH] D103799: [CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 6 09:26:49 PDT 2021
sdesmalen added a comment.
I had another look at the costs for X86.
These are a factor three bigger because the cost of a scalar REM is now calculated to be expanded to div+sub+mul rather than the default `1` for 'unknown', which is then multiplied in `X86TTIImpl::getArithmeticInstrCost` by an arbitrary `20 * LT.first * LT.second.getVectorNumElements()`. The comment of that code (https://github.com/llvm/llvm-project/blob/6f3f9535fcafcde11d3b3ef72fdc0f357813e9da/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L1044) says:
// It is not a good idea to vectorize division. We have to scalarize it and
// in the process we will often end up having to spilling regular
// registers. The overhead of division is going to dominate most kernels
// anyways so try hard to prevent vectorization of division - it is
// generally a bad idea. Assume somewhat arbitrarily that we have to be able
// to hide "20 cycles" for each lane.
The options are to:
1. Tweak this number '20' to keep the cost similar to what they are now (that would require special-casing the code in X86TTI for UREM/SREM, since the condition is currently bundled with SDIV/UDIV).
2. Leave it as-is, since the values are already largely arbitrary an arbitrary number and the higher number satisfies the goal of preventing vectorization of division.
Does anyone have a suggestion and/or preference?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D103799/new/
https://reviews.llvm.org/D103799
More information about the llvm-commits
mailing list