[llvm-dev] Rotates, once again
Manuel Jacob via llvm-dev
llvm-dev at lists.llvm.org
Wed May 16 10:27:41 PDT 2018
On 2018-05-16 00:34, Sanjay Patel via llvm-dev wrote:
> Vectorization goes overboard because the throughput cost model used by
> the vectorizers doesn't match the 6 IR instructions that correspond to
> 1 x86 rotate instruction. Instead, we have:
>
> [...]
>
> The broken cost model also affects unrolling and inlining. Size costs
> are overestimated for a target that has a rotate instruction.
> This cost problem isn't limited to rotate patterns (it's come up in the
> context of min/max/abs/fma too). But it would be simpler if we had a
> rotate
> intrinsic, and the 6-to-1 margin is the biggest I've seen.
Given that this is a general problem that occurs with other instruction
sequences as well, wouldn't it make more sense to make the cost model
handle more than one instruction, as suggested in PR31274 [1]?
[1] https://bugs.llvm.org/show_bug.cgi?id=31274
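
For concreteness (and as a sketch only, since the exact sequence was
elided in the quote above), the usual branch-free rotate-left expansion
looks roughly like the following, six IR instructions for what x86 does
with a single rotate:

  define i32 @rotl32(i32 %x, i32 %n) {
    %lowbits = and i32 %n, 31      ; mask the shift amount to 0..31
    %hi = shl i32 %x, %lowbits     ; bits moved toward the high end
    %negn = sub i32 0, %n          ; negate the amount for the wrap-around shift
    %negbits = and i32 %negn, 31   ; mask again to keep the shift defined
    %lo = lshr i32 %x, %negbits    ; bits wrapped around to the low end
    %rot = or i32 %lo, %hi         ; combine both halves
    ret i32 %rot
  }
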
In all these cases (rotate, min, max, abs, fma, add-with-overflow, and
probably many more) there is a tradeoff between elaborating them as
simpler IR instructions and modelling them as their own instruction or
intrinsic. A big disadvantage of introducing new instructions or
intrinsics is that every optimization has to be taught about them,
which grows the compiler code base and the maintenance burden. On the
other hand, too few instructions can make optimization difficult as
well (in theory, a single instruction like "subtract and branch if not
equal to zero" could emulate all the others, but that would not be very
helpful for optimization). Since you have put a lot of thought into how
to canonicalize IR, can you elaborate more on this tradeoff? Can we
find a set of criteria for deciding whether an instruction pattern
should get an intrinsic or not?
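
To make the tradeoff concrete with an operation that already has both
forms today: unsigned add-with-overflow can be written as plain IR or
as a call to the existing llvm.uadd.with.overflow intrinsic (the i32
variant is shown below as a sketch). The expanded form is transparent
to every existing optimization; the intrinsic is a single unit that
backends and cost models can match directly, but any pass that wants
to reason about it has to be taught what it means.

  ; Expanded form: wraparound detected by comparing the sum to an operand.
  define i1 @uadd_overflows(i32 %a, i32 %b) {
    %sum = add i32 %a, %b
    %ovf = icmp ult i32 %sum, %a   ; unsigned wrap iff sum < a
    ret i1 %ovf
  }

  ; Intrinsic form: one call carrying both the sum and the overflow bit.
  declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)
  define i1 @uadd_overflows_intrinsic(i32 %a, i32 %b) {
    %res = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
    %ovf = extractvalue { i32, i1 } %res, 1
    ret i1 %ovf
  }
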
-Manuel