[llvm-dev] Rotates, once again
Manuel Jacob via llvm-dev
llvm-dev at lists.llvm.org
Wed May 16 10:27:41 PDT 2018
On 2018-05-16 00:34, Sanjay Patel via llvm-dev wrote:
> Vectorization goes overboard because the throughput cost model used by
> the vectorizers doesn't match the 6 IR instructions that correspond to
> 1 x86 rotate instruction. Instead, we have:
>
> [...]
>
> The broken cost model also affects unrolling and inlining. Size costs
> are overestimated for a target that has a rotate instruction.
> This cost problem isn't limited to rotate patterns (it's come up in the
> context of min/max/abs/fma too). But it would be simpler if we had a
> rotate
> intrinsic, and the 6-to-1 margin is the biggest I've seen.
Given that this is a general problem that occurs with other instruction
sequences as well, wouldn't it make more sense to make the cost model
handle more than one instruction, as suggested in PR31274 [1]?
[1] https://bugs.llvm.org/show_bug.cgi?id=31274
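
For concreteness (and as a sketch only, since the exact sequence was
elided in the quote above), the usual branch-free rotate-left expansion
looks roughly like the following, six IR instructions for what x86 does
with a single rotate:

  define i32 @rotl32(i32 %x, i32 %n) {
    %lowbits = and i32 %n, 31      ; mask the shift amount to 0..31
    %hi = shl i32 %x, %lowbits     ; bits moved toward the high end
    %negn = sub i32 0, %n          ; negate the amount for the wrap-around shift
    %negbits = and i32 %negn, 31   ; mask again to keep the shift defined
    %lo = lshr i32 %x, %negbits    ; bits wrapped around to the low end
    %rot = or i32 %lo, %hi         ; combine both halves
    ret i32 %rot
  }
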
In all these cases (rotate, min, max, abs, fma, add-with-overflow, and
probably many more) there is a tradeoff between elaborating them as
simpler IR instructions and modelling them as their own instruction or
intrinsic. A big disadvantage of introducing new instructions or
intrinsics is that every optimization has to be taught about them,
which grows the compiler code base and the maintenance burden. On the
other hand, too few instructions can make optimization difficult as
well (in theory, a single instruction like "subtract and branch if not
equal to zero" could emulate all the others, but that would not be very
helpful for optimization). Since you have put a lot of thought into how
to canonicalize IR, can you elaborate more on this tradeoff? Can we
find a set of criteria for deciding whether an instruction pattern
should get an intrinsic or not?
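
To make the tradeoff concrete with an operation that already has both
forms today: unsigned add-with-overflow can be written as plain IR or
as a call to the existing llvm.uadd.with.overflow intrinsic (the i32
variant is shown below as a sketch). The expanded form is transparent
to every existing optimization; the intrinsic is a single unit that
backends and cost models can match directly, but any pass that wants
to reason about it has to be taught what it means.

  ; Expanded form: wraparound detected by comparing the sum to an operand.
  define i1 @uadd_overflows(i32 %a, i32 %b) {
    %sum = add i32 %a, %b
    %ovf = icmp ult i32 %sum, %a   ; unsigned wrap iff sum < a
    ret i1 %ovf
  }

  ; Intrinsic form: one call carrying both the sum and the overflow bit.
  declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)
  define i1 @uadd_overflows_intrinsic(i32 %a, i32 %b) {
    %res = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
    %ovf = extractvalue { i32, i1 } %res, 1
    ret i1 %ovf
  }
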
-Manuel