<div dir="ltr"><div><div>Hi Harold -<br><br></div>I think it would be better to implement this as a DAG combine rather than an IR combine. That way, you can create an "ISD::ROTR" node directly. You can also bail out on the transform if a target doesn't support ROTR by checking something like:<br>  bool HasROTR = TLI.isOperationLegalOrCustom(ISD::ROTR, VT);<br><br></div>That's from DAGCombiner::MatchRotate() -<br><div><br><a href="http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp;256564$3977">http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp;256564$3977</a><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Dec 27, 2015 at 4:59 PM, Harold Aptroot via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The pattern x % constant == 0 (where the constant is odd) can be optimized<br>

more than it currently is, using some modular arithmetic trick, to:<br>

<br>

   x * inv(constant) <=u MaxValue / constant<br>

<br>

This patch sort of does that (hopefully, I did test it and it seemed to work,<br>

and please give me feedback on it, it's only my first try at an llvm patch),<br>

however I ran into a problem. This trick can in theory be easily generalized<br>

to even `constant` (let's ignore powers of two though, they can be done better<br>

the obvious way), but to do so a bitwise rotation is needed.<br>

But when I build it from two shifts:<br>

<br>

       Value *Mul = Builder->CreateMul(A, Builder->getInt(Multiplier));<br>

       Value *RShift = Builder->CreateLShr(Mul, Builder->getInt32(Ctz));<br>

       Value *LShift = Builder->CreateShl(Mul, Builder->getInt32(W - Ctz));<br>

       Rotated = Builder->CreateOr(RShift, LShift);<br>

<br>

Somewhere along the line, the left shift gets merged with the multiplication,<br>

resulting in two multiplications (and some extra stuff), and no rotation.<br>

Having two multiplications mostly defeats this optimization, though it's still<br>

a bit less code than what would be generated currently.<br>

It would be really nice if it could reliably generate something like this on x86:<br>

<br>

   imul eax, edi, foo<br>

   ror eax, bar<br>

   cmp eax, baz<br>

   ; use flags<br>

<br>

How should this be handled?<br>

<br>

(the attached patch is limited to odd divisors to avoid this problem)<br>

<br>

Regards,<br>

Harold <br>_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br></blockquote></div><br></div>