[llvm-dev] Rotates, once again

David Jones via llvm-dev llvm-dev at lists.llvm.org
Mon May 14 06:31:25 PDT 2018


It might be worth generalizing this to an arbitrary bit-select.

My application needs to extract contiguous bits of multi-precision numbers.
e.g. given a 128-bit number stored in an array a[2] and we want to select
64 bits starting from position N:

    result = (a[1] << (64-N)) | (a[0] >> N);

which is basically the same as your rotate if a[1] == a[0].  Some machines
(e.g. x86 and PA-RISC) have this instruction in hardware. On x86 this is a
double-edged sword, as SHLD/SHRD may be microcoded and it may be faster
just to expand to 2 shifts and an OR. Even so, it might help if I could
encode this more directly.


On Mon, May 14, 2018 at 4:53 AM, Fabian Giesen via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi everyone!
>
> I recently ran into some interesting issues with generation of rotate
> instructions - the details are in the bug tracker (
> https://bugs.llvm.org/show_bug.cgi?id=37387 and related bugs) for those
> interested - and it brought up the issue of rotates in the IR again. Now
> this is a proposal that has been made (and been rejected) several times,
> but I've been told that this time round we ran into issues not previously
> considered, and that I should bring the topic up on llvm-dev.
>
> I'll try to summarize the status quo first and then bring up the tricky
> cases that might tip the balance in favor of rotate intrinsics.
>
> Rotates
> =======
>
> Rotates, or circular shifts, are bit shifts where bits "dropped out" on
> one side of the operand are shifted back in on the other.
>
> They occur less frequently in real-world code than regular (non-circular)
> bit shifts, but are nevertheless directly supported in most popular ISAs.
> They are useful primitives for cryptography, non-cryptographic hashing,
> pseudo-random number generation, bit I/O, and probably other cases as well.
>
> They do not have a direct representation in C/C++, but several compilers
> support them via intrinsics (including MSVC, which Clang-CL tries to be
> compatible to), and some languages such as Rust include them explicitly.
>
> The status quo
> ==============
>
> Rotates by compile-time constant amounts are fairly easy to express in
> C-like languages by ORing together two opposite-direction shifts. For
> example, rotating a uint32_t  x left by 3 positions in C boils down to
>
>   (x << 3) | (x >> 29)
>
> which is simple enough. The backends with rotate support detect this type
> of pattern at the DAG stage and map it to a rotate instruction.
>
> This, in a nutshell, is the argument against a direct rotate in the IR:
> the existing primitives are sufficient to model it, and doing so allows
> LLVM to leverage transforms and analyses such as SimplifyDemandedBits that
> know how to deal with the more common regular shifts to work on bit rotate
> expressions as well. Adding a new primitive would mean that a whole bunch
> of optimization passes would need to know how to deal with it (or else they
> might lose on optimization opportunities).
>
> Variable-distance rotates
> =========================
>
> Things become murkier when we consider not just rotates by a compile-time
> constant, but also rotates where the amount is determined at run time.
>
> For one thing, it's not immediately obvious how to write a well-defined
> corresponding C expression: the most "natural" version (for a 32-bit left
> rotate) is
>
>   (x << k) | (x >> (32 - k))
>
> but this is undefined for k==0 due to the right-shift of a 32-bit value by
> 32. That said, both GCC and Clang will generally compile this particular
> expression into a rotate-left as of this writing, but having UB is
> unfortunate. A trickier variant that is always well-defined is
>
>   (x << (k & 31)) | (x >> (-k & 31))
>
> but this is a substantially more complicated sub-dag  than the OR of two
> shifts you would see for a compile-time constant rotate distance, and there
> are more opportunities for things to go wrong along the way. Bug 37387
> contains a few examples, the simplest of which was given by Sanjay:
>
> void rotateInLoop(unsigned *x, unsigned N, unsigned *a, int b) {
>   for (unsigned i = 0; i < N; ++i)
>     x[ (a[i] >> b) | (a[i] << (32 - b)) ] = i; // shift amt is loop
> invariant
> }
>
> "32 - b" is loop-invariant and gets hoisted, and what's left of the
> expression doesn't match a known "rotate" pattern (since the DAG can't see
> across BB boundaries at the moment), so instead we get two shifts and an OR
> inside the loop.
>
> The counterargument here was that GlobalISel should eventually fix this
> particular problem, but nevertheless, as of today, LICM breaks recognition
> of the rotate pattern in this (very simple) loop, and there is reason to
> believe that other transforms (existing or forthcoming) might well break
> the pattern as well; the issues referenced in the linked bug illustrate
> some of the existing failure modes.
>
> This situation is frustrating for users of variable-distance rotates;
> right now, there simply is no way to write code in a source language that
> will reliably turn into rotates; you either have to watch the generated
> code closely, or just throw up your hands and use platform-dependent inline
> assembly. (It is assumed that the code in question is in a hot spot.)
>
> The proposal
> ============
>
> The idea would be to add rotate intrinsics, primarily intended to be used
> for the variable-distance case, which is otherwise fragile.
>
> Frontends can emit them directly (for source language-level intrinsics, or
> when matched from "known good" rotate patterns) and the backends can match
> them and select appropriate instructions.
>
> For the constant-distance case, the preferred form would still be the
> OR-of-shifts variant, which plays nice with SimplifyDemandedBits and is
> known to existing optimization passes.
>
> SDB does not currently do anything with shifts by non-constant amounts
> anyway; and, for the use cases in question, a rotate that is not touched by
> other optimization passes is still preferable to a more complex expression
> that can be partially optimized, but then ends up in a form that the
> backend doesn't know how to turn back into a rotate, which is a net loss.
>
> Thoughts?
>
> -Fabian
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180514/98e876e3/attachment.html>


More information about the llvm-dev mailing list