[llvm-dev] Rotates, once again

Mon Jul 2 09:27:43 PDT 2018

I'm guessing nobody has started implementing any of the suggested rotate
functionality since there are still open questions, but let me know if I'm
wrong.

We're still getting patches that try to work around the current limitations
( https://reviews.llvm.org/D48705 ), so we should move forward since we've
approximated/justified the cost and benefits.

Let's settle on the intrinsic definition(s).

1. There was a suggestion to generalize rotate to a "valign" or "double
shift" (that's what x86 means with its poorly worded "double precision
shift"). How does that work with vector types? The options are a full
vector-size shift or a per-element shift. If it's the full vector, do we
still want/need a specialized rotate intrinsic for per-element? If it's
per-element, do we still want/need the other form for a full vector?

2. What is the behavior for a shift/rotate amount that is equal or greater
than the bit-width of the operand (or the bit width of a vector element
type?)? Can we modulo that operand by the bit width, or does that not map
well to the hardware semantics?

On Thu, May 17, 2018 at 5:23 PM, John Regehr <regehr at cs.utah.edu> wrote:

> Thanks Sanjay!
>
> At this point the cost/benefit tradeoff for rotate intrinsics seems pretty
> good.
>
> John
>
>
> On 05/17/2018 11:14 AM, Sanjay Patel wrote:
>
>> A rotate intrinsic should be relatively close in cost/complexity to the
>> existing bswap.
>>
>> A grep of intrinsic::bswap says we'd probably add code in:
>> InstCombine
>> InstructionSimplify
>> ConstantFolding
>> DemandedBits
>> ValueTracking
>> VectorUtils
>> SelectionDAGBuilder
>>
>> But I don't think it's fair to view those additions as pure added cost.
>> As an example, consider that we have to add hacks to EarlyCSE to recognize
>> multi-IR-instruction min/max/abs patterns. Intrinsics just work as-is
>> there. So if you search for 'matchSelectPattern', you get an idea (I see 32
>> hits in 10 files) of the cost of *not* having intrinsics for those
>> operations that we've decided are not worthy of intrinsics.
>>
>>
>> On Wed, May 16, 2018 at 2:20 PM, John Regehr via llvm-dev <
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>     On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
>>
>>         An informal metric might be: if the operation is supported as a
>>         primitive op or built-in in source languages and it is supported
>>         as a single target instruction, can we guarantee that 1-to-1
>>         translation through optimization?
>>
>>
>>     It seems perfectly reasonable for LLVM users to expect this to
>>     happen reliably.
>>
>>     I'd like to take a look at the other side of the equation: the cost
>>     of adding a new intrinsic in terms of teaching passes to see through
>>     it, so we don't lose optimizations that worked before the intrinsic
>>     was added.
>>
>>     For example, clearly ValueTracking needs a few lines added so that
>>     computeKnownBits and friends don't get stopped by a rotate. Anyone
>>     have a reasonably complete list of files that need similar changes?
>>
>>     John
>>
>>     _______________________________________________
>>     LLVM Developers mailing list
>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180702/f18fe963/attachment-0001.html>