[llvm-dev] Rotates, once again

Mon Jul 2 15:16:17 PDT 2018

I also agree that the per-element rotate for vectors is what we want for
this intrinsic.

So I have this so far:

declare i32 @llvm.catshift.i32(i32 %a, i32 %b, i32 %shift_amount)
declare <2 x i32> @llvm.catshift.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %shift_amount)

For scalars, @llvm.catshift concatenates %a and %b, shifts the concatenated
value right by %shift_amount modulo the bit-width, and truncates the result
to the original bit-width.
For vectors, that operation is applied to each element independently:
   result[i] = trunc(concat(a[i], b[i]) >> (shift_amount[i] % bitwidth))
If %a == %b, this is equivalent to a bitwise rotate right. Rotate left may
be implemented by passing (bit-width - shift amount) as the shift amount,
where the bit-width is that of the scalar type or vector element type.
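
To make the semantics above concrete, here is a minimal reference model in
Python (a sketch, not LLVM code). It assumes %a supplies the high half of the
concatenation and %b the low half, which is one reading of "concatenates %a
and %b". The names catshift, catshift_vec, rotr and rotl and the bits
parameter are made up for illustration only.

```python
def catshift(a, b, shift_amount, bits=32):
    """Model of trunc(concat(a, b) >> (shift_amount % bits))."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    s = shift_amount % bits      # shift amount is taken modulo the bit-width
    wide = (a << bits) | b       # ASSUMPTION: a is the high half, b the low half
    return (wide >> s) & mask    # truncate back to the original bit-width


def catshift_vec(a, b, shift_amount, bits=32):
    """Per-element form: each lane is shifted by its own amount."""
    return [catshift(x, y, s, bits) for x, y, s in zip(a, b, shift_amount)]


def rotr(x, s, bits=32):
    """Rotate right is the special case a == b."""
    return catshift(x, x, s, bits)


def rotl(x, s, bits=32):
    """Rotate left via (bit-width - shift amount); a result of bits wraps to 0."""
    return catshift(x, x, bits - (s % bits), bits)
```

With this model, a shift amount equal to the bit-width behaves like a shift
of zero and returns %b unchanged, so rotl(x, 0) still yields x even though
it passes the full bit-width as the shift amount.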

On Mon, Jul 2, 2018 at 2:37 PM, Fabian Giesen <fabiang at radgametools.com>
wrote:

> 1. I'm not sure what you mean by "full vector" here - using the same shift
> distance for all lanes (as opposed to per-lane distances), or doing a
> treat-the-vector-as-bag-of-bits shift that doesn't have any internal lane
> boundaries? If the latter, that doesn't really help you much with
> implementing a per-lane rotate.
>
> I think the most useful generalization of a vector funnel shift in this
> context is lane-wise
>
>    result[i] = trunc(concat(a[i], b[i]) >> c[i])
>
> (or the equivalent for a left shift); the special case a==b is a rotate.
>
> 2. For operand sizes that have native rotate instructions, at least x86,
> x86-64, ARM A32/T32 and AArch64 A64 agree that rotate distances are modulo
> the operand width. I believe PPC and MIPS do the same but am not sure (it's
> been a while), no clue about other architectures.
>
> It certainly seems the most natural way to define it, since rotates are
> cyclic to begin with.
>
> 8- and 16-bit rotates will need to be lowered into multi-instruction
> sequences on most targets (x86 and x86-64 can do them directly, but
> RISC-lineage archs usually don't have rotates at smaller than word size).
> Having explicit modulo semantics might end up forcing an explicit extra AND
> in those sequences, which is an extra cost, but it would certainly be nice
> to have the rotate definition be total.
>
> -Fabian
>
> On 07/02/2018 09:27 AM, Sanjay Patel wrote:
>
>> I'm guessing nobody has started implementing any of the suggested rotate
>> functionality since there are still open questions, but let me know if I'm
>> wrong.
>>
>> We're still getting patches that try to work around the current
>> limitations (https://reviews.llvm.org/D48705), so we should move forward
>> since we've approximated/justified the cost and benefits.
>>
>> Let's settle on the intrinsic definition(s).
>>
>> 1. There was a suggestion to generalize rotate to a "valign" or "double
>> shift" (that's what x86 means with its poorly worded "double precision
>> shift"). How does that work with vector types? The options are a full
>> vector-size shift or a per-element shift. If it's the full vector, do we
>> still want/need a specialized rotate intrinsic for per-element? If it's
>> per-element, do we still want/need the other form for a full vector?
>>
>> 2. What is the behavior for a shift/rotate amount that is equal to or
>> greater than the bit-width of the operand (or the bit-width of a vector
>> element type)? Can we take that operand modulo the bit-width, or does that
>> not map well to the hardware semantics?
>>
>> On Thu, May 17, 2018 at 5:23 PM, John Regehr <regehr at cs.utah.edu>
>> wrote:
>>
>>     Thanks Sanjay!
>>
>>     At this point the cost/benefit tradeoff for rotate intrinsics
>>     seems pretty good.
>>
>>     John
>>
>>
>>     On 05/17/2018 11:14 AM, Sanjay Patel wrote:
>>
>>         A rotate intrinsic should be relatively close in
>>         cost/complexity to the existing bswap.
>>
>>         A grep of Intrinsic::bswap says we'd probably add code in:
>>         InstCombine
>>         InstructionSimplify
>>         ConstantFolding
>>         DemandedBits
>>         ValueTracking
>>         VectorUtils
>>         SelectionDAGBuilder
>>
>>         But I don't think it's fair to view those additions as pure
>>         added cost. As an example, consider that we have to add hacks
>>         to EarlyCSE to recognize multi-IR-instruction min/max/abs
>>         patterns. Intrinsics just work as-is there. So if you search
>>         for 'matchSelectPattern', you get an idea (I see 32 hits in 10
>>         files) of the cost of *not* having intrinsics for those
>>         operations that we've decided are not worthy of intrinsics.
>>
>>
>>         On Wed, May 16, 2018 at 2:20 PM, John Regehr via llvm-dev
>>         <llvm-dev at lists.llvm.org> wrote:
>>
>>             On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
>>
>>                 An informal metric might be: if the operation is
>>                 supported as a primitive op or built-in in source
>>                 languages and it is supported as a single target
>>                 instruction, can we guarantee that 1-to-1
>>                 translation through optimization?
>>
>>
>>             It seems perfectly reasonable for LLVM users to expect this to
>>             happen reliably.
>>
>>             I'd like to take a look at the other side of the equation:
>>             the cost of adding a new intrinsic in terms of teaching
>>             passes to see through it, so we don't lose optimizations
>>             that worked before the intrinsic was added.
>>
>>             For example, clearly ValueTracking needs a few lines added
>>             so that computeKnownBits and friends don't get stopped by
>>             a rotate. Anyone have a reasonably complete list of files
>>             that need similar changes?
>>
>>             John
>>
>>             _______________________________________________
>>             LLVM Developers mailing list
>>             llvm-dev at lists.llvm.org
>>             http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>>
>