[llvm-dev] Rotates, once again
Sanjay Patel via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 2 15:16:17 PDT 2018
I also agree that the per-element rotate for vectors is what we want for
this intrinsic.
So I have this so far:
declare i32 @llvm.catshift.i32(i32 %a, i32 %b, i32
%shift_amount)declare <2 x i32> @llvm.catshift.v2i32(<2 x i32> %a, <2
x i32> %b, <2 x i32> %shift_amount)
For scalars, @llvm.catshift concatenates %a and %b, shifts the concatenated
value right by the number of bits specified by %shift_amount modulo the
bit-width, and truncates to the original bit-width.
For vectors, that operation occurs for each element of the vector:
result[i] = trunc(concat(a[i], b[i]) >> c[i])
If %a == %b, this is equivalent to a bitwise rotate right. Rotate left may
be implemented by subtracting the shift amount from the bit-width of the
scalar type or vector element type.
On Mon, Jul 2, 2018 at 2:37 PM, Fabian Giesen <fabiang at radgametools.com>
wrote:
> 1. I'm not sure what you mean by "full vector" here - using the same shift
> distance for all lanes (as opposed to per-lane distances), or doing a
> treat-the-vector-as-bag-of-bits shift that doesn't have any internal lane
> boundaries? If the latter, that doesn't really help you much with
> implementing a per-lane rotate.
>
> I think the most useful generalization of a vector funnel shift in this
> context is lane-wise
>
> result[i] = trunc(concat(a[i], b[i]) >> c[i])
>
> (or the equivalent for a left shift); the special case a==b is a rotate.
>
> 2. For operand sizes that have native rotate instructions, at least x86,
> x86-64, ARM A32/T32 and AArch64 A64 agree that rotate distances are modulo
> the operand width. I believe PPC and MIPS do the same but am not sure (it's
> been a while), no clue about other architectures.
>
> It certainly seems the most natural way to define it, since rotates are
> cyclic to begin with.
>
> 8- and 16-bit rotates will need to be lowered into multi-instruction
> sequences on most targets (x86 and x86-64 can do them directly, but
> RISC-lineage archs usually don't have rotates at smaller than word size).
> Having explicit modulo semantics might end up forcing an explicit extra AND
> there, so that's an extra cost there, but it would certainly be nice to
> have the rotate definition be total.
>
> -Fabian
>
> On 07/02/2018 09:27 AM, Sanjay Patel wrote:
>
>> I'm guessing nobody has started implementing any of the suggested rotate
>> functionality since there are still open questions, but let me know if I'm
>> wrong.
>>
>> We're still getting patches that try to work around the current
>> limitations (https://reviews.llvm.org/D48705 <
>> https://reviews.llvm.org/D48705> ), so we should move forward since
>> we've approximated/justified the cost and benefits.
>>
>> Let's settle on the intrinsic definition(s).
>>
>> 1. There was a suggestion to generalize rotate to a "valign" or "double
>> shift" (that's what x86 means with its poorly worded "double precision
>> shift"). How does that work with vector types? The options are a full
>> vector-size shift or a per-element shift. If it's the full vector, do we
>> still want/need a specialized rotate intrinsic for per-element? If it's
>> per-element, do we still want/need the other form for a full vector?
>>
>> 2. What is the behavior for a shift/rotate amount that is equal or
>> greater than the bit-width of the operand (or the bit width of a vector
>> element type?)? Can we modulo that operand by the bit width, or does that
>> not map well to the hardware semantics?
>>
>> On Thu, May 17, 2018 at 5:23 PM, John Regehr <regehr at cs.utah.edu <mailto:
>> regehr at cs.utah.edu>> wrote:
>>
>> Thanks Sanjay!
>>
>> At this point the cost/benefit tradeoff for rotate intrinsics
>> seems pretty good.
>>
>> John
>>
>>
>> On 05/17/2018 11:14 AM, Sanjay Patel wrote:
>>
>> A rotate intrinsic should be relatively close in
>> cost/complexity to the existing bswap.
>>
>> A grep of intrinsic::bswap says we'd probably add code in:
>> InstCombine
>> InstructionSimplify
>> ConstantFolding
>> DemandedBits
>> ValueTracking
>> VectorUtils
>> SelectionDAGBuilder
>>
>> But I don't think it's fair to view those additions as pure
>> added cost. As an example, consider that we have to add hacks
>> to EarlyCSE to recognize multi-IR-instruction min/max/abs
>> patterns. Intrinsics just work as-is there. So if you search
>> for 'matchSelectPattern', you get an idea (I see 32 hits in 10
>> files) of the cost of *not* having intrinsics for those
>> operations that we've decided are not worthy of intrinsics.
>>
>>
>> On Wed, May 16, 2018 at 2:20 PM, John Regehr via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> <mailto:llvm-dev at lists.llvm.org
>>
>> <mailto:llvm-dev at lists.llvm.org>>> wrote:
>>
>> On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
>>
>> An informal metric might be: if the operation is
>> supported as a
>> primitive op or built-in in source languages and it is
>> supported
>> as a single target instruction, can we guarantee that
>> 1-to-1
>> translation through optimization?
>>
>>
>> It seems perfectly reasonable for LLVM users to expect this to
>> happen reliably.
>>
>> I'd like to take a look at the other side of the equation:
>> the cost
>> of adding a new intrinsic in terms of teaching passes to
>> see through
>> it, so we don't lose optimizations that worked before the
>> intrinsic
>> was added.
>>
>> For example, clearly ValueTracking needs a few lines added
>> so that
>> computeKnownBits and friends don't get stopped by a
>> rotate. Anyone
>> have a reasonably complete list of files that need similar
>> changes?
>>
>> John
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> <mailto:llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>>
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180702/fbf41a24/attachment.html>
More information about the llvm-dev
mailing list