[llvm-dev] Rotates, once again

Thu Jul 12 08:34:14 PDT 2018

Initial patch proposal:
https://reviews.llvm.org/D49242

I tried to add anyone that replied to this thread as a potential reviewer.
Please add yourself if I missed you.

On Sun, Jul 8, 2018 at 9:23 AM, Sanjay Patel <spatel at rotateright.com> wrote:

> Yes, if we're going to define this as the more general 2-operand funnel
> shift, then we might as well add the matching DAG defs and adjust the
> existing ROTL/ROTR defs to match the modulo.
>
> I hadn't heard the "funnel shift" terminology before, but let's go with
> that because it's more descriptive/accurate than concat+shift.
>
> We will need to define both left and right variants. Otherwise, we risk
> losing the negation/subtraction of the shift amount via other transforms
> and defeat the point of defining the full operation.
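To illustrate why a single direction is not quite enough, here is a minimal Python sketch of both variants on 32-bit values. The helper names fshl32/fshr32 are made up for this example and are not taken from the patch. It assumes the shift amount is taken modulo 32 and that the first operand is the high half of the concatenation.

```python
MASK32 = 0xFFFFFFFF

def fshr32(a, b, s):
    """Right funnel shift: low 32 bits of (a:b) >> (s mod 32); a is the high half."""
    s %= 32
    return (((a << 32) | b) >> s) & MASK32

def fshl32(a, b, s):
    """Left funnel shift: high 32 bits of (a:b) << (s mod 32); a is the high half."""
    s %= 32
    return ((((a << 32) | b) << s) >> 32) & MASK32

a, b = 0x12345678, 0x9ABCDEF0

# For shift amounts 1..31 the two directions are interchangeable.
for s in range(1, 32):
    assert fshl32(a, b, s) == fshr32(a, b, 32 - s)

# At shift amount 0 they are not: left returns a, while right by (32 mod 32) returns b.
print(hex(fshl32(a, b, 0)), hex(fshr32(a, b, 32)))
```

So expressing the left form through the right form needs a special case at zero when the two inputs differ, on top of the subtraction that other transforms could lose.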
>
> A few more examples to add to Fabian's:
> - x86 AVX512 added vprol* / vpror* instructions for 32/64-bit element
> vector types with constant and variable rotate amounts. The "count
> operand modulo the data size (32 or 64) is used".
>
> - PowerPC defined scalar rotates with 'rl*' (everything is based on
> rotating left). Similarly, Altivec only has 'vrl*' instructions for vectors
> and all ops rotate modulo the element size. The funnel op is called
> "vsldoi". So again, it goes left.
>
>
>
> On Sun, Jul 8, 2018 at 7:23 AM, Simon Pilgrim <llvm-dev at redking.me.uk>
> wrote:
>
>>
>>
>> On 02/07/2018 23:36, Fabian Giesen via llvm-dev wrote:
>>
>>> On 7/2/2018 3:16 PM, Sanjay Patel wrote:
>>>
>>>> I also agree that the per-element rotate for vectors is what we want
>>>> for this intrinsic.
>>>>
>>>> So I have this so far:
>>>>
>>>> declare i32 @llvm.catshift.i32(i32 %a, i32 %b, i32 %shift_amount)
>>>> declare <2 x i32> @llvm.catshift.v2i32(<2 x i32> %a, <2 x i32> %b,
>>>>                                        <2 x i32> %shift_amount)
>>>>
>>>> For scalars, @llvm.catshift concatenates %a and %b, shifts the
>>>> concatenated value right by the number of bits specified by %shift_amount
>>>> modulo the bit-width, and truncates to the original bit-width.
>>>> For vectors, that operation occurs for each element of the vector:
>>>>     result[i] = trunc(concat(a[i], b[i]) >> (shift_amount[i] % bitwidth))
>>>> If %a == %b, this is equivalent to a bitwise rotate right. Rotate left
>>>> may be implemented by subtracting the shift amount from the bit-width of
>>>> the scalar type or vector element type.
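The semantics described above can be modelled in a few lines of Python. This is only a sketch of the wording in the quoted text, not the intrinsic itself. The function names and the `bits` parameter are assumptions made for the example.

```python
def catshift(a, b, shift_amount, bits=32):
    """Concatenate a (high) and b (low), shift right by shift_amount mod bits,
    and truncate the result to the original bit-width."""
    mask = (1 << bits) - 1
    s = shift_amount % bits
    wide = ((a & mask) << bits) | (b & mask)
    return (wide >> s) & mask

def catshift_vec(a, b, shift_amount, bits=32):
    """Per-element form: each lane is shifted independently by its own amount."""
    return [catshift(x, y, s, bits) for x, y, s in zip(a, b, shift_amount)]

x = 0x12345678
print(hex(catshift(x, x, 8)))        # same operand twice: rotate right by 8
print(hex(catshift(x, x, 32 - 8)))   # rotate left by 8 via (bit-width - amount)
print([hex(v) for v in catshift_vec([x, x], [x, x], [4, 28])])
```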
>>>>
>>>
>>> Or just negating, iff the shift amount is defined to be modulo and the
>>> machine is two's complement.
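A quick way to check the negation point is a Python sketch like the one below. It assumes a power-of-two bit-width, so that "modulo" is a plain mask of the low bits. The helper names are made up for the example. Python's `&` on negative integers behaves like two's complement, which is what the argument relies on.

```python
BITS = 32
MASK = (1 << BITS) - 1

def rotr(x, s):
    """Rotate right with the amount reduced modulo BITS by masking the low bits."""
    s &= BITS - 1
    return ((x >> s) | (x << (BITS - s))) & MASK

def rotl_by_subtract(x, s):
    """Rotate left expressed as rotate right by (BITS - s)."""
    return rotr(x, BITS - s)

def rotl_by_negate(x, s):
    """Rotate left expressed as rotate right by -s; valid because the amount is modulo."""
    return rotr(x, -s)

x = 0x12345678
for s in range(64):
    assert rotl_by_subtract(x, s) == rotl_by_negate(x, s)
print(hex(rotl_by_negate(x, 8)))
```

With modulo semantics, (BITS - s) and (-s) agree in their low bits, so the cheaper negate can stand in for the subtract.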
>>>
>>> I'm a bit worried that while modulo is the Obviously Right Thing for
>>> rotates, the situation is less clear for general funnel shifts.
>>>
>>> I looked over some of the ISAs I have docs at hand for:
>>>
>>> - x86 (32b/64b variants) has SHRD/SHLD, so both right and left variants.
>>> Count is modulo (mod 32 for 32b instruction variants, mod 64 for 64b
>>> instruction variants). As of BMI2, we also get RORX (non-flag-setting ROR)
>>> but no ROLX.
>>>
>>> - ARM AArch64 has EXTR, which is a right funnel shift, but shift
>>> distances must be literal constants. EXTR with both source registers equal
>>> disassembles as ROR and is often special-cased in implementations. (EXTR
>>> with source 1 != source 2 often has an extra cycle of latency). There is
>>> RORV which is right rotate by a variable (register) amount; there is no
>>> EXTRV.
>>>
>>> - NVPTX has SHF
>>> (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions-shf)
>>> with both left/right shift variants and with both "clamp" (clamps shift
>>> count at 32) and "wrap" (shift count taken mod 32) modes.
>>>
>>> - GCN has v_alignbit_b32 which is a right funnel shift, and it seems to
>>> be defined to take shift distances mod 32.
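The difference between the "clamp" and "wrap" treatments of the shift count mentioned in the list above only shows up for counts of 32 or more. The following Python sketch is a generic model of a 32-bit right funnel shift with both modes. It is not an exact transcription of any one instruction's operand order, and the function and mode names are assumptions for the example.

```python
MASK32 = 0xFFFFFFFF

def effective_count(c, mode):
    """Reduce the raw shift count: 'wrap' takes it mod 32, 'clamp' caps it at 32."""
    if mode == "wrap":
        return c % 32
    if mode == "clamp":
        return min(c, 32)
    raise ValueError("unknown mode: " + mode)

def funnel_right(hi, lo, c, mode):
    """Low 32 bits of the 64-bit value hi:lo shifted right by the effective count."""
    n = effective_count(c, mode)
    return (((hi << 32) | lo) >> n) & MASK32

hi, lo = 0x12345678, 0x9ABCDEF0
for c in (8, 32, 40):
    print(c, hex(funnel_right(hi, lo, c, "wrap")), hex(funnel_right(hi, lo, c, "clamp")))
```

For counts below 32 the two modes agree. At 32 and above, wrap starts over from the low word while clamp returns the high word unchanged.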
>>>
>>> Based on that sampling, modulo behavior seems like a good choice for a
>>> generic IR instruction, and if you're going to pick one direction, right
>>> shifts are the one to use. Not sure about other ISAs.
>>>
>>> -Fabian
>>>
>> Sorry for the late reply to this thread. I'd like to mention that the
>> existing ISD ROTL/ROTR opcodes currently do not properly assume modulo
>> behaviour so that definition would need to be tidied up and made explicit;
>> the recent legalization code might need fixing as well. Are you intending
>> to add CONCATSHL/CONCATSRL ISD opcodes as well?
>>
>> Additionally the custom SSE lowering that I added doesn't assume modulo
>> (although I think the vXi8 lowering might work already), and it only lowers
>> for ROTL at the moment (mainly due to a legacy of how the XOP instructions
>> work), but adding ROTR support shouldn't be difficult.
>>
>>
>