[llvm-dev] Rotates, once again
Simon Pilgrim via llvm-dev
llvm-dev at lists.llvm.org
Sun Jul 8 06:23:24 PDT 2018
On 02/07/2018 23:36, Fabian Giesen via llvm-dev wrote:
> On 7/2/2018 3:16 PM, Sanjay Patel wrote:
>> I also agree that the per-element rotate for vectors is what we want
>> for this intrinsic.
>>
>> So I have this so far:
>>
>> declare i32 @llvm.catshift.i32(i32 %a, i32 %b, i32 %shift_amount)
>> declare <2 x i32> @llvm.catshift.v2i32(<2 x i32> %a, <2 x i32>
>> %b, <2 x i32> %shift_amount)
>>
>> For scalars, @llvm.catshift concatenates %a and %b, shifts the
>> concatenated value right by the number of bits specified by
>> %shift_amount modulo the bit-width, and truncates to the original
>> bit-width.
>> For vectors, that operation occurs for each element of the vector:
>> result[i] = trunc(concat(a[i], b[i]) >> shift_amount[i])
>> If %a == %b, this is equivalent to a bitwise rotate right. Rotate
>> left may be implemented by subtracting the shift amount from the
>> bit-width of the scalar type or vector element type.
>
> Or just negating, iff the shift amount is defined to be modulo and the
> machine is two's complement.
>
> I'm a bit worried that while modulo is the Obviously Right Thing for
> rotates, the situation is less clear for general funnel shifts.
>
> I looked over some of the ISAs I have docs at hand for:
>
> - x86 (32b/64b variants) has SHRD/SHLD, so both right and left
> variants. Count is modulo (mod 32 for 32b instruction variants, mod 64
> for 64b instruction variants). As of BMI2, we also get RORX
> (non-flag-setting ROR) but no ROLX.
>
> - ARM AArch64 has EXTR, which is a right funnel shift, but shift
> distances must be literal constants. EXTR with both source registers
> equal disassembles as ROR and is often special-cased in
> implementations. (EXTR with source 1 != source 2 often has an extra
> cycle of latency). There is RORV which is right rotate by a variable
> (register) amount; there is no EXTRV.
>
> - NVPTX has SHF
> (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions-shf)
> with both left/right shift variants and with both "clamp" (clamps
> shift count at 32) and "wrap" (shift count taken mod 32) modes.
>
> - GCN has v_alignbit_b32 which is a right funnel shift, and it seems
> to be defined to take shift distances mod 32.
>
> Based on that sampling, modulo behavior seems like a good choice for a
> generic IR instruction, and if you're going to pick one direction,
> right shifts are the one to use. Not sure about other ISAs.
>
> -Fabian
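
The quoted definition can be modelled in a few lines of Python. This is only
a sketch of the semantics as described above, not an implementation of the
proposed intrinsic: the function names are hypothetical, and it assumes that
%a forms the high half of the concatenation and %b the low half (the quoted
text does not say which, and for rotates, where %a == %b, it makes no
difference).

```python
def catshift(a, b, shift_amount, width=32):
    """Model of the quoted scalar definition (hypothetical helper).

    Assumption: a is the high half of the concatenation, b the low half.
    """
    mask = (1 << width) - 1
    s = shift_amount % width                     # shift amount is taken modulo the bit-width
    concat = ((a & mask) << width) | (b & mask)  # 2*width-bit concatenation
    return (concat >> s) & mask                  # shift right, truncate to width bits


def catshift_vec(a, b, shift_amount, width=32):
    """Per-element form: result[i] = catshift(a[i], b[i], shift_amount[i])."""
    return [catshift(x, y, s, width) for x, y, s in zip(a, b, shift_amount)]


def rotr(x, s, width=32):
    # With both operands equal, the funnel shift is a rotate right.
    return catshift(x, x, s, width)


def rotl(x, s, width=32):
    # Rotate left by negating the amount. Because the amount is reduced
    # modulo the width, -s and (width - s) select the same shift.
    return catshift(x, x, -s, width)
```

For example, rotl(0x80000001, 1) and rotr(0x80000001, 31) both give 0x3,
which is the equivalence between negating the amount and subtracting it from
the bit-width that Fabian points out.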
Sorry for the late reply to this thread. I'd like to mention that the
existing ISD ROTL/ROTR opcodes currently do not properly assume modulo
behaviour, so that definition would need to be tidied up and made
explicit; the recent legalization code might need fixing as well. Are
you intending to add CONCATSHL/CONCATSRL ISD opcodes as well?
Additionally, the custom SSE lowering that I added doesn't assume modulo
(although I think the vXi8 lowering might work already), and it only
lowers for ROTL at the moment (mainly a legacy of how the XOP
instructions work), but adding ROTR support shouldn't be difficult.
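
To illustrate what "assuming modulo" means for a rotate expansion, here is a
small Python model. It is a sketch only: the function names are hypothetical,
it is not the ISD legalization or SSE lowering code, and it assumes the
element width is a power of two so that the modulo can be written as a mask.
Both shift amounts are masked, so any rotate amount is well defined, and a
ROTR can be expressed as a ROTL with a negated amount.

```python
def rotl_expand(x, amt, width=32):
    """Rotate left built from two shifts, with modulo rotate amounts.

    Assumption: width is a power of two, so (amt mod width) == amt & (width - 1).
    """
    mask = (1 << width) - 1
    x &= mask
    left = amt & (width - 1)     # amt mod width
    right = -amt & (width - 1)   # (width - amt) mod width
    return ((x << left) | (x >> right)) & mask


def rotr_via_rotl(x, amt, width=32):
    # With modulo semantics, rotating right by amt equals rotating left by -amt.
    return rotl_expand(x, -amt, width)
```

Under these assumptions rotl_expand(x, 33) equals rotl_expand(x, 1), and an
amount of 0 or 32 returns x unchanged, since both partial shifts are then by
zero bits.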