[llvm-dev] Rotates, once again
Fabian Giesen via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 2 15:36:51 PDT 2018
On 7/2/2018 3:16 PM, Sanjay Patel wrote:
> I also agree that the per-element rotate for vectors is what we want for
> this intrinsic.
>
> So I have this so far:
>
> declare i32 @llvm.catshift.i32(i32 %a, i32 %b, i32 %shift_amount)
> declare <2 x i32> @llvm.catshift.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %shift_amount)
>
> For scalars, @llvm.catshift concatenates %a and %b, shifts the
> concatenated value right by the number of bits specified by
> %shift_amount modulo the bit-width, and truncates to the original
> bit-width.
> For vectors, that operation occurs for each element of the vector:
> result[i] = trunc(concat(a[i], b[i]) >> shift_amount[i])
> If %a == %b, this is equivalent to a bitwise rotate right. Rotate left
> may be implemented by subtracting the shift amount from the bit-width of
> the scalar type or vector element type.
Or just negating, iff the shift amount is defined to be modulo and the
machine is two's complement.
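
To make the negation point concrete, here is a minimal C sketch of the proposed 32-bit scalar semantics. The function names are made up for illustration, and treating %a as the high half of the concatenation is an assumption (it does not matter for rotates, where %a == %b). In C, unsigned arithmetic wraps modulo 2^32 by definition, so "0u - s" plays the role of two's complement negation here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposed intrinsic for i32:
 * concatenate a (assumed high half) and b (low half), shift right by
 * s mod 32, and truncate back to 32 bits. */
static inline uint32_t catshift32(uint32_t a, uint32_t b, uint32_t s) {
    uint64_t wide = ((uint64_t)a << 32) | b;
    return (uint32_t)(wide >> (s & 31));
}

/* With both inputs equal, the funnel shift is a rotate right. */
static inline uint32_t rotr32(uint32_t x, uint32_t s) {
    return catshift32(x, x, s);
}

/* Rotate left by subtracting the amount from the bit-width. */
static inline uint32_t rotl32_sub(uint32_t x, uint32_t s) {
    return rotr32(x, 32u - s);
}

/* Rotate left by negating the amount; valid because the amount is
 * taken modulo 32 and unsigned negation wraps modulo 2^32. */
static inline uint32_t rotl32_neg(uint32_t x, uint32_t s) {
    return rotr32(x, 0u - s);
}

/* Both formulations agree for every shift amount, including s >= 32. */
static inline void check_rotl_equivalence(uint32_t x) {
    for (uint32_t s = 0; s < 64; ++s)
        assert(rotl32_sub(x, s) == rotl32_neg(x, s));
}
```

Since 32 - s and -s are congruent modulo 32, the two rotate-left helpers only differ in how the amount is computed before the modulo is applied.
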
I'm a bit worried that while modulo is the Obviously Right Thing for
rotates, the situation is less clear for general funnel shifts.
I looked over some of the ISAs I have docs at hand for:
- x86 (32b/64b variants) has SHRD/SHLD, so both right and left variants.
Count is modulo (mod 32 for 32b instruction variants, mod 64 for 64b
instruction variants). As of BMI2, we also get RORX (non-flag-setting
ROR) but no ROLX.
- ARM AArch64 has EXTR, which is a right funnel shift, but shift
distances must be literal constants. EXTR with both source registers
equal disassembles as ROR and is often special-cased in implementations.
(EXTR with source 1 != source 2 often has an extra cycle of latency).
There is RORV which is right rotate by a variable (register) amount;
there is no EXTRV.
- NVPTX has SHF
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions-shf)
with both left/right shift variants and with both "clamp" (clamps shift
count at 32) and "wrap" (shift count taken mod 32) modes.
- GCN has v_alignbit_b32 which is a right funnel shift, and it seems to
be defined to take shift distances mod 32.
Based on that sampling, modulo behavior seems like a good choice for a
generic IR instruction, and if you're going to pick one direction, right
shifts are the one to use. Not sure about other ISAs.
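
For reference, the difference between the "wrap" and "clamp" count modes mentioned for SHF can be sketched in C as below. This is only an illustrative model of a 32-bit right funnel shift; the function names and the hi/lo parameter order are assumptions, not the operand order of any of the instructions listed above.

```c
#include <assert.h>
#include <stdint.h>

/* "Wrap" mode: the count is taken mod 32, so a count of 32 acts like 0. */
static inline uint32_t funnel_shr_wrap(uint32_t hi, uint32_t lo, uint32_t count) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (count & 31));
}

/* "Clamp" mode: the count saturates at 32, so any count >= 32 yields
 * the high word unchanged. */
static inline uint32_t funnel_shr_clamp(uint32_t hi, uint32_t lo, uint32_t count) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    uint32_t n = count < 32 ? count : 32;
    return (uint32_t)(wide >> n);
}

/* The two modes only differ once the count reaches the bit-width. */
static inline void check_modes_agree_below_32(uint32_t hi, uint32_t lo) {
    for (uint32_t c = 0; c < 32; ++c)
        assert(funnel_shr_wrap(hi, lo, c) == funnel_shr_clamp(hi, lo, c));
}
```

Under this model, only the wrap variant turns into a rotate for every count when hi == lo.
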
-Fabian