[LLVMdev] [llvm-commits] rotate

Tue Jul 31 10:22:29 PDT 2012

On 07/31/2012 06:17 PM, Eli Friedman wrote:
> On Tue, Jul 31, 2012 at 8:42 AM, Cameron McInally
> <cameron.mcinally at nyu.edu> wrote:
>> Andy,
>>
>> Here is the left circular shift operator patch. I apologize to the reviewer
>> in advance. The patch has a good bit of fine detail. Any
>> comments/criticisms?
>>
>> Some caveats...
>>
>> 1) This is just the bare minimum needed to make the left circular shift
>> operator work (e.g. no instruction combining).
>>
>> 2) I tried my best to select operator names in the existing style; please
>> feel free to change them as appropriate.
> We intentionally haven't included a rotate instruction in LLVM in the
> past; the justification is that it's generally straightforward for the
> backend to form rotate operations, and making the optimizer
> effectively handle the new rotation instruction adds a substantial
> amount of complexity.  You're going to need to make a strong argument
> that the current approach is insufficient if you want to commit a
> patch like this.
>
Well,

I believe something is currently broken with respect to forming rotate
instructions:

For example, using a recent clang/llvm on Linux/x86_64:

#include <stddef.h>
#include <stdint.h>

uint32_t ror32(uint32_t input, size_t rot_bits) {
  return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

uint32_t rol32(uint32_t input, size_t rot_bits) {
  return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}
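One caveat with the 32-bit functions above: when rot_bits is 0, the second shift is by 32, which is undefined behavior for a 32-bit operand in C. A minimal sketch of a variant that avoids this, assuming an unsigned count and using the hypothetical names rol32_masked/ror32_masked, masks both shift amounts so they always stay in [0, 31]. This masked form is a widely used rotate idiom; whether a particular clang version turns it into rol/ror is something to check on the target rather than assume.

```c
#include <stdint.h>

/* Hypothetical helpers, not from the original post.
 * n & 31 and -n & 31 are both in [0, 31], so no shift ever
 * reaches the full 32-bit width, even for n == 0 or n >= 32. */
uint32_t rol32_masked(uint32_t x, unsigned n) {
    return (x << (n & 31u)) | (x >> (-n & 31u));
}

uint32_t ror32_masked(uint32_t x, unsigned n) {
    return (x >> (n & 31u)) | (x << (-n & 31u));
}
```

For instance, rol32_masked(0x80000001u, 1) returns 0x00000003 and ror32_masked(0x80000001u, 1) returns 0xC0000000, while a count of 0 simply returns the input unchanged.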

gives the expected ror and rol instructions, but their 16-bit
counterparts:

uint16_t ror16(uint16_t input, size_t rot_bits) {
  return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

uint16_t rol16(uint16_t input, size_t rot_bits) {
  return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}
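These 16-bit expressions do compute genuine rotations, so a single 16-bit rotate would be a valid lowering. A minimal sketch that checks this exhaustively for rol16, assuming a 32-bit int: rol16_reference is a hypothetical bit-at-a-time rotate, and rol16_expr is the post's rol16 expression copied unchanged. Counts 0 through 15 are tested; because uint16_t is promoted to int, even rot_bits == 0 is well defined here (the right shift is by 16 on a 32-bit value).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: rotate left one bit per step, moving
 * bit 15 into bit 0 each time. */
uint16_t rol16_reference(uint16_t x, size_t n) {
    for (size_t i = 0; i < n % 16; i++)
        x = (uint16_t)((x << 1) | (x >> 15));
    return x;
}

/* The rol16 expression from the post, unchanged. */
uint16_t rol16_expr(uint16_t input, size_t rot_bits) {
    return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}

/* Returns 1 if the shift-or expression equals a true rotation for
 * every 16-bit input and every count from 0 to 15, else 0. */
int rol16_expr_is_rotate(void) {
    for (uint32_t v = 0; v <= 0xFFFF; v++)
        for (size_t n = 0; n < 16; n++)
            if (rol16_expr((uint16_t)v, n) != rol16_reference((uint16_t)v, n))
                return 0;
    return 1;
}
```

For example, rol16_expr(0x1234, 4) and rol16_reference(0x1234, 4) both give 0x2341.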

fail miserably (two shifts and an or, with no rotate instruction):

        .globl  ror16
        .align  16, 0x90
        .type   ror16,@function
ror16:                                  # @ror16
        .cfi_startproc
# BB#0:                                 # %entry
        movb    %sil, %cl
        movl    %edi, %eax
        shrl    %cl, %eax
        movl    $16, %ecx
        subl    %esi, %ecx
                                        # kill: CL<def> CL<kill> ECX<kill>
        shll    %cl, %edi
        orl     %eax, %edi
        movzwl  %di, %eax
        ret
.Ltmp2:
        .size   ror16, .Ltmp2-ror16
        .cfi_endproc

        .globl  rol16
        .align  16, 0x90
        .type   rol16,@function
rol16:                                  # @rol16
        .cfi_startproc
# BB#0:                                 # %entry
        movb    %sil, %cl
        movl    %edi, %eax
        shll    %cl, %eax
        movl    $16, %ecx
        subl    %esi, %ecx
                                        # kill: CL<def> CL<kill> ECX<kill>
        shrl    %cl, %edi
        orl     %eax, %edi
        movzwl  %di, %eax
        ret
.Ltmp3:
        .size   rol16, .Ltmp3-rol16
        .cfi_endproc

At first glance, this seems to be related to the values being
promoted from i16 to i32 in the IR optimization passes, but this may not
be the only reason.
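To make the promotion visible, here is a sketch that spells out in C what the ror16 listing computes, under the assumption that the listing is read instruction by instruction; the name ror16_as_emitted is hypothetical. Both shifts and the or are done on a 32-bit value, and the result only becomes a 16-bit rotation after the final truncation (the movzwl), which may be why a 32-bit rotate pattern has nothing to match here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C reading of the ror16 listing; valid for rot_bits 1..15. */
uint16_t ror16_as_emitted(uint16_t input, size_t rot_bits) {
    uint32_t wide = input;                 /* 32-bit copy of the input (movl %edi, %eax) */
    uint32_t lo = wide >> rot_bits;        /* shrl %cl, %eax */
    uint32_t hi = wide << (16 - rot_bits); /* movl $16, %ecx; subl %esi, %ecx; shll %cl, %edi */
    return (uint16_t)(lo | hi);            /* orl %eax, %edi; movzwl %di, %eax keeps the low 16 bits */
}
```

For example, ror16_as_emitted(0x1234, 4) computes 0x01234123 at 32 bits and truncates it to 0x4123, the 16-bit right rotation.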

-- 
Arnaud de Grandmaison