[LLVMdev] Pinning registers in LLVM

Mon Jun 29 21:46:47 PDT 2009

On Jun 28, 2009, at 11:00 PM, David Terei wrote:

> Hi all,
>
> I'm working on using LLVM as a back-end for an existing compiler (GHC
> Haskell compiler)

Very cool!

> and one of the problems I'm having is pinning a
> global variable to an actual machine register. I've seen mixed
> terminology for this feature/idea, so what I mean by this is that I
> want to be able to put a global variable into a specified hardware
> register.

Let's separate two things here: 1) GCC's implementation of this feature,  
and 2) the semantic/perf effect of doing it.

For 1) GCC implements this feature (with the example code you gave) by  
globally changing the allocatable register set for the backend and  
pinning the value to the specified physical register.  This is really  
easy for GCC to do (yay, global variables for everyone, even the  
backend) and has the "right effect".  However, this implementation is  
inappropriate in LLVM: if we wanted to take this approach, we'd have  
to encode the set of pinned physregs in the top-level module structure  
somewhere: this is not impossible, but it is kinda ugly.
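
To make the "encode the set of pinned physregs in the module" idea
concrete, here is a minimal sketch in C.  Everything in it is
hypothetical: the register numbering, the ModuleInfo struct and the
helper names are made up for illustration and are not LLVM or GCC
APIs.  It only shows the shape of the approach: a module-wide mask
that removes pinned registers from the allocatable set.

```c
#include <stdint.h>

/* Hypothetical register numbering for a 32-bit x86-like target. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX, REG_ESI, REG_EDI, REG_COUNT };

/* Hypothetical module-level record; LLVM's Module has no such field. */
typedef struct {
    uint32_t pinned_regs; /* bit i set => register i is reserved for a global */
} ModuleInfo;

/* Reserve a register for a global for the whole module. */
static void pin_register(ModuleInfo *m, int reg) {
    m->pinned_regs |= (uint32_t)1 << reg;
}

/* The allocatable set is every register minus the pinned ones. */
static uint32_t allocatable_regs(const ModuleInfo *m) {
    uint32_t all = ((uint32_t)1 << REG_COUNT) - 1;
    return all & ~m->pinned_regs;
}

/* Query a register allocator would make before handing out a register. */
static int is_allocatable(const ModuleInfo *m, int reg) {
    return (int)((allocatable_regs(m) >> reg) & 1u);
}
```

The ugly part is visible even in the sketch: every consumer of the
register set has to consult module-wide state.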

2) is the more interesting part of this.  Ignoring GCC's  
implementation, the semantic effect is that the calling convention of  
every function in the translation unit is changed (so that the global  
is guaranteed to be in the specific physreg on entry to and exit from  
the function) and the global is guaranteed to be in the register in  
inline asms.  Interestingly (to me at least :), there is no guarantee  
that this value is in the physreg at a random point in the function.  
There is no "defined" way to notice this, so the compiler can cheat  
and reuse the register if it wants to (e.g. by spilling the value to  
the stack and reloading it later).  While you could notice this with a  
debugger, performance tool, etc, normal code should be fine.
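
A rough model of that guarantee in C, as a sketch only: the variable
phys_reg stands in for the one physical register and sum_to is a
made-up function.  The value only has to be back in place at function
exit; in between, the "register" is free to hold something else.

```c
/* Stand-in for the single physical register (an assumption of this sketch). */
static int phys_reg;

/* Computes 1 + 2 + ... + n while borrowing the "register" as scratch space. */
static int sum_to(int n) {
    int spilled = phys_reg;   /* spill the pinned value to the stack */
    phys_reg = 0;             /* the register is now an ordinary temporary */
    for (int i = 1; i <= n; i++)
        phys_reg += i;
    int result = phys_reg;
    phys_reg = spilled;       /* restore before exit so the guarantee holds */
    return result;
}
```

A caller that only looks at phys_reg before and after the call cannot
tell that it was ever reused.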

> This declaration should thus reserve that machine register
> for exclusive use by this global variable. This is used in GHC since
> it defines an abstract machine as part of its execution model, with
> this abstract machine consisting of several virtual registers. Due to
> the frequency the virtual registers are accessed it is best for
> performance that they be permanently assigned to a physical machine
> register.

Right.  Coming back to "why do this", you want it because it is good  
for performance: these values are accessed frequently enough that  
going to globals (particularly for PIC code) is too expensive.

> A very simple example C program using this feature:
>
> --------------------------
> #include <stdio.h>
>
> register int R1 __asm__ ("esi");
>
> int main(void)
> {
> 	R1 = 3;
> 	printf("register: %d\n", R1);
> 	R1 *= 2;
> 	printf("register: %d\n", R1);
> 	return 0;
> }
> --------------------------
>
> llvm-gcc doesn't compile this program correctly, although according to
> the llvm-gcc release notes this extension was first supported by llvm-
> gcc in 1.9.

This program actually works for me if you build with -O, but it looks  
like it is an accident that it works :).  The implementation in  
llvm-gcc could definitely be fixed in this case.  However, a more  
interesting example wouldn't work: if printf were replaced by some  
other function that reads ESI, that function would not be guaranteed  
to see the value.

If it were important to me to implement this, I'd implement this  
extension by adding a new custom calling convention to the X86 backend  
that always passed the first i32 value in ESI and always returned the  
first i32 value in ESI.  Given that, you could lower the above code to  
something like this pseudo code:

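{i32,i32} @main(i32 %in_esi) {
   ; stack slot standing in for the pinned register
   %esi = alloca i32
   store in_esi -> esi

   ; R1 = 3
   store 3 -> esi

   ; each call passes the current value in ESI and gets the new one back
   esi1 = load esi
   {esi2, dead} = call @printf(esi1, "register: %d\n", esi1);
   store esi2 -> esi

   ; R1 *= 2
   esi3 = load esi
   esi4 = esi3*2
   store esi4 -> esi

   esi5 = load esi
   {esi6, dead} = call @printf(esi5, "register: %d\n", esi5);
   store esi6 -> esi

   ; return the final ESI value alongside main's own result (0)
   esi7 = load esi
   ret {esi7, 0}
}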

Each of printf and main would be marked with the custom CC.  After  
running mem2reg on this, you'd get:

{i32,i32} @main(i32 %in_esi) {
   {esi2, dead} = call @printf(3, "register: %d\n", 3);
   esi4 = esi2*2
   {esi6, dead} = call @printf(esi4, "register: %d\n", esi4);
   ret {esi6, 0}
}

When lowered at codegen time, the regalloc would trivially eliminate  
the copies into/out-of ESI and you'd get the code you desired.
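
The same threading can be sketched in plain C, purely as an
illustration of the data flow and not of any real calling convention.
The PinnedRet struct, print_register and lowered_main are hypothetical
names: each function takes the pinned value as an extra first argument
and returns it next to its ordinary result, which is what the custom
CC would do with ESI.

```c
#include <stdio.h>

/* Hypothetical pair returned by every function under the sketched
   convention: the pinned value on exit plus the ordinary result. */
typedef struct {
    int r1;      /* value the pinned register holds on exit */
    int result;  /* the function's normal return value */
} PinnedRet;

/* Stand-in for printf in the pseudo code: it receives R1 on entry and
   hands it back unchanged on exit. */
static PinnedRet print_register(int r1, int value) {
    int n = printf("register: %d\n", value);
    PinnedRet out = { r1, n };
    return out;
}

/* Mirrors the lowered @main: the local r1 plays the role of the alloca. */
static PinnedRet lowered_main(int r1_in) {
    int r1 = r1_in;
    r1 = 3;                          /* R1 = 3 */
    r1 = print_register(r1, r1).r1;  /* call, then take R1 back */
    r1 *= 2;                         /* R1 *= 2 */
    r1 = print_register(r1, r1).r1;
    PinnedRet out = { r1, 0 };       /* final R1 plus main's return value */
    return out;
}
```

Calling lowered_main with any starting value prints the two
"register:" lines and returns the final pinned value together with
main's result.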

No, I don't know of anyone planning to implement this, but it is  
conceptually quite simple :)

-Chris