[LLVMdev] allocating registers less "sparingly"

Mon Nov 5 09:21:55 PST 2007

On Nov 5, 2007, at 2:55 AM, Pekka Jääskeläinen wrote:

> Hello LLVM people,
>
> Our customizable TTA target [1] can have plenty of registers and
> register file ports to improve instruction-level parallelism and
> reduce spills. It is entirely up to the designer of a particular TTA
> processor how many registers and register file resources, along with
> other TTA components, the processor has.
>
> We have ported LLVM 2.1 to produce an intermediate TTA program format
> with registers allocated (it adapts to the architecture's register
> files) but without operations bound to the target's function units or
> scheduled into instructions. The operation binding and the final
> scheduling are done as a post pass in our TTA framework called TCE [2]
> (not yet released to the public).
>
> The problem is that the register allocators in LLVM seem to reuse
> registers too much, that is, allocate them "too sparingly". This leads
> to false dependencies which unnecessarily restrict scheduling freedom
> in our post-pass instruction scheduler. That is, even if the target has
> 64 registers, LLVM might use only, say, 13, and have multiple writes to
> the same register in the same basic block. Is there some easy way to
> tune this behavior so it reuses registers only if it really has to
> (when it runs out of new ones)?

What you want is to remove anti-dependencies before post-allocation
scheduling. Alas, there is nothing in the current LLVM implementation
that deals with this. There are quite a few papers on this topic,
e.g. http://citeseer.ist.psu.edu/calland97removal.html
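
To make the idea concrete, here is a minimal sketch of renaming inside a
single basic block. It is not LLVM code and not the method from the paper
above; rename_block, the (dest, srcs) tuple format and the reserved
parameter are all hypothetical names chosen for illustration. It assumes
that every register not mentioned in the block or in reserved is dead, and
it conservatively lets the last definition of each register keep its
original name so that values live out of the block stay where later code
expects them.

```python
def rename_block(block, num_regs, reserved=()):
    """Rename register definitions in one basic block to remove false
    (anti and output) dependencies.

    block    -- list of (dest, [srcs]) tuples of physical register numbers
    num_regs -- size of the register file (registers 0 .. num_regs-1)
    reserved -- registers that must not be used as fresh names
                (assumption: everything else unused in the block is dead)
    """
    used = {r for dest, srcs in block for r in (dest, *srcs)} | set(reserved)
    free = [r for r in range(num_regs) if r not in used]

    # Index of the last instruction that writes each register.
    last_def = {dest: i for i, (dest, _) in enumerate(block)}

    current = {}  # original register -> register holding its value now
    renamed = []
    for i, (dest, srcs) in enumerate(block):
        # Sources read whatever register currently holds the value.
        new_srcs = [current.get(s, s) for s in srcs]
        if last_def[dest] == i or not free:
            # Last write (possibly live out), or no spare registers left:
            # keep the original name.
            new_dest = dest
        else:
            new_dest = free.pop(0)
        current[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


# r1 is written twice; the second write must wait for the read in the
# second instruction unless the first definition is renamed.
block = [(1, [2, 3]), (4, [1]), (1, [5, 6]), (7, [1])]
print(rename_block(block, num_regs=16))
# prints [(0, [2, 3]), (4, [0]), (1, [5, 6]), (7, [1])]
```

After renaming, the first two instructions use r0 and no longer conflict
with the second write to r1, so a post-pass scheduler is free to reorder
or overlap the two chains. When the pool of spare registers runs out, the
sketch simply falls back to the original names.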

I am also very interested in this. Do you plan to contribute your  
work back to the community? :-)

Evan

>
> I can understand that for many architectures without static scheduling
> etc., the "allocate registers sparingly" approach is the best one, as it
> might lead to less context data to be saved in calls, etc., but in our
> case the calls are not usually the bottleneck, as our target programs
> are at the moment mainly DSP kernels and multimedia codecs, which are
> usually loop-oriented. In addition, LLVM seems to do an excellent job
> of inlining function calls.
>
> I look forward to your replies.
>
> The links:
>
> [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture
> [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf
>
> Best regards,
> -- 
> Pekka Jääskeläinen,
> Researcher at Tampere Univ. of Technology, Finland.