[LLVMdev] Porting LLVM backend is no fun yet

Mon Apr 13 10:44:19 PDT 2009

On Apr 11, 2009, at 5:03 PM, Greg McGary wrote:

> As we've already seen, David Chisnall prefers hacking LLVM over GCC  
> (see http://www.informit.com/articles/article.aspx?p=1215438): "In  
> contrast, every time I look at the GCC code, it takes two people to
> prevent me from clawing my eyeballs out."
>
> I'm sorry to report that so far I have had the opposite experience.  
> Some years ago, I ported binutils (via CGEN) and GCC to an embedded  
> RISC CPU and found the process straightforward and pleasant.  
> CGEN was especially handy for describing a sometimes quirky RISC  
> instruction set and offered great flexibility for factoring-out  
> commonalities.  By contrast, I have found TableGen to be much more  
> rigid and brittle.  There are too many constructs that need to be  
> special-cased, and the existing ports do them in gratuitously  
> different ways.  There also seem to be too many layers of classes  
> and helper functions in proportion to what's being specified.  I  
> guess that sums-up my gripe: low signal/noise and gratuitous  
> complexity.  Sorry to be complaining rather than proposing  
> solutions.  When I better get the hang of all of this, I expect to  
> have some ideas on how to improve TableGen.  Is there a development  
> plan or wishlist for TableGen?  I see nothing on the wiki yet.

Your observations are accurate; LLVM's CodeGen is comparatively
less mature in this area. There are numerous examples of these
symptoms.

There certainly are wishlist items for TableGen and TableGen-based
instruction descriptions, though I don't know of an official list.
Offhand, a few things that come to mind are the ability to handle nodes with
multiple results, something analogous to GCC's multi-alternative
constraints, the ability to generate more of the Legalize tables
automatically, and the ability to generate more of the TargetInstrInfo
hooks automatically.  There's no plan for things like this at the
moment though; they will get done only when someone steps up
and implements them.

>
> I must also say that the LLVM code is considerably "denser" because  
> of the unfortunate choice of BiCapitalizedIdentifierNames.   
> Underscores lend some horizontal whitespace to names and make their  
> subtokens visually distinct.  BiCapped code is kinda like German with  
> its cumbersome compound nouns.

I guess this is just a matter of familiarity, and perhaps of choosing
an advantageous font.

Dan