[cfe-dev] thoughts about n-bit bytes for clang/llvm

Ray Fix rayfix.ml at gmail.com
Fri Sep 4 07:24:49 PDT 2009


Hello experts,

I am new to Clang, and I would like to support a system on chip where the  
smallest addressable data type is 16 bits.  In other words, sizeof(char)  
== 1 byte == 16 bits.  My understanding is that C/C++ only requires a  
byte to be at least 8 bits and sizeof(char) <= sizeof(short) <=  
sizeof(int) <= sizeof(long) <= sizeof(long long).
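
To make the target concrete, this is what I would expect to hold there  
(just a sketch; it assumes short is also a single 16-bit byte on this  
particular chip):

   #include <limits.h>

   // Compile-time checks using the negative-array-size trick.
   int byte_is_16_bits[CHAR_BIT == 16 ? 1 : -1];
   int char_is_one_byte[sizeof(char) == 1 ? 1 : -1];   // always true
   int short_is_one_byte[sizeof(short) == 1 ? 1 : -1]; // assumption for this SoC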

In clang/TargetInfo.h:

unsigned getBoolWidth(bool isWide = false) const { return 8; }  // FIXME
unsigned getBoolAlign(bool isWide = false) const { return 8; }  // FIXME

unsigned getCharWidth() const { return 8; } // FIXME
unsigned getCharAlign() const { return 8; } // FIXME
:
unsigned getShortWidth() const { return 16; } // FIXME
unsigned getShortAlign() const { return 16; } // FIXME

These look easy enough to fix by making them configurable, the same way  
IntWidth and IntAlign are.
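
For example, mirroring how IntWidth/IntAlign are handled, something along  
these lines (a sketch only; CharWidth and CharAlign are member names I am  
proposing, not existing fields):

   // In clang/TargetInfo.h (sketch):
   protected:
     unsigned char CharWidth, CharAlign;  // defaulted to 8 in TargetInfo's ctor

   public:
     unsigned getCharWidth() const { return CharWidth; }
     unsigned getCharAlign() const { return CharAlign; }

   // ... and in my target's TargetInfo constructor:
   //     CharWidth = CharAlign = 16;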

There are two consequences of this change that I am aware of.

The first is in preprocessor initialization.  InitPreprocessor defines  
__INT8_TYPE__, __INT16_TYPE__, __INT32_TYPE__, and sometimes  
__INT64_TYPE__.  It only defines __INT64_TYPE__ when long long is 64 bits  
wide, which seems odd to me:

   // 16-bit targets doesn't necessarily have a 64-bit type.
   if (TI.getLongLongWidth() == 64)
     DefineType("__INT64_TYPE__", TI.getInt64Type(), Buf);

In my case, __INT8_TYPE__ and __INT64_TYPE__ don't exist, so it doesn't  
really make sense to define them.

I think a better way of generating these definitions would be the  
following (pseudo-code; it doesn't actually compile):

// Define types for char, short, int, long, long long

DefineType("__INT" + TI.getCharWidth() + "_TYPE__", TI.getCharWidth());

if (TI.getShortWidth() > TI.getCharWidth())
  DefineType("__INT" + TI.getShortWidth() + "_TYPE__", TI.getShortWidth());

if (TI.getIntWidth() > TI.getShortWidth())
  DefineType("__INT" + TI.getIntWidth() + "_TYPE__", TI.getIntWidth());

if (TI.getLongWidth() > TI.getIntWidth())
  DefineType("__INT" + TI.getLongWidth() + "_TYPE__", TI.getLongWidth());

if (TI.getLongLongWidth() > TI.getLongWidth())
  DefineType("__INT" + TI.getLongLongWidth() + "_TYPE__", TI.getLongLongWidth());

This would result in the creation of __INT8_TYPE__, __INT16_TYPE__,  
__INT32_TYPE__, and __INT64_TYPE__ for most platforms.  For my platform it  
would only create __INT16_TYPE__ and __INT32_TYPE__.  It would also work  
for wacky 9-bit machines, where INT8s don't make much sense, and for  
architectures where long long is 128 bits.
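
To sanity-check the logic, here is a small standalone sketch (plain C++,  
not actual Clang code; the widths below are made-up examples for my  
target) that prints the macro names this scheme would generate, skipping  
any width already covered by a narrower type:

   #include <cstdio>

   // Emit __INT<Width>_TYPE__ only when Width is strictly larger than the
   // widest width emitted so far, so char and short at 16 bits yield a
   // single __INT16_TYPE__.
   static void defineIntMacro(unsigned Width, unsigned &LastWidth) {
     if (Width <= LastWidth)
       return;
     std::printf("__INT%u_TYPE__\n", Width);
     LastWidth = Width;
   }

   int main() {
     // Hypothetical widths: 16-bit char/short, 32-bit int/long/long long.
     unsigned Widths[] = { 16, 16, 32, 32, 32 };
     unsigned LastWidth = 0;
     for (unsigned i = 0; i != 5; ++i)
       defineIntMacro(Widths[i], LastWidth);
     // Prints __INT16_TYPE__ and __INT32_TYPE__.
     return 0;
   }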

The other place I am aware of (thanks to a useful assertion) that makes  
a difference is the char literal parser in Lex/LiteralSupport.cpp.  I am  
still wrapping my head around this, but I think fixing it for arbitrary  
char sizes is doable.  (As a newcomer, I need to figure out a good way to  
test it.)
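
As far as I can tell, the fix amounts to replacing the hard-coded 8-bit  
assumptions with TI.getCharWidth().  A standalone sketch of the  
multi-character accumulation step with the char width as a parameter (not  
the actual LiteralSupport code, which also handles escapes, wide chars,  
and overflow):

   #include <cstdint>

   // Pack a multi-character literal into one value, shifting by the
   // target's char width instead of a fixed 8.  Sketch only: assumes
   // CharWidth <= 32 and skips escape handling and overflow checks.
   static uint64_t packCharLiteral(const unsigned *Chars, unsigned NumChars,
                                   unsigned CharWidth) {
     const uint64_t CharMask = (1ULL << CharWidth) - 1;
     uint64_t Value = 0;
     for (unsigned i = 0; i != NumChars; ++i)
       Value = (Value << CharWidth) | (Chars[i] & CharMask);
     return Value;
   }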

Do these changes seem reasonable to pursue?  What other things in Clang  
and LLVM break when the assumption of 8-bit bytes is changed?

Your advice is appreciated,
Ray


  
   