[LLVMdev] More Encoding Ideas

Reid Spencer reid at x10sys.com
Mon Aug 23 21:37:02 PDT 2004


On Mon, 2004-08-23 at 19:46, Robert Mykland wrote:
> At 06:43 PM 8/20/2004, Chris Lattner wrote:
> >I don't understand what you're getting at here.  You can change char to
> >default to unsigned right now with llvm-gcc -funsigned-char.  I don't
> >understand how that would change anything to be more useful though.
> 
> Well, in the old days, char strings were handled just like any other kind 
> of array of primitive types.  

And, they still are :)

> In that world, when char defaulted to signed 
> char, most of the heavily used ASCII symbols took two bytes to 
> encode.  

Um. What? ASCII is a 7-bit encoding. It defines values 0-127, which fit
in one byte even when the top bit is a sign bit. Recall that in the "old
days" computers used the 8th bit as a parity bit because memory failure
rates were so high (think vacuum tubes).
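
To make that concrete, here is a minimal C sketch (the function names are
made up for illustration, nothing here is LLVM code). It checks that every
ASCII code 0-127 is non-negative and unchanged when stored in a signed
char, and that a string of n such characters occupies n bytes:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if every ASCII code (0..127) keeps its value and stays
   non-negative when stored in a signed char, 0 otherwise. */
int ascii_fits_in_signed_char(void) {
    if (SCHAR_MAX < 127) return 0;  /* C guarantees SCHAR_MAX >= 127 */
    for (int code = 0; code <= 127; ++code) {
        signed char c = (signed char)code;
        if (c < 0 || (int)c != code) return 0;
    }
    return 1;
}

/* Bytes needed to store n_chars ASCII characters as signed char. */
size_t ascii_string_bytes(size_t n_chars) {
    assert(sizeof(signed char) == 1);  /* true by definition in C */
    return n_chars * sizeof(signed char);
}
```

Calling ascii_fits_in_signed_char() should return 1 on any conforming
compiler, and ascii_string_bytes(5) should return 5, whether plain char
is signed or unsigned.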

> Thus, (and I'm guessing here), you guys decided to treat char 
> strings as a special case to save space in the bytecode file.

Actually, LLVM doesn't really treat character strings specially EXCEPT
in the bcwriter and bcreader. There is no notion in LLVM of a "string",
just primitive types and arrays of them. It is up to the front end
compiler to define what it means by a "string". In the bytecode
libraries of LLVM, we chose to interpret "[n x ubyte]" and "[n x sbyte]"
as "strings" for reading and writing efficiency. They are, however,
still just arrays of one of the two primitive single-byte types.
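
As a rough C analogy (a sketch only; the array and function names are
hypothetical and this is not how the bytecode reader or writer is
implemented), the same text can be spelled out as an array of either
single-byte type, and both hold exactly the bytes of the string literal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The text "hi" plus a terminating zero, written as plain arrays of the
   two single-byte types. */
static const unsigned char hi_ubyte[3] = { 'h', 'i', 0 };  /* like [3 x ubyte] */
static const signed char   hi_sbyte[3] = { 'h', 'i', 0 };  /* like [3 x sbyte] */

/* Number of bytes each array occupies: one per element. */
size_t hi_ubyte_size(void) { return sizeof hi_ubyte; }
size_t hi_sbyte_size(void) { return sizeof hi_sbyte; }

/* Returns 1 if both arrays hold the same bytes as the C literal "hi". */
int hi_arrays_match_literal(void) {
    assert(sizeof hi_ubyte == sizeof hi_sbyte);
    return memcmp(hi_ubyte, "hi", 3) == 0 &&
           memcmp(hi_sbyte, "hi", 3) == 0;
}
```

Both size functions return 3 and hi_arrays_match_literal() returns 1:
the "string" is just three one-byte elements either way, and only the
element type differs.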

> If all pointer types are implied, not a problem to create them.  However, 
> in larger files it may cost a little due to slightly larger type 
> numbers.  I'm not sure about the tradeoff here, but I expect that implied 
> pointers would still save more just because of pointers to function types.

Pointers are used heavily in almost all languages. I can almost
guarantee that the "tradeoff" would be larger bytecode files. Pointers
to function types are not used all that frequently, so I wouldn't expect
implying them to save much. In any event, we're not going to do anything with this
until there are solid numbers. I'm working on improving llvm-bcanalyzer
to provide them.

Reid