[LLVMdev] More Encoding Ideas

Chris Lattner sabre at nondot.org
Fri Aug 20 17:09:03 PDT 2004


On Fri, 20 Aug 2004, Reid Spencer wrote:
> > defined would be almost always stored in one byte instead of the present
> > usual two.
>
> So, if I get you correctly, you're advocating the creation of a Type::CharTyID
> in the TypeID enumeration that is always written as a single byte? Note that
> right now all ASCII values ( <128 ) will be written as a single byte for
> UByteTyID but for SByteTyID (often the default from FE compilers like GCC),
> you're right, they'll take two bytes if the value > 63.  Or are you saying that
> we should always write UByteTyID and SByteTyID as a single byte?
>
> Long term, LLVM's distinction between signed and unsigned will go away. Talk to
> Chris about that. :)

If you're interested in the plans, they are described in some detail here:
http://nondot.org/sabre/LLVMNotes/TypeSystemChanges.txt

Note that there is no concrete timeline for this to happen, it basically
depends on when someone is ambitious enough to start working on it.

In any case, both signed and unsigned 8-bit constants can be written out
in a single byte.  Again, do you think it's worth special-casing this,
though?  Considering that we already handle 8-bit strings specially, there
are not a ton of 8-bit constants with value >= 128.
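
The byte counts Reid quotes follow from variable bit rate (VBR) encoding.
The sketch below assumes the usual scheme: 7 payload bits per byte, the
high bit as a "more bytes follow" flag, and signed values stored with the
sign in bit 0.  It is an illustration, not the bytecode writer's actual
code, and the function names are hypothetical.  It shows why a signed
byte greater than 63 spills into a second byte, and what a fixed one-byte
special case for 8-bit constants would look like.

```python
def encode_vbr(value: int) -> bytes:
    """Unsigned VBR: 7 payload bits per byte, high bit set means more follow."""
    if value < 0:
        raise ValueError("unsigned VBR needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)         # last byte, continuation bit clear
            return bytes(out)


def encode_signed_vbr(value: int) -> bytes:
    """Signed VBR (assumed): magnitude shifted left one bit, sign in bit 0."""
    if value < 0:
        return encode_vbr((-value << 1) | 1)
    return encode_vbr(value << 1)


def encode_byte_constant(value: int) -> bytes:
    """Hypothetical special case: any 8-bit constant stored as one raw byte."""
    if not -128 <= value <= 255:
        raise ValueError("not an 8-bit constant")
    return bytes([value & 0xFF])
```

With this scheme, 100 costs one byte as an unsigned VBR but two as a
signed one, because the shift turns it into 200; the special case always
costs exactly one byte.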

> > 2) I think it would be a big file size and processing speed win to have
> > implied pointer types for every literal type.  This would save a
> > tremendous amount of space in the global type table and other places
> > where pointer types are constantly being defined.  So the primitive
> > types list would change to:
> >
> > 0       void
> > 1       void* (implied)

This is a very interesting idea, particularly for languages like C++ that
have a ton of types.  Before making this change, I would want to see some
numbers though.  In particular, I don't think that types typically take up
a large amount of the .bc file size: most of it is instructions.

Are you seeing other cases?

> > This approach would have the added advantage of being able to check to
> > see whether anything is a pointer type by checking bit 0 (1 = yes) and
> > deriving its dereferenced type (just subtract 1).

I don't think this is a big win; the .bc reader doesn't have to do much of
this.
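
For reference, the numbering in point 2 can be sketched as follows.  Only
"0 void, 1 void*" comes from the proposal; the rest of the base type list
here is hypothetical.  Each base type k gets the even ID 2k and its
implied pointer type gets 2k + 1, so bit 0 marks a pointer and
subtracting 1 yields the pointee.  Note that this gives only one implied
level of indirection: pointer-to-pointer types would still need explicit
entries in the type table.

```python
# Hypothetical base type list; only "void" at index 0 is from the proposal.
BASE_TYPES = ["void", "bool", "ubyte", "sbyte", "int", "float"]


def base_type_id(index: int) -> int:
    """Base type k gets even ID 2k; its implied pointer type is 2k + 1."""
    return 2 * index


def is_pointer(type_id: int) -> bool:
    """Bit 0 set means the ID names an implied pointer type."""
    return type_id & 1 == 1


def pointee(type_id: int) -> int:
    """Dereference an implied pointer type by subtracting 1."""
    if not is_pointer(type_id):
        raise ValueError("not a pointer type ID")
    return type_id - 1


def type_name(type_id: int) -> str:
    name = BASE_TYPES[type_id // 2]
    return name + "*" if is_pointer(type_id) else name
```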

> > 3) Have the value index for labels start at 1, just like nonzero values
> > of everything else does.  This just makes the encode/decode algorithm
> > simpler and I doubt it would cost anything in file size.  I made this
> > suggestion a few emails back, hopefully in a clearer form here.
>
> Like I replied, we don't store labels as values in LLVM. Labels are just the
> names of basic blocks. Those names are stored in the function level symbol

I think that Robert's point is that this would remove a special case from
the code (which is good).  I'm indifferent about the change: if some other
changes are made to the .bc file format, this could go in as well.
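
A hypothetical model of the special case that point 3 would remove,
assuming (as Robert's wording suggests) that every other type plane
reserves index 0 and numbers real values from 1, while label indices
start at 0.  The plane layout and function names are illustrative only.

```python
LABEL = "label"


def lookup_current(planes, type_name, index):
    """Current rule as modeled here: labels count from 0, others from 1."""
    first = 0 if type_name == LABEL else 1  # the special case for labels
    if index < first:
        raise ValueError("reserved index")
    return planes[type_name][index - first]


def lookup_proposed(planes, type_name, index):
    """Proposed rule: every plane, labels included, counts from 1."""
    if index < 1:
        raise ValueError("index 0 is reserved in every plane")
    return planes[type_name][index - 1]
```

Under the proposal the decoder needs no knowledge of which plane is the
label plane, which is the simplification being discussed.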

> > 4) Can files have multiple 0x01 headers?  I've never seen more than
> > one.  If not, ditch this four bytes of unnecessary space per file.
>
> I think the original plan was to have multiple modules in them but this seems
> to have gone by the wayside. The result of linking two (or more) modules is a
> single module so except in some really bizarre corner cases the need for
> multiple modules would go away. I suppose we could get rid of the block id
> field for the file. I'll give this some thought and see if Chris has any
> objections.

I don't have any problem with removing it.

> Long term, I intend to write some kind of bytecode archive utility similar to
> JAR files that contains multiple bytecode files, an index, and the whole thing

Sounds like a cool thing.  If/when you do this, make sure that llvm-nm can
read the files (of course), and you could make the interface be llvm-ar
(which was never finished).
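
The layout Reid has in mind is cut off above, so the following is only a
hypothetical sketch of an archive with an index: a header, a table of
(name, offset, size) entries, then the concatenated bytecode files.  The
magic number and field layout are invented for illustration; this is not
the llvm-ar or JAR format.  The point of the index is that a reader can
locate one member without scanning the others.

```python
import struct

MAGIC = b"BCAR"  # hypothetical magic number, not a real LLVM format


def write_archive(members: dict) -> bytes:
    """Pack name -> bytecode blobs behind an index of (name, offset, size)."""
    index = bytearray()
    data = bytearray()
    for name, blob in members.items():
        encoded = name.encode("utf-8")
        index += struct.pack("<I", len(encoded)) + encoded
        index += struct.pack("<II", len(data), len(blob))  # offset into data
        data += blob
    header = MAGIC + struct.pack("<II", len(members), len(index))
    return header + bytes(index) + bytes(data)


def read_index(archive: bytes) -> dict:
    """Return name -> (absolute offset, size) without touching member data."""
    if archive[:4] != MAGIC:
        raise ValueError("not an archive")
    count, index_size = struct.unpack_from("<II", archive, 4)
    pos = 12
    data_start = 12 + index_size
    entries = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<I", archive, pos)
        pos += 4
        name = archive[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, size = struct.unpack_from("<II", archive, pos)
        pos += 8
        entries[name] = (data_start + offset, size)
    return entries


def extract(archive: bytes, name: str) -> bytes:
    offset, size = read_index(archive)[name]
    return archive[offset:offset + size]
```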

> > I'm committed to making LLVM
> > bytecode as compact and as quick to encode/decode as possible.
>
> Thanks, we appreciate that a lot. It's high on our agenda too.

I totally agree as well.  :)

-Chris

-- 
http://llvm.org/
http://nondot.org/sabre/
