[LLVMdev] More Encoding Ideas
Robert Mykland
robert at ascenium.com
Fri Aug 20 17:55:02 PDT 2004
At 05:09 PM 8/20/2004, you wrote:
>On Fri, 20 Aug 2004, Reid Spencer wrote:
> > > defined would be almost always stored in one byte instead of the present
> > > usual two.
> >
> > So, if I get you correctly, you're advocating the creation of a
> > Type::CharTyID in the TypeID enumeration that is always written as a
> > single byte? Note that right now all ASCII values ( <128 ) will be
> > written as a single byte for UByteTyID but for SByteTyID (often the
> > default from FE compilers like GCC), you're right, they'll take two
> > bytes if the value > 63. Or are you saying that we should always write
> > UByteTyID and SByteTyID as a single byte?
> >
> > Long term, LLVM's distinction between signed and unsigned will go away.
> > Talk to Chris about that. :)
>
>If you're interested in the plans, they are described in some detail here:
>http://nondot.org/sabre/LLVMNotes/TypeSystemChanges.txt
>
>Note that there is no concrete timeline for this to happen, it basically
>depends on when someone is ambitious enough to start working on it.
>
>In any case, both signed and unsigned 8-bit constants can be written out
>in a single byte. Again, do you think it's worth special casing this
>though? Considering that we handle 8-bit strings specially already, there
>are not a ton of 8-bit constants with value >= 128.
I'd rather that they not be treated specially. If char defaulted to
unsigned char, there would be little reason to create this special case.
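To make the one-byte versus two-byte point concrete, here is a minimal sketch of a variable-bit-rate integer encoder. It assumes 7 payload bits per byte with the high bit used as a continuation flag and, for signed values, the sign stored in the low-order bit. The function names are hypothetical and are not taken from the LLVM sources.

```python
def encode_vbr_unsigned(value):
    """Encode a non-negative integer, 7 bits per byte, low-order bits first."""
    if value < 0:
        raise ValueError("unsigned encoder needs a non-negative value")
    out = bytearray()
    while value >= 0x80:
        # Set the high bit to signal that another byte follows.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_vbr_signed(value):
    """Assumed scheme: magnitude shifted left by one, sign kept in bit 0."""
    if value < 0:
        return encode_vbr_unsigned(((-value) << 1) | 1)
    return encode_vbr_unsigned(value << 1)
```

Under these assumptions an unsigned value below 128 fits in one byte, while a signed value needs a second byte as soon as its magnitude exceeds 63, because the sign bit uses up one of the seven payload bits.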
> > > 2) I think it would be a big file size and processing speed win to have
> > > implied pointer types for every literal type. This would save a
> > > tremendous amount of space in the global type table and other places
> > > where pointer types are constantly being defined. So the primitive
> > > types list would change to:
> > >
> > > 0 void
> > > 1 void* (implied)
>
>This is a very interesting idea, particularly for languages like C++ that
>have a ton of types. Before making this change, I would want to see some
>numbers though. In particular, I don't think that types typically take up
>a large amount of the .bc file size: most of it is instructions.
>
>Are you seeing other cases?
No. This would only save a bit less than two bytes per primitive and
defined type. Maybe a few hundred bytes in a large LLVM file. Not a big
savings, but a savings. The thing I like is that along with the size
savings it appears to make the encode/decode simpler and quicker if
anything. So good news all around.
> > > This approach would have the added advantage of being able to check to
> > > see whether anything is a pointer type by checking bit 0 (1 = yes) and
> > > deriving its dereferenced type (just subtract 1).
>
>I don't think this is a big win, the .bc reader doesn't have to do much of
>this.
I know my reader does this. I'm not really sure how much time it spends
doing it. My little code generator spends a lot of time going back and
forth between pointers and literal values when turning certain kinds of
memory operations into data movement in the Ascenium array.
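As a rough illustration of the implied-pointer numbering from point 2, the sketch below assumes every non-pointer type gets an even ID and its implied pointer type gets the next odd ID (0 = void, 1 = void*). The helper names are hypothetical, and how a pointer-to-pointer type would be numbered is left open here, since the proposal does not say.

```python
def is_pointer(type_id):
    """Under the assumed numbering, odd IDs are the implied pointer types."""
    return (type_id & 1) == 1


def pointee_of(type_id):
    """Dereference an implied pointer type by subtracting 1."""
    if not is_pointer(type_id):
        raise ValueError("type ID %d is not a pointer type" % type_id)
    return type_id - 1


def pointer_to(type_id):
    """Return the implied pointer type for a non-pointer type ID."""
    if is_pointer(type_id):
        # Assumption: pointer-to-pointer would need an explicitly defined type.
        raise ValueError("no implied pointer type for a pointer type")
    return type_id + 1
```

With this layout, checking for a pointer and moving between a type and its pointer are single arithmetic operations, with no type-table lookup needed.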
> > > 3) Have the value index for labels start at 1, just like nonzero values
> > > of everything else does. This just makes the encode/decode algorithm
> > > simpler and I doubt it would cost anything in file size. I made this
> > > suggestion a few emails back, hopefully in a clearer form here.
> >
> > Like I replied, we don't store labels as values in LLVM. Labels are just
> > the names of basic blocks. Those names are stored in the function level symbol
>
>I think that Robert's point is that this would remove a special case from
>the code (which is good). I'm indifferent about the change: if some other
>changes are made to the .bc file format, this could go in as well.
Cool.
> > > 4) Can files have multiple 0x01 headers? I've never seen more than
> > > one. If not, ditch these four bytes of unnecessary space per file.
> >
> > I think the original plan was to have multiple modules in them but this
> > seems to have gone by the wayside. The result of linking two (or more)
> > modules is a single module so except in some really bizarre corner cases
> > the need for multiple modules would go away. I suppose we could get rid
> > of the block id field for the file. I'll give this some thought and see
> > if Chris has any objections.
>
>I don't have any problem with removing it.
Cool. Before you chop, remember debug libraries.
> > Long term, I intend to write some kind of bytecode archive utility
> > similar to JAR files that contains multiple bytecode files, an index,
> > and the whole thing
>
>Sounds like a cool thing. If you did this, make sure that llvm-nm could
>read the files (of course), and, if/when you do this, you could make the
>interface be llvm-ar (which was never finished).
Seconded!
Regards,
-- Robert.
Robert Mykland Voice: (831) 462-6725
Founder/CTO Ascenium Corporation