[LLVMdev] More Encoding Ideas
Robert Mykland
robert at ascenium.com
Fri Aug 20 12:52:45 PDT 2004
Dear Chris and Reid:
Some other random ideas I've had as I've been sifting through the new
bytecode format. Please let me know what you think.
1) ANSI C leaves the signedness of plain char up to the implementation, so
char could default to unsigned char. I gather this is not the usual default
in GCC. If char defaulted to unsigned char, several things would become
possible. Single char constants would almost always be stored in one byte
instead of the two they usually take now. String constants could also be
stored in the constant table in the regular fashion without wasting bytes.
That would eliminate the need to do those expensive linear searches through
the type slot list in order to match a character constant with its type.
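
To illustrate the one-byte versus two-byte point, here is a rough sketch. It assumes a variable-length encoding with 7 payload bits per byte, and assumes signed values are written with the magnitude shifted left one bit and the sign in bit 0. Both are assumptions made for illustration, and the helper names are hypothetical, not taken from the LLVM sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for an unsigned value at 7 payload bits per byte (assumed scheme).
std::size_t vbrSize(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Assumed signed scheme: magnitude shifted left by one, sign stored in bit 0.
std::size_t signedVbrSize(std::int64_t v) {
    // Negate in unsigned arithmetic so the most negative value does not overflow.
    std::uint64_t mag = v < 0 ? 0 - static_cast<std::uint64_t>(v)
                              : static_cast<std::uint64_t>(v);
    return vbrSize((mag << 1) | (v < 0 ? 1u : 0u));
}
```

Under these assumptions a character such as 'A' (65) needs two bytes as a signed value, since the shift pushes it past 7 bits, but only one byte as an unsigned value. Unsigned values above 127 would still need two bytes.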
1a) If it's not feasible to make char default to unsigned, perhaps it would
be possible to put all string constants at type slot zero to eliminate the
linear searching. I realize this would somewhat violate LLVM's strict
typing rules, but since these are all constants their strict type could
always be derived if needed. Also, it would save space by eliminating the
need to create a proliferating number of char array types of various lengths.
1b) Failing this, you should at least store the type with each constant
string to avoid the linear searches. This solution would add space, but
save processing time, especially with large files with extensive type lists.
2) I think it would be a big file size and processing speed win to have
implied pointer types for every literal type. This would save a tremendous
amount of space in the global type table and other places where pointer
types are constantly being defined. So the primitive types list would
change to:
0 void
1 void* (implied)
2 bool
3 bool* (implied)
4 ubyte
5 ubyte* (implied)
6 sbyte
7 sbyte* (implied)
8 ushort
9 ushort* (implied)
etc.
This approach would have the added advantage that you could tell whether any
of these types is a pointer type by checking bit 0 (1 = yes), and derive its
dereferenced type by simply subtracting 1.
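
A minimal sketch of that check, assuming the interleaved numbering listed above (even slot = primitive type, odd slot = its implied pointer). The constant and function names are hypothetical, and the sketch only covers the primitive slots, not pointers to pointers or derived types.

```cpp
#include <cassert>
#include <cstdint>

// Slot numbers taken from the proposed list: each primitive type sits at an
// even slot and its implied pointer type at the following odd slot.
constexpr std::uint32_t VoidSlot  = 0;
constexpr std::uint32_t BoolSlot  = 2;
constexpr std::uint32_t UByteSlot = 4;

// Bit 0 set means the slot is an implied pointer type.
constexpr bool isImpliedPointer(std::uint32_t slot) {
    return (slot & 1u) != 0;
}

// Dereferenced type of an implied pointer slot: subtract 1.
constexpr std::uint32_t pointeeOf(std::uint32_t pointerSlot) {
    return pointerSlot - 1;
}

// Implied pointer slot for a primitive slot: add 1.
constexpr std::uint32_t pointerTo(std::uint32_t primitiveSlot) {
    return primitiveSlot + 1;
}
```

For example, slot 5 (ubyte*) has bit 0 set, and subtracting 1 gives slot 4 (ubyte).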
3) Have the value index for labels start at 1, just as the nonzero values of
everything else do. This just makes the encode/decode algorithm simpler,
and I doubt it would cost anything in file size. I made this suggestion a
few emails back; hopefully it is clearer in this form.
4) Can files have multiple 0x01 headers? I've never seen more than
one. If not, ditch these four bytes of unnecessary space per file.
5) Don't write the compaction table for a function if there are no
entries. All my simple examples have empty compaction tables that use up 8
bytes per function. This would save space.
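
Suggestion 5 could look roughly like the sketch below on the writer side. It assumes the 8 bytes are a block header made of a 4-byte identifier and a 4-byte size. That layout, the identifier value, and all names here are placeholders for illustration, not the actual writer code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder block identifier, used only for this illustration.
constexpr std::uint32_t kCompactionTableId = 0x33;

// Append a 32-bit value in little-endian byte order.
void appendU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Write the compaction table block only when it has entries, so an empty
// table costs nothing instead of an 8-byte header.
void writeCompactionTable(std::vector<std::uint8_t>& out,
                          const std::vector<std::uint8_t>& payload) {
    if (payload.empty())
        return;  // skip the block entirely
    appendU32(out, kCompactionTableId);
    appendU32(out, static_cast<std::uint32_t>(payload.size()));
    out.insert(out.end(), payload.begin(), payload.end());
}
```

The reader would then have to treat a missing compaction table block as an empty table.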
I hope you find these suggestions helpful. I'm committed to making LLVM
bytecode as compact and as quick to encode/decode as possible.
Regards,
-- Robert.
Robert Mykland Voice: (831) 462-6725
Founder/CTO Ascenium Corporation