[LLVMdev] More Encoding Ideas

Fri Aug 20 18:08:23 PDT 2004

On Fri, 2004-08-20 at 17:39, Robert Mykland wrote:

> No, you're not getting the point.  The bytecode is bloated by pointers to 
> all types.  Every time a type is defined pretty much a pointer is defined 
> with it already, and since the pointer type is not implied by the 
> definition of the literal, we waste two or more bytes whenever we define a 
> type.  I'm advocating that every time we define a type an implied pointer 
> type to it is also created.  We'll save two or more bytes for every type 
> defined, including all function types.  That's hundreds of bytes for a big 
> LLVM file.

Okay, so I think I get it. The slot number for a type is a pointer type
if it's odd, and not a pointer type if it's even. So, there's actually no
need to create new TypeIDs. This is a pretty interesting idea actually.
So, to be explicit, what you're advocating is that:

Even Slot Number:
	Type = Types[ slot_num / 2 ]
Odd Slot Number:
	Type = Pointer[ slot_num / 2 ]

Yes? 

Essentially this eliminates pointer types from the type list altogether.
Cool idea. 
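
To make the even/odd rule concrete, here is a minimal sketch of how a
reader and writer might map slot numbers to types under this proposal.
The function names and the tuple used to represent a pointer are
hypothetical, chosen only for illustration; none of this is existing
LLVM code.

```python
def decode_type_slot(slot_num, types):
    """Resolve a slot number under the proposed even/odd scheme.

    `types` is the type table with no pointer entries in it.
    Even slots name the entry itself; odd slots name a pointer to it.
    """
    base = types[slot_num // 2]
    if slot_num % 2 == 0:
        return base
    # Hypothetical representation of "pointer to base".
    return ("pointer", base)


def encode_type_slot(index, is_pointer):
    """Inverse mapping: table index plus pointer flag to slot number."""
    return index * 2 + (1 if is_pointer else 0)


types = ["int", "float", "{int, float}"]
print(decode_type_slot(4, types))
print(decode_type_slot(encode_type_slot(0, True), types))
```

The pointer flag lives in the low bit of the slot number, so the table
itself never needs a separate entry for the implied pointer type.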

Where's the patch? :)

Seriously though, unless you want to do it, I think I'll probably do
this sometime before 1.4 is released. The savings are not huge (type
table is pretty small), but every little bit helps and this will halve
the number of entries. 

> >>3) Have the value index for labels start at 1, just like nonzero values 
> >>of everything else does.  This just makes the encode/decode algorithm 
> >>simpler and I doubt it would cost anything in file size.  I made this 
> >>suggestion a few emails back, hopefully in a clearer form here.
> >
> >Like I replied, we don't store labels as values in LLVM. Labels are just 
> >the names of basic blocks. Those names are stored in the function level 
> >symbol table but there's no value slot indices for labels.  Perhaps I'm 
> >missing something here? I'm still not exactly sure what you mean by "the 
> >value index for labels". Do you mean its slot number? If so, I think 
> >you're mistaken on the file format or the doc is misleading you. Please 
> >let me know which so I can fix the doc. :)
> 
> The symbol table refers to a value index for each label.  This value index 
> starts counting at zero, instead of one like for every other data type.  So 
> my function that retrieves a thing based on its value index has to have a 
> special case in it for labels.  That's what I'm referring to.

Okay, substitute "BasicBlock" for "label" and I think I get you.  This
doesn't buy us anything for bytecode file size, but if you need it to
simplify your implementation, I'd suggest submitting a patch and I'll
apply it.

> I can see how multiple 0x01 modules in a single LLVM file would be handy if 
> you wanted to easily maintain separate module name spaces in debug versions 
> of libraries.

I proposed namespaces to Chris some months ago and his answer was that
they just disappear into the names of things (via mangling). However, I
still think that it would save some bytecode file size if we could
eliminate the redundant (namespace) parts of each symbol. Essentially,
we'd be replacing the portion of a symbol's name with a vbr_uint to the
namespace object.
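
As a rough sketch of that idea: each distinct namespace prefix is stored
once in a table, and every symbol carries a vbr_uint index into that
table plus its local name. Everything here is an assumption made for
illustration: the "::" separator, the function names, the record layout,
and the vbr encoding shown (7 data bits per byte, low-order bits first,
high bit set when more bytes follow).

```python
def encode_vbr_uint(value):
    """Encode an unsigned int, 7 bits per byte, low-order bits first.

    The high bit of each byte is set when another byte follows
    (assumed encoding, for illustration only).
    """
    if value < 0:
        raise ValueError("vbr_uint cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_symbols(symbols, separator="::"):
    """Split each symbol into (namespace index, local name).

    Returns the namespace table and one byte record per symbol.
    The separator and record layout are hypothetical.
    """
    namespaces = []   # each distinct prefix is stored exactly once
    index_of = {}
    records = []
    for name in symbols:
        prefix, _, local = name.rpartition(separator)
        if prefix not in index_of:
            index_of[prefix] = len(namespaces)
            namespaces.append(prefix)
        records.append(encode_vbr_uint(index_of[prefix]) + local.encode("ascii"))
    return namespaces, records


symbols = ["std::vector::push_back", "std::vector::size", "main"]
namespaces, records = encode_symbols(symbols)
print(namespaces)
print(records)
```

The shared prefix is written once, and each symbol that uses it pays
only one byte for the index as long as there are fewer than 128
namespaces.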

Anyway, this is a larger design question that needs a larger audience.
I'd want to hear from most of the other developers on this topic.

> 
> Gosh, but at the very least, since it's a special case anyway, we could 
> shrink this field down to a byte.

It will go away in version 1.4.

> Come to think of it, you could still have multiple separate LLVM modules in 
> a file without this marker.  If there were still bytes in the file after 
> the last byte specified by the module size, you'd be in another 
> module.  Debug libraries is such a rare case that this wouldn't be too big 
> a deal.

True that.
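
A minimal sketch of that reading loop, assuming a made-up layout in
which each module is preceded by a 4-byte little-endian size field. The
real bytecode header differs; only the "keep reading while bytes remain"
logic is the point here, and the function name is hypothetical.

```python
import struct


def split_modules(data):
    """Split a byte string into modules using each module's size field.

    Assumes a hypothetical layout: a 4-byte little-endian size,
    followed by that many bytes of module body, repeated.
    """
    modules = []
    offset = 0
    while offset < len(data):
        # Bytes remain after the previous module, so another one starts here.
        (size,) = struct.unpack_from("<I", data, offset)
        start = offset + 4
        end = start + size
        if end > len(data):
            raise ValueError("module size runs past end of file")
        modules.append(data[start:end])
        offset = end
    return modules


data = struct.pack("<I", 3) + b"abc" + struct.pack("<I", 2) + b"de"
print(split_modules(data))
```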

> >>5) Don't write the compaction table for a function if there are no 
> >>entries.  All my simple examples have empty compaction tables that use up 
> >>8 bytes per function.  This would save space.
> >
> >Hmm. That's not supposed to happen. Have you got a bytecode and source 
> >file that produces this? The algorithm in bcwriter for deciding whether to 
> >emit the compaction table is pretty complicated. It should avoid the 
> >compaction table any time there's no advantage to it space wise. The 
> >compaction table's *sole* purpose is to save bytecode space and its BAD if 
> >it gets used incorrectly.
> 
> It happens all the time!  I guess this is a bug.

Could you please open a bug report and attach a test case to it? This is
a serious problem that I need to fix but in order to do that I need a
reliable test case for repeatability.

Thanks Robert.

Reid.
