[LLVMdev] Some basic questions about LLVM version 1.8 bytecode format

Reid Spencer rspencer at reidspencer.com
Wed Oct 25 17:01:19 PDT 2006


Hi Robert,

On Wed, 2006-10-25 at 16:00 -0600, Robert Mykland wrote:
> I generated LLVM bytecode for a "hello world!" program just to get the
> basic bytecode structure.  I have a few questions about the global
> info module and the global constants module where there have
> apparently been changes since 1.4.

Okay.

>   I would be happy to collect these differences and do an edit pass of
> the bytecode spec once my decoder is fully up-to-snuff again.  

Great!

> I've put an annotated bytecode file after my questions to illustrate
> what I'm trying to sort out about the bytecode.

Very nice.

> 
> 1) In the global info module, it looks like an extra bit has been
> added to global and function definitions.  I'm just guessing this
> because it appears to make the type slot info work out.  What is the
> extra bit for?  In this simple example, it appears to always be 1.

Bits 5 and higher are used for the slot table index. There is no special
significance. If you're seeing them all be 1, then you're only looking
at the ones with odd slot numbers (since bit 5 is the least significant
bit in the slot number).
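
To make the bit layout concrete, here is a rough Python sketch (not the actual reader code). It assumes the global's info word has already been decoded into a plain integer, takes the meaning of bits 0 through 4 from your annotation below, and treats bits 5 and up as the slot number. The function name and the sample word are made up for illustration.

```python
def decode_global_info(word):
    """Split an already-decoded global info word into its fields."""
    return {
        "is_constant": bool(word & 0x1),             # bit 0
        "has_initializer": bool((word >> 1) & 0x1),  # bit 1
        "linkage": (word >> 2) & 0x7,                # bits 2-4
        "type_slot": word >> 5,                      # bits 5 and higher
    }

# Hand-built sample word: slot 13, internal linkage (3), initializer, constant.
word = (13 << 5) | (3 << 2) | (1 << 1) | 1
fields = decode_global_info(word)
print(fields)
```

Because bit 5 is the low bit of the slot number, it is set for slot 13 and would be clear for slot 14.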

There is one special case. If the linkage field is internal (value 3)
and the initializer field is 0 (false) then it indicates that the global
uses an extension word for its info. This is necessary if it has a
non-zero alignment or a section.  Unfortunately, I don't think this is
currently documented. See lib/Bytecode/Writer at line 980 for the logic.
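
The special case boils down to a two-field test. The Python sketch below only restates the rule from the paragraph above; the names are hypothetical, and the layout of the extension word itself is not shown because it is not described here.

```python
LINKAGE_INTERNAL = 3  # linkage value named in the text above

def uses_extension_word(linkage, has_initializer):
    """True when the linkage/initializer combination flags an extension word."""
    return linkage == LINKAGE_INTERNAL and not has_initializer

# An internal global with an initializer is encoded the ordinary way...
print(uses_extension_word(3, True))
# ...while internal linkage with the initializer bit clear signals the extension.
print(uses_extension_word(3, False))
```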

> 
> 2) There are only two function calls in this little file, and the
> first one decodes fine, but the second one appears to have the wrong
> type slot information.  Just a guess: is this type slot info maybe
> always the actual function type slot minus 1 instead of the slot of
> the pointer to the function?

Slot 0 is reserved for arrays of sbyte .. an optimization for strings.

> 3) Looks like library dependencies section is empty even though I
> would be expecting libc to be here.  Unused?

That depends entirely on your source language compiler. It's quite valid for
it to be empty. If the file was generated with an old llvm-gcc3, it's possible
that the deplibs feature is not in your version of llvm-gcc3. Either
that or it doesn't depend on libc. I can't tell, since I don't know how your
bytecode file was created.

> 4) Looks like constant strings are initialized in the constants
> section now, since it looks like this section ID stuff in the globals
> module is not used or has changed? 

It is used.  See the code I mentioned above.

>  Also, I'm finding my constant string is definitely in the constants
> section when I expected to find just a type slot number.

Constant strings are handled specially. Instead of having a bunch of
values in the "sbyte" slot (one for each character), which was the
original design, we now detect a constant array of sbyte as a special
case, assign its type as slot 0, and write the entire string of
characters as the value (instead of a value for each char).
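
As an illustration, the Python sketch below reads the string entry from your dump. The layout it assumes (entry count, then type slot 0, then for each string the array's type slot followed by the raw characters) is inferred from your annotated bytes, not taken from the writer source. It also assumes every count and slot fits in a single byte, and that the array length comes from the type table (0x0e = sbyte[18] in your file). All names are hypothetical.

```python
def read_string_plane(data, array_lengths):
    """Read the slot-0 string entries; return (strings, bytes_consumed)."""
    count = data[0]
    type_slot = data[1]
    if type_slot != 0:
        raise ValueError("not the special string plane (slot 0)")
    pos = 2
    strings = []
    for _ in range(count):
        array_type = data[pos]               # type slot of the sbyte array
        pos += 1
        length = array_lengths[array_type]   # element count from the type table
        strings.append(bytes(data[pos:pos + length]))
        pos += length
    return strings, pos

# Bytes copied from the constant pool in the dump below.
dump = bytes.fromhex(
    "01 00 0e 48 65 6c 6c 6f 20 52 4b 4d 20 77 6f 72 6c 64 21 0a 00"
)
strings, used = read_string_plane(dump, {0x0e: 18})
print(strings, used)
```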

> 5) Again looks like the function pointer type wants to be 0x12 instead
> of 0x11 here?

I'm not following this question.

> 
> 6) After this my decode of the last few bytes of the constants section
> just started to break down.  Any insight you can give me re the
> meaning of these last few bytes in the constants module would be
> appreciated.

Have you used llvm-bcanalyzer to read your bytecode files? It might help
you with your analysis.

> 
> Here's the bytecode file I'm looking at (annotated).  Interesting bits
> are marked with five question marks:
> 
> Signature = llvc0
> 00000000  6c 6c 76 63 30
> 
> Module block ID = 0x01 and size = 0x0a3
>  01 00 00  00 a3 00 00 00
> 
> Format information
>  50 = 01010000
>              ^ Target is little endian
>             ^- Target pointers are 32-bit
>            ^-- Target has endianness
>           ^--- Target has pointer size
>       ^^^^---- Bytecode format 5
> 
> ***********************************************************
> 
> Global type pool ID = 0x06 and size = 0x014
>  86 02  |llvc0........P..|
> 00000010  00 00
> 
> Global type pool
> 
> Number of definitions = 7
>  07
> 
> 0x0d = Pointer to array of sbyte[18]
>  10 0e
> 
> 0x0e = Array of sbyte[18]
>  0f 03 12
> 
> 0x0f = Pointer to function int ()
>   10 10
> 
> 0x10 = Function int ()
>  0d 07 00
> 
> 0x11 = Pointer to function int ( sbyte*, ... )
>  10 13
> 
> 0x12 = Pointer to sbyte
>  10  |................|
> 00000020  03
> 
> 0x13 = Function int ( sbyte*, ... )
>  0d 07 02 12 00
> 
> ***********************************************************
> 
> Module globals info ID = 0x05 and size = 0x01e
>  c5 03  00 00
> 
> Global definition
>  af 03 = 0000001110101111
>                         ^ Is a constant
>                        ^- Has an initializer
>                     ^^^-- Linkage = internal
>                    ^----- ????? <--- see question #1

There's nothing special about this bit; it's part of the slot number.

>          ^^^^^^^^^^------ Type slot = sbyte[18]
>  01 = Value slot number of the initializer 
> 
> End of globals
>  00
> 
> Function definition
>  e1 03  |................| = 0000001111100001
>                                          ^^^^ Calling convention = 1
>                                         ^---- Internal
>                                        ^----- ????? <--- see question #1

Same thing. Part of the slot number.

>                              ^^^^^^^^^^------ Type slot = int (*)()
> 
> Function definition
> 00000030  b1 04 = 0000010010110001
>                               ^^^^ Calling convention = 1
>                              ^---- External
>                             ^----- ????? <--- see question #1
>                   ^^^^^^^^^^------ Type slot = 0x012 = sbyte*????? <--- see question #2
> 
> End of functions
>  00
> 
> Depends on no libraries????? <--- see question #3
>  00
> 
> Target triple = "i686-pc-linux-gnu"
>  11 69 36 38  36 2d 70 63 2d 6c 69 6e  |.....i686-pc-lin|
> 00000040  75 78 2d 67 6e 75
> 
> Section strings for globals: none????? <----- see question #4
>  00

I don't think I understand the question here.
> 
> Inline asm block: none
>  00
> 
> ***********************************************************
> 
> Module constant pool ID = 0x03 and size = 0x01f
>   e3 03 00 00
> 
> Module constant pool
> 
> One constant string sbyte[18] = "Hello RKM world!\n" ????? <--- see question #4
>  01 00 0e 48  |ux-gnu.........H|
> 00000050  65 6c 6c 6f 20 52 4b 4d  20 77 6f 72 6c 64 21 0a  |ello RKM world!.|
> 00000060  00

Yes, this is the value of the sbyte[18].
> 
> One constant sbyte*????? <--- see question #5
>  01 12 04 1a
> 
> One constant pointer to array of sbyte[18] = NULL????? <--- see question #6
>  01 0d 00
> 
> ????? <--- see question #6
>   07 00 06
> 
> ***********************************************************
> 
> Function definition ID = 0x02 and size = 0x023
>  62 04 00 00
> 
> Function definition
>  00  |...........b....|
> 00000070  27 01 00 00 74 11 02 01  01 05 07 00 00 24 02 00  |'...t........$..|
> 00000080  00 00 01 0c 00 05 65 6e  74 72 79 01 07 01 03 74  |......entry....t|
> 00000090  6d 70
> 
> ***********************************************************
> 
> Symbol table ID = 0x04 and size = 0x1a
>  44 03 00 00 00 01  11 01 06 70 72 69 6e 74  |mpD........print|
> 000000a0  66 01 0f 01 04 6d 61 69  6e 01 0d 01 03 73 74 72  |f....main....str|
> 000000b0
> 
> -- 
> Robert Mykland               Voice: (831) 462-6725
> Founder/CTO                   Ascenium Corporation
> "A new world of computing fulfilling people's lives"



