[PATCH] D42002: [docs] Only LLVM IR bitstreams begin with 'BC'

Brian Gesiak via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 12 10:33:04 PST 2018


modocache created this revision.
modocache added reviewers: harlanhaskins, eugenis, mehdi_amini, pcc.

The LLVM Bitcode File Format documentation states that all bitstreams
begin with the magic number 'BC', and that generic bitstream analyzer
tools may check for this number in order to determine whether the
stream is a bitstream.

However, in practice:

- Only LLVM IR bitcode begins with 'BC'. Other bitstreams (Clang AST files and precompiled headers, Clang serialized diagnostics, and Swift modules) do not start with 'BC'. A tool that actually checked for 'BC' would recognize only LLVM IR.
- `llvm-bcanalyzer`, arguably the most widely used generic bitstream analyzer, does not check for the magic number 'BC' (except to determine whether the file is LLVM IR).
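To illustrate the first point, here is a minimal Python sketch of the check the current documentation suggests, applied to the leading bytes of a few stream kinds. The LLVM IR value follows the documented magic number; the 'CPCH' and 'DIAG' values reflect what Clang's AST writer and serialized-diagnostics printer are understood to emit and should be treated as assumptions. Swift's module magic is deliberately omitted.

```python
# Leading bytes of several bitstream kinds. The LLVM IR value matches the
# documented magic number; the Clang values are assumptions included only
# for illustration.
SAMPLE_MAGICS = {
    "LLVM IR": b"BC\xc0\xde",
    "Clang AST/PCH (assumed)": b"CPCH",
    "Clang serialized diagnostics (assumed)": b"DIAG",
}


def starts_with_bc(data: bytes) -> bool:
    """The 'BC' check the documentation suggests for generic tools."""
    return data[:2] == b"BC"


for kind, magic in SAMPLE_MAGICS.items():
    # Only the LLVM IR entry passes; every other bitstream is rejected.
    print(f"{kind}: starts with 'BC'? {starts_with_bc(magic)}")
```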

Update the bitcode format documentation to make it clear that not all
bitstreams begin with 'BC', and that tools should not rely on that
particular magic number value.
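As a sketch of the behavior the updated text recommends, a generic tool might map the first four bytes to a known stream type and treat an unrecognized magic number as "unknown type" rather than "invalid". The table and the `classify_bitstream` helper are hypothetical; only the LLVM IR entry comes from the bitcode documentation.

```python
from typing import Optional

# Hypothetical table of magic numbers a generic tool happens to know about.
# Only the LLVM IR entry is documented; the Clang entries are assumptions.
KNOWN_MAGICS = {
    b"BC\xc0\xde": "LLVM IR",
    b"CPCH": "Clang AST/PCH (assumed)",
    b"DIAG": "Clang serialized diagnostics (assumed)",
}


def classify_bitstream(data: bytes) -> Optional[str]:
    """Return a stream-type name for a known magic number, else None.

    None means "unknown stream type", not "invalid stream": the caller
    should still attempt to parse the stream generically.
    """
    return KNOWN_MAGICS.get(bytes(data[:4]))


for sample in (b"BC\xc0\xde\x35\x14", b"DIAG\x01\x00", b"XYZW"):
    kind = classify_bitstream(sample)
    print(kind if kind is not None else "unknown magic number; parse generically")
```

Returning `None` instead of raising keeps the decision about validity with the bitstream parser, so a format with a hitherto unseen magic number is not rejected up front.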

Test Plan:
Build the `docs-llvm-html` target and confirm the changes render in
a Safari web browser.


https://reviews.llvm.org/D42002

Files:
  docs/BitCodeFormat.rst


Index: docs/BitCodeFormat.rst
===================================================================
--- docs/BitCodeFormat.rst
+++ docs/BitCodeFormat.rst
@@ -62,10 +62,12 @@
 Magic Numbers
 -------------
 
-The first two bytes of a bitcode file are 'BC' (``0x42``, ``0x43``).  The second
-two bytes are an application-specific magic number.  Generic bitcode tools can
-look at only the first two bytes to verify the file is bitcode, while
-application-specific programs will want to look at all four.
+The first four bytes of a bitstream are used as an application-specific magic
+number.  Generic bitcode tools may look at the first four bytes to determine
+whether the stream is a known stream type.  However, these tools should *not*
+determine whether a bitstream is valid based on its magic number alone.  New
+application-specific bitstream formats are being developed all the time; tools
+should not reject them just because they have a hitherto unseen magic number.
 
 .. _primitives:
 
@@ -496,12 +498,9 @@
 The magic number for LLVM IR files is:
 
 :raw-html:`<tt><blockquote>`
-[0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
+['B'\ :sub:`8`, 'C'\ :sub:`8`, 0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
 :raw-html:`</blockquote></tt>`
 
-When combined with the bitcode magic number and viewed as bytes, this is
-``"BC 0xC0DE"``.
-
 .. _Signed VBRs:
 
 Signed VBRs

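The new notation lists the LLVM IR magic number as fixed-width fields rather than bytes. Assuming the bitstream convention of filling each byte starting at its least-significant bit, this Python sketch packs those six fields and checks that they yield the same four bytes as the removed ``"BC 0xC0DE"`` sentence. The `pack_fields_lsb_first` helper is hypothetical.

```python
def pack_fields_lsb_first(fields):
    """Pack (value, width) pairs into bytes, least-significant bit first."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value:#x} does not fit in {width} bits")
        acc |= value << nbits  # each field lands just above the previous one
        nbits += width
    if nbits % 8 != 0:
        raise ValueError("total width must be a whole number of bytes")
    # Little-endian, so bit 0 of the accumulator ends up in the first byte.
    return acc.to_bytes(nbits // 8, "little")


# ['B'8, 'C'8, 0x0 4, 0xC 4, 0xE 4, 0xD 4] from the updated documentation.
IR_MAGIC_FIELDS = [(ord("B"), 8), (ord("C"), 8),
                   (0x0, 4), (0xC, 4), (0xE, 4), (0xD, 4)]
print(pack_fields_lsb_first(IR_MAGIC_FIELDS))  # b'BC\xc0\xde'
```

The two 4-bit pairs land as 0xC0 and 0xDE because the first field of each pair fills the low nibble of its byte.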


