[llvm-commits] CVS: llvm/docs/BitCodeFormat.html

Chris Lattner sabre at nondot.org
Sat May 12 00:49:45 PDT 2007



Changes in directory llvm/docs:

BitCodeFormat.html updated: 1.3 -> 1.4
---
Log message:

continued description


---
Diffs of the changes:  (+109 -6)

 BitCodeFormat.html |  115 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 109 insertions(+), 6 deletions(-)


Index: llvm/docs/BitCodeFormat.html
diff -u llvm/docs/BitCodeFormat.html:1.3 llvm/docs/BitCodeFormat.html:1.4
--- llvm/docs/BitCodeFormat.html:1.3	Sat May 12 00:37:42 2007
+++ llvm/docs/BitCodeFormat.html	Sat May 12 02:49:15 2007
@@ -18,6 +18,7 @@
     <li><a href="#abbrevid">Abbreviation IDs</a></li>
     <li><a href="#blocks">Blocks</a></li>
     <li><a href="#datarecord">Data Records</a></li>
+    <li><a href="#abbreviations">Abbreviations</a></li>
     </ol>
   </li>
   <li><a href="#llvmir">LLVM IR Encoding</a></li>
@@ -213,12 +214,14 @@
     current block.</li>
 <li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the
     beginning of a new block.</li>
-<li>2 - DEFINE_ABBREV - This defines a new abbreviation.</li>
-<li>3 - UNABBREV_RECORD - This ID specifies the definition of an unabbreviated
-    record.</li>
+<li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new
+    abbreviation.</li>
+<li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the
+    definition of an unabbreviated record.</li>
 </ul>
 
-<p>Abbreviation IDs 4 and above are defined by the stream itself.</p>
+<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
+an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
 
 </div>
 
@@ -303,11 +306,111 @@
 </div>
 
 <div class="doc_text">
+<p>
+Data records consist of a record code and a number of (up to) 64-bit integer
+values.  The interpretation of the code and values is application-specific, and
+there are multiple ways to encode a record (with an unabbreviated record
+or with an abbreviation).  In the LLVM IR format, for example, there is a record
+which encodes the target triple of a module.  The code is MODULE_CODE_TRIPLE,
+and the values of the record are the ASCII codes for the characters in the
+string.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
+Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
+       op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
+
+<p>An UNABBREV_RECORD provides a default fallback encoding, which is both
+completely general and also extremely inefficient.  It can describe an arbitrary
+record, by emitting the code and operands as vbrs.</p>
+
+<p>For example, emitting an LLVM IR target triple as an unabbreviated record
+requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the
+MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to
+the number of operands), and a vbr6 for each character.  Since a vbr6 chunk
+holds only 5 data bits and no letter has a value less than 32, each letter needs
+at least a two-chunk VBR, which means that each letter requires at least 12
+bits.  This is not an efficient encoding, but it is fully general.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
+Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[&lt;abbrevid&gt;, fields...]</tt></p>
+
+<p>An abbreviated record is an abbreviation ID followed by a set of fields that
+are encoded according to the <a href="#abbreviations">abbreviation
+definition</a>.  This allows records to be encoded significantly more densely
+than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a>
+type, and allows the abbreviation types to be specified in the stream itself,
+which allows the files to be completely self-describing.  The actual encoding
+of abbreviations is defined below.
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
+</div>
 
+<div class="doc_text">
 <p>
-blah
+Abbreviations are an important form of compression for bitstreams.  The idea is
+to specify a dense encoding for a class of records once, then use that encoding
+to emit many records.  It takes space to emit the encoding into the file, but
+the space is recouped (hopefully plus some) when the records that use it are
+emitted.
 </p>
 
+<p>
+Abbreviations can be determined dynamically per client, per file.  Since the
+abbreviations are stored in the bitstream itself, different streams of the same
+format can contain different sets of abbreviations, omitting any that a specific
+stream does not need.  As a concrete example, LLVM IR files usually emit an
+abbreviation for binary operators.  If a specific LLVM module contained no or few
+binary operators, the abbreviation does not need to be emitted.
+</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
+ Encoding</a></div>
+
+<div class="doc_text">
+
+<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
+ ...]</tt></p>
+
+<p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed
+by a VBR that specifies the number of abbrev operands, then the abbrev
+operands themselves.  Abbreviation operands come in three forms.  They all start
+with a single bit that indicates whether the abbrev operand is a literal operand
+(when the bit is 1) or an encoding operand (when the bit is 0).</p>
+
+<ol>
+<li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -
+Literal operands specify that the value in the result
+is always a single specific value.  This specific value is emitted as a vbr8
+after the bit indicating that it is a literal operand.</li>
+<li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt>
+ - blah
+</li>
+<li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>, 
+value<sub>vbr5</sub>]</tt> -
+
+</li>
+</ol>
+
 </div>
 
 
@@ -330,7 +433,7 @@
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
  <a href="mailto:sabre at nondot.org">Chris Lattner</a><br>
 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
-Last modified: $Date: 2007/05/12 05:37:42 $
+Last modified: $Date: 2007/05/12 07:49:15 $
 </address>
 </body>
 </html>
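
Note on the UNABBREV_RECORD cost estimate in the diff above: the "at least 12
bits per letter" figure can be checked with a minimal Python sketch. It assumes
the bitstream's VBR chunk layout (each chunk carries width-1 data bits,
low-order bits first, with the high bit of a chunk set when another chunk
follows). The function names, the 3-bit abbreviation ID width, the record code
value 2, and the triple string are illustrative assumptions and are not taken
from this commit.

```python
def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits.

    Each chunk holds (width - 1) data bits, low-order bits first.
    The high bit of a chunk is set when another chunk follows.
    """
    if value < 0 or width < 2:
        raise ValueError("need value >= 0 and width >= 2")
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    chunks = []
    while True:
        low = value & mask
        value >>= data_bits
        if value:
            chunks.append(low | (1 << data_bits))  # continuation bit set
        else:
            chunks.append(low)  # final chunk, continuation bit clear
            return chunks


def unabbrev_record_bits(code, ops, abbrev_id_width):
    """Bit cost of [UNABBREV_RECORD, code_vbr6, numops_vbr6, op_vbr6, ...].

    `abbrev_id_width` is the width of the abbreviation ID in the enclosing
    block; it is a parameter here because it varies per block.
    """
    fields = [code, len(ops)] + list(ops)
    return abbrev_id_width + sum(6 * len(vbr_chunks(f, 6)) for f in fields)


if __name__ == "__main__":
    triple = "i686-pc-linux-gnu"        # hypothetical example triple
    ops = [ord(ch) for ch in triple]    # one operand per ASCII character
    # Code value 2 and ID width 3 are assumptions made for illustration only.
    print(unabbrev_record_bits(2, ops, abbrev_id_width=3))
```

Every printable ASCII character has a value of 32 or more, so each one spills
into a second vbr6 chunk; that is where the 12 bits per letter come from.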





