[Mlir-commits] [mlir] f3acb54 - [mlir] Add initial support for a binary serialization format

Mon Aug 22 00:47:03 PDT 2022

Author: River Riddle
Date: 2022-08-22T00:36:26-07:00
New Revision: f3acb54c1b7b7721cea5c07f64194151dddd3c4b

URL: https://github.com/llvm/llvm-project/commit/f3acb54c1b7b7721cea5c07f64194151dddd3c4b
DIFF: https://github.com/llvm/llvm-project/commit/f3acb54c1b7b7721cea5c07f64194151dddd3c4b.diff

LOG: [mlir] Add initial support for a binary serialization format

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Differential Revision: https://reviews.llvm.org/D131747

Added: 
    mlir/docs/BytecodeFormat.md
    mlir/include/mlir/Bytecode/BytecodeReader.h
    mlir/include/mlir/Bytecode/BytecodeWriter.h
    mlir/lib/Bytecode/CMakeLists.txt
    mlir/lib/Bytecode/Encoding.h
    mlir/lib/Bytecode/Reader/BytecodeReader.cpp
    mlir/lib/Bytecode/Reader/CMakeLists.txt
    mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
    mlir/lib/Bytecode/Writer/CMakeLists.txt
    mlir/lib/Bytecode/Writer/IRNumbering.cpp
    mlir/lib/Bytecode/Writer/IRNumbering.h
    mlir/test/Bytecode/general.mlir
    mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-large_offset.mlirbc
    mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-trailing_data.mlirbc
    mlir/test/Bytecode/invalid/invalid-attr_type_section-index.mlirbc
    mlir/test/Bytecode/invalid/invalid-attr_type_section-trailing_data.mlirbc
    mlir/test/Bytecode/invalid/invalid-dialect_section-dialect_string.mlirbc
    mlir/test/Bytecode/invalid/invalid-dialect_section-opname_dialect.mlirbc
    mlir/test/Bytecode/invalid/invalid-dialect_section-opname_string.mlirbc
    mlir/test/Bytecode/invalid/invalid-dialect_section.mlir
    mlir/test/Bytecode/invalid/invalid-ir_section-attr.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section-forwardref.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section-loc.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section-operands.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section-opname.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section-results.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section-successors.mlirbc
    mlir/test/Bytecode/invalid/invalid-ir_section.mlir
    mlir/test/Bytecode/invalid/invalid-string_section-count.mlirbc
    mlir/test/Bytecode/invalid/invalid-string_section-large_string.mlirbc
    mlir/test/Bytecode/invalid/invalid-string_section-no_string.mlirbc
    mlir/test/Bytecode/invalid/invalid-string_section-trailing_data.mlirbc
    mlir/test/Bytecode/invalid/invalid-string_section.mlir
    mlir/test/Bytecode/invalid/invalid-structure-producer.mlirbc
    mlir/test/Bytecode/invalid/invalid-structure-section-duplicate.mlirbc
    mlir/test/Bytecode/invalid/invalid-structure-section-id-unknown.mlirbc
    mlir/test/Bytecode/invalid/invalid-structure-section-length.mlirbc
    mlir/test/Bytecode/invalid/invalid-structure-section-missing.mlirbc
    mlir/test/Bytecode/invalid/invalid-structure-version.mlirbc
    mlir/test/Bytecode/invalid/invalid-structure.mlir
    mlir/test/Bytecode/invalid/invalid_attr_type_offset_section.mlir
    mlir/test/Bytecode/invalid/invalid_attr_type_section.mlir

Modified: 
    mlir/include/mlir/IR/OperationSupport.h
    mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
    mlir/lib/CMakeLists.txt
    mlir/lib/Parser/CMakeLists.txt
    mlir/lib/Parser/Parser.cpp
    mlir/lib/Tools/mlir-opt/CMakeLists.txt
    mlir/lib/Tools/mlir-opt/MlirOptMain.cpp

Removed: 
    


################################################################################
diff  --git a/mlir/docs/BytecodeFormat.md b/mlir/docs/BytecodeFormat.md
new file mode 100644
index 0000000000000..acb1819c9932c

--- /dev/null
+++ b/mlir/docs/BytecodeFormat.md
@@ -0,0 +1,314 @@
+# MLIR Bytecode Format
+
+This documents describes the MLIR bytecode format and its encoding.
+
+[TOC]
+
+## Magic Number
+
+MLIR uses the following four-byte magic number to indicate bytecode files:
+
+'\[â€˜Mâ€™<sub>8</sub>, â€˜Lâ€™<sub>8</sub>, â€˜Ã¯â€™<sub>8</sub>, â€˜Râ€™<sub>8</sub>\]'
+
+In hex:
+
+'\[â€˜4Dâ€™<sub>8</sub>, â€˜4Câ€™<sub>8</sub>, â€˜EFâ€™<sub>8</sub>, â€˜52â€™<sub>8</sub>\]'
+
+## Format Overview
+
+An MLIR Bytecode file is comprised of a byte stream, with a few simple
+structural concepts layered on top.
+
+### Primitives
+
+#### Fixed-Width Integers
+
+```
+  byte                  ::= `0x00`...`0xFF`
+```
+
+Fixed width integers are unsigned integers of a known byte size. The values are
+stored in little-endian byte order.
+
+TODO: Add larger fixed width integers as necessary.
+
+#### Variable-Width Integers
+
+Variable width integers, or `VarInt`s, provide a compact representation for
+integers. Each encoded VarInt consists of one to nine bytes, which together
+represent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
+encoding for VarInts. This encoding is a variant of the
+[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
+encoding, where each byte of the encoding provides up to 7 bits for the value,
+with the remaining bit used to store a tag indicating the number of bytes used
+for the encoding. This means that small unsigned integers (less than 2^7) may be
+stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
+etc.
+
+The first byte of the encoding includes a length prefix in the low bits. This
+prefix is a bit sequence of '0's followed by a terminal '1', or the end of the
+byte. The number of '0' bits indicate the number of _additional_ bytes, not
+including the prefix byte, used to encode the value. All of the remaining bits
+in the first byte, along with all of the bits in the additional bytes, provide
+the value of the integer. Below are the various possible encodings of the prefix
+byte:
+
+```
+xxxxxxx1:  7 value bits, the encoding uses 1 byte
+xxxxxx10: 14 value bits, the encoding uses 2 bytes
+xxxxx100: 21 value bits, the encoding uses 3 bytes
+xxxx1000: 28 value bits, the encoding uses 4 bytes
+xxx10000: 35 value bits, the encoding uses 5 bytes
+xx100000: 42 value bits, the encoding uses 6 bytes
+x1000000: 49 value bits, the encoding uses 7 bytes
+10000000: 56 value bits, the encoding uses 8 bytes
+00000000: 64 value bits, the encoding uses 9 bytes
+```
+
+#### Strings
+
+Strings are blobs of characters with an associated length.
+
+### Sections
+
+```
+section {
+  id: byte
+  length: varint
+}
+```
+
+Sections are a mechanism for grouping data within the bytecode. The enable
+delayed processing, which is useful for out-of-order processing of data,
+lazy-loading, and more. Each section contains a Section ID and a length (which
+allowing for skipping over the section).
+
+TODO: Sections should also carry an optional alignment. Add this when necessary.
+
+## MLIR Encoding
+
+Given the generic structure of MLIR, the bytecode encoding is actually fairly
+simplistic. It effectively maps to the core components of MLIR.
+
+### Top Level Structure
+
+The top-level structure of the bytecode contains the 4-byte "magic number", a
+version number, a null-terminated producer string, and a list of sections. Each
+section is currently only expected to appear once within a bytecode file.
+
+```
+bytecode {
+  magic: "MLÃ¯R",
+  version: varint,
+  producer: string,
+  sections: section[]
+}
+```
+
+### String Section
+
+```
+strings {
+  numStrings: varint,
+  reverseStringLengths: varint[],
+  stringData: byte[]
+}
+```
+
+The string section contains a table of strings referenced within the bytecode,
+more easily enabling string sharing. This section is encoded first with the
+total number of strings, followed by the sizes of each of the individual strings
+in reverse order. The remaining encoding contains a single blob containing all
+of the strings concatenated together.
+
+### Dialect Section
+
+The dialect section of the bytecode contains all of the dialects referenced
+within the encoded IR, and some information about the components of those
+dialects that were also referenced.
+
+```
+dialect_section {
+  numDialects: varint,
+  dialectNames: varint[],
+  opNames: op_name_group[]
+}
+
+op_name_group {
+  dialect: varint,
+  numOpNames: varint,
+  opNames: varint[]
+}
+```
+
+Dialects are encoded as indexes to the name string within the string section.
+Operation names are encoded in groups by dialect, with each group containing the
+dialect, the number of operation names, and the array of indexes to each name
+within the string section.
+
+### Attribute/Type Sections
+
+Attributes and types are encoded using two [sections](#sections), one section
+(`attr_type_section`) containing the actual encoded representation, and another
+section (`attr_type_offset_section`) containing the offsets of each encoded
+attribute/type into the previous section. This structure allows for attributes
+and types to always be lazily loaded on demand.
+
+```
+attr_type_section {
+  attrs: attribute[],
+  types: type[]  
+}
+attr_type_offset_section {
+  numAttrs: varint,
+  numTypes: varint,
+  offsets: attr_type_offset_group[]
+}
+
+attr_type_offset_group {
+  dialect: varint,
+  numElements: varint,
+  offsets: varint[] // (offset << 1) | (hasCustomEncoding) 
+}
+
+attribute {
+  encoding: ...
+}
+type {
+  encoding: ...
+}
+```
+
+Each `offset` in the `attr_type_offset_section` above is the size of the
+encoding for the attribute or type and a flag indicating if the encoding uses
+the textual assembly format, or a custom bytecode encoding. We avoid using the
+direct offset into the `attr_type_section`, as a smaller relative offsets
+provides more effective compression. Attributes and types are grouped by
+dialect, with each `attr_type_offset_group` in the offset section containing the
+corresponding parent dialect, number of elements, and offsets for each element
+within the group.
+
+#### Attribute/Type Encodings
+
+In the abstract, an attribute/type is encoded in one of two possible ways: via
+its assembly format, or via a custom dialect defined encoding.
+
+##### Assembly Format Fallback
+
+In the case where a dialect does not define a method for encoding the attribute
+or type, the textual assembly format of that attribute or type is used as a
+fallback. For example, a type of `!bytecode.type` would be encoded as the null
+terminated string "!bytecode.type". This ensures that every attribute and type
+may be encoded, even if the owning dialect has not yet opted in to a more
+efficient serialization.
+
+TODO: We shouldn't redundantly encode the dialect name here, we should use a
+reference to the parent dialect instead.
+
+##### Dialect Defined Encoding
+
+TODO: This is not yet supported.
+
+### IR Section
+
+The IR section contains the encoded form of operations within the bytecode.
+
+#### Operation Encoding
+
+```
+op {
+  name: varint,
+  encodingMask: byte,
+  location: varint,
+
+  attrDict: varint?,
+
+  numResults: varint?,
+  resultTypes: varint[],
+
+  numOperands: varint?,
+  operands: varint[],
+
+  numSuccessors: varint?,
+  successors: varint[],
+
+  regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove) 
+  regions: region[]
+}
+```
+
+The encoding of an operation is important because this is generally the most
+commonly appearing structure in the bytecode. A single encoding is used for
+every type of operation. Given this prevelance, many of the fields of an
+operation are optional. The `encodingMask` field is a bitmask which indicates
+which of the components of the operation are present.
+
+##### Location
+
+The location is encoded as the index of the location within the attribute table.
+
+##### Attributes
+
+If the operation has attribues, the index of the operation attribute dictionary
+within the attribute table is encoded.
+
+##### Results
+
+If the operation has results, the number of results and the indexes of the
+result types within the type table are encoded.
+
+##### Operands
+
+If the operation has operands, the number of operands and the value index of
+each operand is encoded. This value index is the relative ordering of the
+definition of that value from the start of the first ancestor isolated region.
+
+##### Successors
+
+If the operation has successors, the number of successors and the indexes of the
+successor blocks within the parent region are encoded.
+
+##### Regions
+
+If the operation has regions, the number of regions and if the regions are
+isolated from above are encoded together in a single varint. Afterwards, each
+region is encoded inline.
+
+#### Region Encoding
+
+```
+region {
+  numBlocks: varint,
+
+  numValues: varint?,
+  blocks: block[]
+}
+```
+
+A region is encoded first with the number of blocks within. If the region is
+non-empty, the number of values defined directly within the region are encoded,
+followed by the blocks of the region.
+
+#### Block Encoding
+
+```
+block {
+  encoding: varint, // (numOps << 1) | (hasBlockArgs)
+  arguments: block_arguments?, // Optional based on encoding
+  ops : op[]
+}
+
+block_arguments {
+  numArgs: varint?,
+  args: block_argument[]
+}
+
+block_argument {
+  typeIndex: varint,
+  location: varint
+}
+```
+
+A block is encoded with an array of operations and block arguments. The first
+field is an encoding that combines the number of operations in the block, with a
+flag indicating if the block has arguments.

diff  --git a/mlir/include/mlir/Bytecode/BytecodeReader.h b/mlir/include/mlir/Bytecode/BytecodeReader.h
new file mode 100644
index 0000000000000..68e1d1a0e53f4
--- /dev/null
+++ b/mlir/include/mlir/Bytecode/BytecodeReader.h
@@ -0,0 +1,34 @@
+//===- BytecodeReader.h - MLIR Bytecode Reader ------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This header defines interfaces to read MLIR bytecode files/streams.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_BYTECODE_BYTECODEREADER_H
+#define MLIR_BYTECODE_BYTECODEREADER_H
+
+#include "mlir/IR/AsmState.h"
+#include "mlir/Support/LLVM.h"
+
+namespace llvm {
+class MemoryBufferRef;
+} // namespace llvm
+
+namespace mlir {
+/// Returns true if the given buffer starts with the magic bytes that signal
+/// MLIR bytecode.
+bool isBytecode(llvm::MemoryBufferRef buffer);
+
+/// Read the operations defined within the given memory buffer, containing MLIR
+/// bytecode, into the provided block.
+LogicalResult readBytecodeFile(llvm::MemoryBufferRef buffer, Block *block,
+                               const ParserConfig &config);
+} // namespace mlir
+
+#endif // MLIR_BYTECODE_BYTECODEREADER_H

diff  --git a/mlir/include/mlir/Bytecode/BytecodeWriter.h b/mlir/include/mlir/Bytecode/BytecodeWriter.h
new file mode 100644
index 0000000000000..6b86e011ffdaa
--- /dev/null
+++ b/mlir/include/mlir/Bytecode/BytecodeWriter.h
@@ -0,0 +1,36 @@
+//===- BytecodeWriter.h - MLIR Bytecode Writer ------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This header defines interfaces to write MLIR bytecode files/streams.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_BYTECODE_BYTECODEWRITER_H
+#define MLIR_BYTECODE_BYTECODEWRITER_H
+
+#include "mlir/Support/LLVM.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace mlir {
+class Operation;
+
+//===----------------------------------------------------------------------===//
+// Entry Points
+//===----------------------------------------------------------------------===//
+
+/// Write the bytecode for the given operation to the provided output stream.
+/// For streams where it matters, the given stream should be in "binary" mode.
+/// `producer` is an optional string that can be used to identify the producer
+/// of the bytecode when reading. It has no functional effect on the bytecode
+/// serialization.
+void writeBytecodeToFile(Operation *op, raw_ostream &os,
+                         StringRef producer = "MLIR" LLVM_VERSION_STRING);
+
+} // namespace mlir
+
+#endif // MLIR_BYTECODE_BYTECODEWRITER_H

diff  --git a/mlir/include/mlir/IR/OperationSupport.h b/mlir/include/mlir/IR/OperationSupport.h
index b42ef52cd3144..3ce7ff37c8252 100644
--- a/mlir/include/mlir/IR/OperationSupport.h
+++ b/mlir/include/mlir/IR/OperationSupport.h
@@ -642,11 +642,11 @@ struct OperationState {
   OperationState(Location location, OperationName name);
 
   OperationState(Location location, OperationName name, ValueRange operands,
-                 TypeRange types, ArrayRef<NamedAttribute> attributes,
+                 TypeRange types, ArrayRef<NamedAttribute> attributes = {},
                  BlockRange successors = {},
                  MutableArrayRef<std::unique_ptr<Region>> regions = {});
   OperationState(Location location, StringRef name, ValueRange operands,
-                 TypeRange types, ArrayRef<NamedAttribute> attributes,
+                 TypeRange types, ArrayRef<NamedAttribute> attributes = {},
                  BlockRange successors = {},
                  MutableArrayRef<std::unique_ptr<Region>> regions = {});
 

diff  --git a/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h b/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
index 7446455503ce1..e32c7e786d3b0 100644
--- a/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
+++ b/mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h
@@ -50,13 +50,15 @@ using PassPipelineFn = llvm::function_ref<LogicalResult(PassManager &pm)>;
 /// - preloadDialectsInContext will trigger the upfront loading of all
 ///   dialects from the global registry in the MLIRContext. This option is
 ///   deprecated and will be removed soon.
+/// - emitBytecode will generate bytecode output instead of text.
 LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,
                           std::unique_ptr<llvm::MemoryBuffer> buffer,
                           const PassPipelineCLParser &passPipeline,
                           DialectRegistry &registry, bool splitInputFile,
                           bool verifyDiagnostics, bool verifyPasses,
                           bool allowUnregisteredDialects,
-                          bool preloadDialectsInContext = false);
+                          bool preloadDialectsInContext = false,
+                          bool emitBytecode = false);
 
 /// Support a callback to setup the pass manager.
 /// - passManagerSetupFn is the callback invoked to setup the pass manager to
@@ -67,7 +69,8 @@ LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,
                           DialectRegistry &registry, bool splitInputFile,
                           bool verifyDiagnostics, bool verifyPasses,
                           bool allowUnregisteredDialects,
-                          bool preloadDialectsInContext = false);
+                          bool preloadDialectsInContext = false,
+                          bool emitBytecode = false);
 
 /// Implementation for tools like `mlir-opt`.
 /// - toolName is used for the header displayed by `--help`.

diff  --git a/mlir/lib/Bytecode/CMakeLists.txt b/mlir/lib/Bytecode/CMakeLists.txt
new file mode 100644
index 0000000000000..ff7e290cad1bb
--- /dev/null
+++ b/mlir/lib/Bytecode/CMakeLists.txt
@@ -0,0 +1,2 @@
+add_subdirectory(Reader)
+add_subdirectory(Writer)

diff  --git a/mlir/lib/Bytecode/Encoding.h b/mlir/lib/Bytecode/Encoding.h
new file mode 100644
index 0000000000000..3b8de6557c1fd
--- /dev/null
+++ b/mlir/lib/Bytecode/Encoding.h
@@ -0,0 +1,81 @@
+//===- Encoding.h - MLIR binary format encoding information -----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This header defines enum values describing the structure of MLIR bytecode
+// files.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LIB_MLIR_BYTECODE_ENCODING_H
+#define LIB_MLIR_BYTECODE_ENCODING_H
+
+#include <cstdint>
+
+namespace mlir {
+namespace bytecode {
+//===----------------------------------------------------------------------===//
+// General constants
+//===----------------------------------------------------------------------===//
+
+enum {
+  /// The current bytecode version.
+  kVersion = 0,
+};
+
+//===----------------------------------------------------------------------===//
+// Sections
+//===----------------------------------------------------------------------===//
+
+namespace Section {
+enum ID : uint8_t {
+  /// This section contains strings referenced within the bytecode.
+  kString = 0,
+
+  /// This section contains the dialects referenced within an IR module.
+  kDialect = 1,
+
+  /// This section contains the attributes and types referenced within an IR
+  /// module.
+  kAttrType = 2,
+
+  /// This section contains the offsets for the attribute and types within the
+  /// AttrType section.
+  kAttrTypeOffset = 3,
+
+  /// This section contains the list of operations serialized into the bytecode,
+  /// and their nested regions/operations.
+  kIR = 4,
+
+  /// The total number of section types.
+  kNumSections = 5,
+};
+} // namespace Section
+
+//===----------------------------------------------------------------------===//
+// IR Section
+//===----------------------------------------------------------------------===//
+
+/// This enum represents a mask of all of the potential components of an
+/// operation. This mask is used when encoding an operation to indicate which
+/// components are present in the bytecode.
+namespace OpEncodingMask {
+enum : uint8_t {
+  // clang-format off
+  kHasAttrs         = 0b00000001,
+  kHasResults       = 0b00000010,
+  kHasOperands      = 0b00000100,
+  kHasSuccessors    = 0b00001000,
+  kHasInlineRegions = 0b00010000,
+  // clang-format on
+};
+} // namespace OpEncodingMask
+
+} // namespace bytecode
+} // namespace mlir
+
+#endif

diff  --git a/mlir/lib/Bytecode/Reader/BytecodeReader.cpp b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
new file mode 100644
index 0000000000000..3168ce6f9c2f0
--- /dev/null
+++ b/mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -0,0 +1,1222 @@
+//===- BytecodeReader.cpp - MLIR Bytecode Reader --------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// TODO: Support for big-endian architectures.
+// TODO: Properly preserve use lists of values.
+
+#include "mlir/Bytecode/BytecodeReader.h"
+#include "../Encoding.h"
+#include "mlir/AsmParser/AsmParser.h"
+#include "mlir/IR/BuiltinDialect.h"
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/OpImplementation.h"
+#include "mlir/IR/Verifier.h"
+#include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/ScopeExit.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/Support/MemoryBufferRef.h"
+#include "llvm/Support/SaveAndRestore.h"
+
+#define DEBUG_TYPE "mlir-bytecode-reader"
+
+using namespace mlir;
+
+/// Stringify the given section ID.
+static std::string toString(bytecode::Section::ID sectionID) {
+  switch (sectionID) {
+  case bytecode::Section::kString:
+    return "String (0)";
+  case bytecode::Section::kDialect:
+    return "Dialect (1)";
+  case bytecode::Section::kAttrType:
+    return "AttrType (2)";
+  case bytecode::Section::kAttrTypeOffset:
+    return "AttrTypeOffset (3)";
+  case bytecode::Section::kIR:
+    return "IR (4)";
+  default:
+    return ("Unknown (" + Twine(sectionID) + ")").str();
+  }
+}
+
+//===----------------------------------------------------------------------===//
+// EncodingReader
+//===----------------------------------------------------------------------===//
+
+namespace {
+class EncodingReader {
+public:
+  explicit EncodingReader(ArrayRef<uint8_t> contents, Location fileLoc)
+      : dataIt(contents.data()), dataEnd(contents.end()), fileLoc(fileLoc) {}
+  explicit EncodingReader(StringRef contents, Location fileLoc)
+      : EncodingReader({reinterpret_cast<const uint8_t *>(contents.data()),
+                        contents.size()},
+                       fileLoc) {}
+
+  /// Returns true if the entire section has been read.
+  bool empty() const { return dataIt == dataEnd; }
+
+  /// Returns the remaining size of the bytecode.
+  size_t size() const { return dataEnd - dataIt; }
+
+  /// Emit an error using the given arguments.
+  template <typename... Args>
+  LogicalResult emitError(Args &&...args) const {
+    return ::emitError(fileLoc).append(std::forward<Args>(args)...);
+  }
+
+  /// Parse a single byte from the stream.
+  template <typename T>
+  LogicalResult parseByte(T &value) {
+    if (empty())
+      return emitError("attempting to parse a byte at the end of the bytecode");
+    value = static_cast<T>(*dataIt++);
+    return success();
+  }
+  /// Parse a range of bytes of 'length' into the given result.
+  LogicalResult parseBytes(size_t length, ArrayRef<uint8_t> &result) {
+    if (length > size()) {
+      return emitError("attempting to parse ", length, " bytes when only ",
+                       size(), " remain");
+    }
+    result = {dataIt, length};
+    dataIt += length;
+    return success();
+  }
+  /// Parse a range of bytes of 'length' into the given result, which can be
+  /// assumed to be large enough to hold `length`.
+  LogicalResult parseBytes(size_t length, uint8_t *result) {
+    if (length > size()) {
+      return emitError("attempting to parse ", length, " bytes when only ",
+                       size(), " remain");
+    }
+    memcpy(result, dataIt, length);
+    dataIt += length;
+    return success();
+  }
+
+  /// Parse a variable length encoded integer from the byte stream. The first
+  /// encoded byte contains a prefix in the low bits indicating the encoded
+  /// length of the value. This length prefix is a bit sequence of '0's followed
+  /// by a '1'. The number of '0' bits indicate the number of _additional_ bytes
+  /// (not including the prefix byte). All remaining bits in the first byte,
+  /// along with all of the bits in additional bytes, provide the value of the
+  /// integer encoded in little-endian order.
+  LogicalResult parseVarInt(uint64_t &result) {
+    // Parse the first byte of the encoding, which contains the length prefix.
+    if (failed(parseByte(result)))
+      return failure();
+
+    // Handle the overwhelmingly common case where the value is stored in a
+    // single byte. In this case, the first bit is the `1` marker bit.
+    if (LLVM_LIKELY(result & 1)) {
+      result >>= 1;
+      return success();
+    }
+
+    // Handle the overwhelming uncommon case where the value required all 8
+    // bytes (i.e. a really really big number). In this case, the marker byte is
+    // all zeros: `00000000`.
+    if (LLVM_UNLIKELY(result == 0))
+      return parseBytes(sizeof(result), reinterpret_cast<uint8_t *>(&result));
+    return parseMultiByteVarInt(result);
+  }
+
+  /// Parse a variable length encoded integer whose low bit is used to encode an
+  /// unrelated flag, i.e: `(integerValue << 1) | (flag ? 1 : 0)`.
+  LogicalResult parseVarIntWithFlag(uint64_t &result, bool &flag) {
+    if (failed(parseVarInt(result)))
+      return failure();
+    flag = result & 1;
+    result >>= 1;
+    return success();
+  }
+
+  /// Skip the first `length` bytes within the reader.
+  LogicalResult skipBytes(size_t length) {
+    if (length > size()) {
+      return emitError("attempting to skip ", length, " bytes when only ",
+                       size(), " remain");
+    }
+    dataIt += length;
+    return success();
+  }
+
+  /// Parse a null-terminated string into `result` (without including the NUL
+  /// terminator).
+  LogicalResult parseNullTerminatedString(StringRef &result) {
+    const char *startIt = (const char *)dataIt;
+    const char *nulIt = (const char *)memchr(startIt, 0, size());
+    if (!nulIt)
+      return emitError(
+          "malformed null-terminated string, no null character found");
+
+    result = StringRef(startIt, nulIt - startIt);
+    dataIt = (const uint8_t *)nulIt + 1;
+    return success();
+  }
+
+  /// Parse a section header, placing the kind of section in `sectionID` and the
+  /// contents of the section in `sectionData`.
+  LogicalResult parseSection(bytecode::Section::ID &sectionID,
+                             ArrayRef<uint8_t> &sectionData) {
+    size_t length;
+    if (failed(parseByte(sectionID)) || failed(parseVarInt(length)))
+      return failure();
+    if (sectionID >= bytecode::Section::kNumSections)
+      return emitError("invalid section ID: ", unsigned(sectionID));
+
+    // Parse the actua section data now that we have its length.
+    return parseBytes(length, sectionData);
+  }
+
+private:
+  /// Parse a variable length encoded integer from the byte stream. This method
+  /// is a fallback when the number of bytes used to encode the value is greater
+  /// than 1, but less than the max (9). The provided `result` value can be
+  /// assumed to already contain the first byte of the value.
+  /// NOTE: This method is marked noinline to avoid pessimizing the common case
+  /// of single byte encoding.
+  LLVM_ATTRIBUTE_NOINLINE LogicalResult parseMultiByteVarInt(uint64_t &result) {
+    // Count the number of trailing zeros in the marker byte, this indicates the
+    // number of trailing bytes that are part of the value. We use `uint32_t`
+    // here because we only care about the first byte, and so that be actually
+    // get ctz intrinsic calls when possible (the `uint8_t` overload uses a loop
+    // implementation).
+    uint32_t numBytes =
+        llvm::countTrailingZeros<uint32_t>(result, llvm::ZB_Undefined);
+    assert(numBytes > 0 && numBytes <= 7 &&
+           "unexpected number of trailing zeros in varint encoding");
+
+    // Parse in the remaining bytes of the value.
+    if (failed(parseBytes(numBytes, reinterpret_cast<uint8_t *>(&result) + 1)))
+      return failure();
+
+    // Shift out the low-order bits that were used to mark how the value was
+    // encoded.
+    result >>= (numBytes + 1);
+    return success();
+  }
+
+  /// The current data iterator, and an iterator to the end of the buffer.
+  const uint8_t *dataIt, *dataEnd;
+
+  /// A location for the bytecode used to report errors.
+  Location fileLoc;
+};
+} // namespace
+
+/// Resolve an index into the given entry list. `entry` may either be a
+/// reference, in which case it is assigned to the corresponding value in
+/// `entries`, or a pointer, in which case it is assigned to the address of the
+/// element in `entries`.
+template <typename RangeT, typename T>
+static LogicalResult resolveEntry(EncodingReader &reader, RangeT &entries,
+                                  uint64_t index, T &entry,
+                                  StringRef entryStr) {
+  if (index >= entries.size())
+    return reader.emitError("invalid ", entryStr, " index: ", index);
+
+  // If the provided entry is a pointer, resolve to the address of the entry.
+  if constexpr (std::is_convertible_v<llvm::detail::ValueOfRange<RangeT>, T>)
+    entry = entries[index];
+  else
+    entry = &entries[index];
+  return success();
+}
+
+/// Parse and resolve an index into the given entry list.
+template <typename RangeT, typename T>
+static LogicalResult parseEntry(EncodingReader &reader, RangeT &entries,
+                                T &entry, StringRef entryStr) {
+  uint64_t entryIdx;
+  if (failed(reader.parseVarInt(entryIdx)))
+    return failure();
+  return resolveEntry(reader, entries, entryIdx, entry, entryStr);
+}
+
+//===----------------------------------------------------------------------===//
+// BytecodeDialect
+//===----------------------------------------------------------------------===//
+
+namespace {
+/// This struct represents a dialect entry within the bytecode.
+struct BytecodeDialect {
+  /// Load the dialect into the provided context if it hasn't been loaded yet.
+  /// Returns failure if the dialect couldn't be loaded *and* the provided
+  /// context does not allow unregistered dialects. The provided reader is used
+  /// for error emission if necessary.
+  LogicalResult load(EncodingReader &reader, MLIRContext *ctx) {
+    if (dialect)
+      return success();
+    Dialect *loadedDialect = ctx->getOrLoadDialect(name);
+    if (!loadedDialect && !ctx->allowsUnregisteredDialects()) {
+      return reader.emitError(
+          "dialect '", name,
+          "' is unknown. If this is intended, please call "
+          "allowUnregisteredDialects() on the MLIRContext, or use "
+          "-allow-unregistered-dialect with the MLIR tool used.");
+    }
+    dialect = loadedDialect;
+    return success();
+  }
+
+  /// The loaded dialect entry. This field is None if we haven't attempted to
+  /// load, nullptr if we failed to load, otherwise the loaded dialect.
+  Optional<Dialect *> dialect;
+
+  /// The name of the dialect.
+  StringRef name;
+};
+
+/// This struct represents an operation name entry within the bytecode.
+struct BytecodeOperationName {
+  BytecodeOperationName(BytecodeDialect *dialect, StringRef name)
+      : dialect(dialect), name(name) {}
+
+  /// The loaded operation name, or None if it hasn't been processed yet.
+  Optional<OperationName> opName;
+
+  /// The dialect that owns this operation name.
+  BytecodeDialect *dialect;
+
+  /// The name of the operation, without the dialect prefix.
+  StringRef name;
+};
+} // namespace
+
+/// Parse a single dialect group encoded in the byte stream.
+static LogicalResult parseDialectGrouping(
+    EncodingReader &reader, MutableArrayRef<BytecodeDialect> dialects,
+    function_ref<LogicalResult(BytecodeDialect *)> entryCallback) {
+  // Parse the dialect and the number of entries in the group.
+  BytecodeDialect *dialect;
+  if (failed(parseEntry(reader, dialects, dialect, "dialect")))
+    return failure();
+  uint64_t numEntries;
+  if (failed(reader.parseVarInt(numEntries)))
+    return failure();
+
+  for (uint64_t i = 0; i < numEntries; ++i)
+    if (failed(entryCallback(dialect)))
+      return failure();
+  return success();
+}
+
+//===----------------------------------------------------------------------===//
+// Attribute/Type Reader
+//===----------------------------------------------------------------------===//
+
+namespace {
+/// This class provides support for reading attribute and type entries from the
+/// bytecode. Attribute and Type entries are read lazily on demand, so we use
+/// this reader to manage when to actually parse them from the bytecode.
+class AttrTypeReader {
+  /// This class represents a single attribute or type entry.
+  template <typename T>
+  struct Entry {
+    /// The entry, or null if it hasn't been resolved yet.
+    T entry = {};
+    /// The parent dialect of this entry.
+    BytecodeDialect *dialect = nullptr;
+    /// A flag indicating if the entry was encoded using a custom encoding,
+    /// instead of using the textual assembly format.
+    bool hasCustomEncoding = false;
+    /// The raw data of this entry in the bytecode.
+    ArrayRef<uint8_t> data;
+  };
+  using AttrEntry = Entry<Attribute>;
+  using TypeEntry = Entry<Type>;
+
+public:
+  AttrTypeReader(Location fileLoc) : fileLoc(fileLoc) {}
+
+  /// Initialize the attribute and type information within the reader.
+  LogicalResult initialize(MutableArrayRef<BytecodeDialect> dialects,
+                           ArrayRef<uint8_t> sectionData,
+                           ArrayRef<uint8_t> offsetSectionData);
+
+  /// Resolve the attribute or type at the given index. Returns nullptr on
+  /// failure.
+  Attribute resolveAttribute(size_t index) {
+    return resolveEntry(attributes, index, "Attribute");
+  }
+  Type resolveType(size_t index) { return resolveEntry(types, index, "Type"); }
+
+private:
+  /// Resolve the given entry at `index`.
+  template <typename T>
+  T resolveEntry(SmallVectorImpl<Entry<T>> &entries, size_t index,
+                 StringRef entryType);
+
+  /// Parse the value defined within the given reader. `code` indicates how the
+  /// entry was encoded.
+  LogicalResult parseEntry(EncodingReader &reader, bool hasCustomEncoding,
+                           Attribute &result);
+  LogicalResult parseEntry(EncodingReader &reader, bool hasCustomEncoding,
+                           Type &result);
+
+  /// The set of attribute and type entries.
+  SmallVector<AttrEntry> attributes;
+  SmallVector<TypeEntry> types;
+
+  /// A location used for error emission.
+  Location fileLoc;
+};
+} // namespace
+
+LogicalResult
+AttrTypeReader::initialize(MutableArrayRef<BytecodeDialect> dialects,
+                           ArrayRef<uint8_t> sectionData,
+                           ArrayRef<uint8_t> offsetSectionData) {
+  EncodingReader offsetReader(offsetSectionData, fileLoc);
+
+  // Parse the number of attribute and type entries.
+  uint64_t numAttributes, numTypes;
+  if (failed(offsetReader.parseVarInt(numAttributes)) ||
+      failed(offsetReader.parseVarInt(numTypes)))
+    return failure();
+  attributes.resize(numAttributes);
+  types.resize(numTypes);
+
+  // A functor used to accumulate the offsets for the entries in the given
+  // range.
+  uint64_t currentOffset = 0;
+  auto parseEntries = [&](auto &&range) {
+    size_t currentIndex = 0, endIndex = range.size();
+
+    // Parse an individual entry.
+    auto parseEntryFn = [&](BytecodeDialect *dialect) {
+      auto &entry = range[currentIndex++];
+
+      uint64_t entrySize;
+      if (failed(offsetReader.parseVarIntWithFlag(entrySize,
+                                                  entry.hasCustomEncoding)))
+        return failure();
+
+      // Verify that the offset is actually valid.
+      if (currentOffset + entrySize > sectionData.size()) {
+        return offsetReader.emitError(
+            "Attribute or Type entry offset points past the end of section");
+      }
+
+      entry.data = sectionData.slice(currentOffset, entrySize);
+      entry.dialect = dialect;
+      currentOffset += entrySize;
+      return success();
+    };
+    while (currentIndex != endIndex)
+      if (failed(parseDialectGrouping(offsetReader, dialects, parseEntryFn)))
+        return failure();
+    return success();
+  };
+
+  // Process each of the attributes, and then the types.
+  if (failed(parseEntries(attributes)) || failed(parseEntries(types)))
+    return failure();
+
+  // Ensure that we read everything from the section.
+  if (!offsetReader.empty()) {
+    return offsetReader.emitError(
+        "unexpected trailing data in the Attribute/Type offset section");
+  }
+  return success();
+}
+
+template <typename T>
+T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>> &entries, size_t index,
+                               StringRef entryType) {
+  if (index >= entries.size()) {
+    emitError(fileLoc) << "invalid " << entryType << " index: " << index;
+    return {};
+  }
+
+  // If the entry has already been resolved, there is nothing left to do.
+  Entry<T> &entry = entries[index];
+  if (entry.entry)
+    return entry.entry;
+
+  // Parse the entry.
+  EncodingReader reader(entry.data, fileLoc);
+  if (failed(parseEntry(reader, entry.hasCustomEncoding, entry.entry)))
+    return T();
+  if (!reader.empty()) {
+    (void)reader.emitError("unexpected trailing bytes after " + entryType +
+                           " entry");
+    return T();
+  }
+  return entry.entry;
+}
+
+LogicalResult AttrTypeReader::parseEntry(EncodingReader &reader,
+                                         bool hasCustomEncoding,
+                                         Attribute &result) {
+  // Handle the fallback case, where the attribute was encoded using its
+  // assembly format.
+  if (!hasCustomEncoding) {
+    StringRef attrStr;
+    if (failed(reader.parseNullTerminatedString(attrStr)))
+      return failure();
+
+    size_t numRead = 0;
+    if (!(result = parseAttribute(attrStr, fileLoc->getContext(), numRead)))
+      return failure();
+    if (numRead != attrStr.size()) {
+      return reader.emitError(
+          "trailing characters found after Attribute assembly format: ",
+          attrStr.drop_front(numRead));
+    }
+    return success();
+  }
+
+  return reader.emitError("unexpected Attribute encoding");
+}
+
+LogicalResult AttrTypeReader::parseEntry(EncodingReader &reader,
+                                         bool hasCustomEncoding, Type &result) {
+  // Handle the fallback case, where the type was encoded using its
+  // assembly format.
+  if (!hasCustomEncoding) {
+    StringRef typeStr;
+    if (failed(reader.parseNullTerminatedString(typeStr)))
+      return failure();
+
+    size_t numRead = 0;
+    if (!(result = parseType(typeStr, fileLoc->getContext(), numRead)))
+      return failure();
+    if (numRead != typeStr.size()) {
+      return reader.emitError(
+          "trailing characters found after Type assembly format: " +
+          typeStr.drop_front(numRead));
+    }
+    return success();
+  }
+
+  return reader.emitError("unexpected Type encoding");
+}
+
+//===----------------------------------------------------------------------===//
+// Bytecode Reader
+//===----------------------------------------------------------------------===//
+
+namespace {
+/// This class is used to read a bytecode buffer and translate it into MLIR.
+class BytecodeReader {
+public:
+  BytecodeReader(Location fileLoc, const ParserConfig &config)
+      : config(config), fileLoc(fileLoc), attrTypeReader(fileLoc),
+        // Use the builtin unrealized conversion cast operation to represent
+        // forward references to values that aren't yet defined.
+        forwardRefOpState(UnknownLoc::get(config.getContext()),
+                          "builtin.unrealized_conversion_cast", ValueRange(),
+                          NoneType::get(config.getContext())) {}
+
+  /// Read the bytecode defined within `buffer` into the given block.
+  LogicalResult read(llvm::MemoryBufferRef buffer, Block *block);
+
+private:
+  /// Return the context for this config.
+  MLIRContext *getContext() const { return config.getContext(); }
+
+  /// Parse the bytecode version.
+  LogicalResult parseVersion(EncodingReader &reader);
+
+  //===--------------------------------------------------------------------===//
+  // Dialect Section
+
+  LogicalResult parseDialectSection(ArrayRef<uint8_t> sectionData);
+
+  /// Parse an operation name reference using the given reader.
+  FailureOr<OperationName> parseOpName(EncodingReader &reader);
+
+  //===--------------------------------------------------------------------===//
+  // Attribute/Type Section
+
+  /// Parse an attribute or type using the given reader. Returns nullptr in the
+  /// case of failure.
+  Attribute parseAttribute(EncodingReader &reader);
+  Type parseType(EncodingReader &reader);
+
+  template <typename T>
+  T parseAttribute(EncodingReader &reader) {
+    if (Attribute attr = parseAttribute(reader)) {
+      if (auto derivedAttr = attr.dyn_cast<T>())
+        return derivedAttr;
+      (void)reader.emitError("expected attribute of type: ",
+                             llvm::getTypeName<T>(), ", but got: ", attr);
+    }
+    return T();
+  }
+
+  //===--------------------------------------------------------------------===//
+  // IR Section
+
+  /// This struct represents the current read state of a range of regions. This
+  /// struct is used to enable iterative parsing of regions.
+  struct RegionReadState {
+    RegionReadState(Operation *op, bool isIsolatedFromAbove)
+        : RegionReadState(op->getRegions(), isIsolatedFromAbove) {}
+    RegionReadState(MutableArrayRef<Region> regions, bool isIsolatedFromAbove)
+        : curRegion(regions.begin()), endRegion(regions.end()),
+          isIsolatedFromAbove(isIsolatedFromAbove) {}
+
+    /// The current regions being read.
+    MutableArrayRef<Region>::iterator curRegion, endRegion;
+
+    /// The number of values defined immediately within this region.
+    unsigned numValues = 0;
+
+    /// The current blocks of the region being read.
+    SmallVector<Block *> curBlocks;
+    Region::iterator curBlock = {};
+
+    /// The number of operations remaining to be read from the current block
+    /// being read.
+    uint64_t numOpsRemaining = 0;
+
+    /// A flag indicating if the regions being read are isolated from above.
+    bool isIsolatedFromAbove = false;
+  };
+
+  LogicalResult parseIRSection(ArrayRef<uint8_t> sectionData, Block *block);
+  LogicalResult parseRegions(EncodingReader &reader,
+                             std::vector<RegionReadState> &regionStack,
+                             RegionReadState &readState);
+  FailureOr<Operation *> parseOpWithoutRegions(EncodingReader &reader,
+                                               RegionReadState &readState,
+                                               bool &isIsolatedFromAbove);
+
+  LogicalResult parseRegion(EncodingReader &reader, RegionReadState &readState);
+  LogicalResult parseBlock(EncodingReader &reader, RegionReadState &readState);
+  LogicalResult parseBlockArguments(EncodingReader &reader, Block *block);
+
+  //===--------------------------------------------------------------------===//
+  // String Section
+
+  LogicalResult parseStringSection(ArrayRef<uint8_t> sectionData);
+
+  /// Parse a shared string from the string section. The shared string is
+  /// encoded using an index to a corresponding string in the string section.
+  LogicalResult parseSharedString(EncodingReader &reader, StringRef &result) {
+    return parseEntry(reader, strings, result, "string");
+  }
+
+  //===--------------------------------------------------------------------===//
+  // Value Processing
+
+  /// Parse an operand reference using the given reader. Returns nullptr in the
+  /// case of failure.
+  Value parseOperand(EncodingReader &reader);
+
+  /// Sequentially define the given value range.
+  LogicalResult defineValues(EncodingReader &reader, ValueRange values);
+
+  /// Create a value to use for a forward reference.
+  Value createForwardRef();
+
+  //===--------------------------------------------------------------------===//
+  // Fields
+
+  /// This class represents a single value scope, in which a value scope is
+  /// delimited by isolated from above regions.
+  struct ValueScope {
+    /// Push a new region state onto this scope, reserving enough values for
+    /// those defined within the current region of the provided state.
+    void push(RegionReadState &readState) {
+      nextValueIDs.push_back(values.size());
+      values.resize(values.size() + readState.numValues);
+    }
+
+    /// Pop the values defined for the current region within the provided region
+    /// state.
+    void pop(RegionReadState &readState) {
+      values.resize(values.size() - readState.numValues);
+      nextValueIDs.pop_back();
+    }
+
+    /// The set of values defined in this scope.
+    std::vector<Value> values;
+
+    /// The ID for the next defined value for each region current being
+    /// processed in this scope.
+    SmallVector<unsigned, 4> nextValueIDs;
+  };
+
+  /// The configuration of the parser.
+  const ParserConfig &config;
+
+  /// A location to use when emitting errors.
+  Location fileLoc;
+
+  /// The reader used to process attribute and types within the bytecode.
+  AttrTypeReader attrTypeReader;
+
+  /// The version of the bytecode being read.
+  uint64_t version = 0;
+
+  /// The producer of the bytecode being read.
+  StringRef producer;
+
+  /// The table of IR units referenced within the bytecode file.
+  SmallVector<BytecodeDialect> dialects;
+  SmallVector<BytecodeOperationName> opNames;
+
+  /// The table of strings referenced within the bytecode file.
+  SmallVector<StringRef> strings;
+
+  /// The current set of available IR value scopes.
+  std::vector<ValueScope> valueScopes;
+  /// A block containing the set of operations defined to create forward
+  /// references.
+  Block forwardRefOps;
+  /// A block containing previously created, and no longer used, forward
+  /// reference operations.
+  Block openForwardRefOps;
+  /// An operation state used when instantiating forward references.
+  OperationState forwardRefOpState;
+};
+} // namespace
+
+LogicalResult BytecodeReader::read(llvm::MemoryBufferRef buffer, Block *block) {
+  EncodingReader reader(buffer.getBuffer(), fileLoc);
+
+  // Skip over the bytecode header, this should have already been checked.
+  if (failed(reader.skipBytes(StringRef("ML\xefR").size())))
+    return failure();
+  // Parse the bytecode version and producer.
+  if (failed(parseVersion(reader)) ||
+      failed(reader.parseNullTerminatedString(producer)))
+    return failure();
+
+  // Add a diagnostic handler that attaches a note that includes the original
+  // producer of the bytecode.
+  ScopedDiagnosticHandler diagHandler(getContext(), [&](Diagnostic &diag) {
+    diag.attachNote() << "in bytecode version " << version
+                      << " produced by: " << producer;
+    return failure();
+  });
+
+  // Parse the raw data for each of the top-level sections of the bytecode.
+  Optional<ArrayRef<uint8_t>> sectionDatas[bytecode::Section::kNumSections];
+  while (!reader.empty()) {
+    // Read the next section from the bytecode.
+    bytecode::Section::ID sectionID;
+    ArrayRef<uint8_t> sectionData;
+    if (failed(reader.parseSection(sectionID, sectionData)))
+      return failure();
+
+    // Check for duplicate sections, we only expect one instance of each.
+    if (sectionDatas[sectionID]) {
+      return reader.emitError("duplicate top-level section: ",
+                              toString(sectionID));
+    }
+    sectionDatas[sectionID] = sectionData;
+  }
+  // Check that all of the sections were found.
+  for (int i = 0; i < bytecode::Section::kNumSections; ++i) {
+    if (!sectionDatas[i]) {
+      return reader.emitError("missing data for top-level section: ",
+                              toString(bytecode::Section::ID(i)));
+    }
+  }
+
+  // Process the string section first.
+  if (failed(parseStringSection(*sectionDatas[bytecode::Section::kString])))
+    return failure();
+
+  // Process the dialect section.
+  if (failed(parseDialectSection(*sectionDatas[bytecode::Section::kDialect])))
+    return failure();
+
+  // Process the attribute and type section.
+  if (failed(attrTypeReader.initialize(
+          dialects, *sectionDatas[bytecode::Section::kAttrType],
+          *sectionDatas[bytecode::Section::kAttrTypeOffset])))
+    return failure();
+
+  // Finally, process the IR section.
+  return parseIRSection(*sectionDatas[bytecode::Section::kIR], block);
+}
+
+LogicalResult BytecodeReader::parseVersion(EncodingReader &reader) {
+  if (failed(reader.parseVarInt(version)))
+    return failure();
+
+  // Validate the bytecode version.
+  uint64_t currentVersion = bytecode::kVersion;
+  if (version < currentVersion) {
+    return reader.emitError("bytecode version ", version,
+                            " is older than the current version of ",
+                            currentVersion, ", and upgrade is not supported");
+  }
+  if (version > currentVersion) {
+    return reader.emitError("bytecode version ", version,
+                            " is newer than the current version ",
+                            currentVersion);
+  }
+  return success();
+}
+
+//===----------------------------------------------------------------------===//
+// Dialect Section
+
+LogicalResult
+BytecodeReader::parseDialectSection(ArrayRef<uint8_t> sectionData) {
+  EncodingReader sectionReader(sectionData, fileLoc);
+
+  // Parse the number of dialects in the section.
+  uint64_t numDialects;
+  if (failed(sectionReader.parseVarInt(numDialects)))
+    return failure();
+  dialects.resize(numDialects);
+
+  // Parse each of the dialects.
+  for (uint64_t i = 0; i < numDialects; ++i)
+    if (failed(parseSharedString(sectionReader, dialects[i].name)))
+      return failure();
+
+  // Parse the operation names, which are grouped by dialect.
+  auto parseOpName = [&](BytecodeDialect *dialect) {
+    StringRef opName;
+    if (failed(parseSharedString(sectionReader, opName)))
+      return failure();
+    opNames.emplace_back(dialect, opName);
+    return success();
+  };
+  while (!sectionReader.empty())
+    if (failed(parseDialectGrouping(sectionReader, dialects, parseOpName)))
+      return failure();
+  return success();
+}
+
+FailureOr<OperationName> BytecodeReader::parseOpName(EncodingReader &reader) {
+  BytecodeOperationName *opName = nullptr;
+  if (failed(parseEntry(reader, opNames, opName, "operation name")))
+    return failure();
+
+  // Check to see if this operation name has already been resolved. If we
+  // haven't, load the dialect and build the operation name.
+  if (!opName->opName) {
+    if (failed(opName->dialect->load(reader, getContext())))
+      return failure();
+    opName->opName.emplace((opName->dialect->name + "." + opName->name).str(),
+                           getContext());
+  }
+  return *opName->opName;
+}
+
+//===----------------------------------------------------------------------===//
+// Attribute/Type Section
+
+Attribute BytecodeReader::parseAttribute(EncodingReader &reader) {
+  uint64_t attrIdx;
+  if (failed(reader.parseVarInt(attrIdx)))
+    return Attribute();
+  return attrTypeReader.resolveAttribute(attrIdx);
+}
+
+Type BytecodeReader::parseType(EncodingReader &reader) {
+  uint64_t typeIdx;
+  if (failed(reader.parseVarInt(typeIdx)))
+    return Type();
+  return attrTypeReader.resolveType(typeIdx);
+}
+
+//===----------------------------------------------------------------------===//
+// IR Section
+
+LogicalResult BytecodeReader::parseIRSection(ArrayRef<uint8_t> sectionData,
+                                             Block *block) {
+  EncodingReader reader(sectionData, fileLoc);
+
+  // A stack of operation regions currently being read from the bytecode.
+  std::vector<RegionReadState> regionStack;
+
+  // Parse the top-level block using a temporary module operation.
+  OwningOpRef<ModuleOp> moduleOp = ModuleOp::create(fileLoc);
+  regionStack.emplace_back(*moduleOp, /*isIsolatedFromAbove=*/true);
+  regionStack.back().curBlocks.push_back(moduleOp->getBody());
+  regionStack.back().curBlock = regionStack.back().curRegion->begin();
+  if (failed(parseBlock(reader, regionStack.back())))
+    return failure();
+  valueScopes.emplace_back(ValueScope());
+  valueScopes.back().push(regionStack.back());
+
+  // Iteratively parse regions until everything has been resolved.
+  while (!regionStack.empty())
+    if (failed(parseRegions(reader, regionStack, regionStack.back())))
+      return failure();
+  if (!forwardRefOps.empty()) {
+    return reader.emitError(
+        "not all forward unresolved forward operand references");
+  }
+
+  // Verify that the parsed operations are valid.
+  if (failed(verify(*moduleOp)))
+    return failure();
+
+  // Splice the parsed operations over to the provided top-level block.
+  auto &parsedOps = moduleOp->getBody()->getOperations();
+  auto &destOps = block->getOperations();
+  destOps.splice(destOps.empty() ? destOps.end() : std::prev(destOps.end()),
+                 parsedOps, parsedOps.begin(), parsedOps.end());
+  return success();
+}
+
+LogicalResult
+BytecodeReader::parseRegions(EncodingReader &reader,
+                             std::vector<RegionReadState> &regionStack,
+                             RegionReadState &readState) {
+  // Read the regions of this operation.
+  for (; readState.curRegion != readState.endRegion; ++readState.curRegion) {
+    // If the current block hasn't been setup yet, parse the header for this
+    // region.
+    if (readState.curBlock == Region::iterator()) {
+      if (failed(parseRegion(reader, readState)))
+        return failure();
+
+      // If the region is empty, there is nothing to more to do.
+      if (readState.curRegion->empty())
+        continue;
+    }
+
+    // Parse the blocks within the region.
+    do {
+      while (readState.numOpsRemaining--) {
+        // Read in the next operation. We don't read its regions directly, we
+        // handle those afterwards as necessary.
+        bool isIsolatedFromAbove = false;
+        FailureOr<Operation *> op =
+            parseOpWithoutRegions(reader, readState, isIsolatedFromAbove);
+        if (failed(op))
+          return failure();
+
+        // If the op has regions, add it to the stack for processing.
+        if ((*op)->getNumRegions()) {
+          regionStack.emplace_back(*op, isIsolatedFromAbove);
+
+          // If the op is isolated from above, push a new value scope.
+          if (isIsolatedFromAbove)
+            valueScopes.emplace_back(ValueScope());
+          return success();
+        }
+      }
+
+      // Move to the next block of the region.
+      if (++readState.curBlock == readState.curRegion->end())
+        break;
+      if (failed(parseBlock(reader, readState)))
+        return failure();
+    } while (true);
+
+    // Reset the current block and any values reserved for this region.
+    readState.curBlock = {};
+    valueScopes.back().pop(readState);
+  }
+
+  // When the regions have been fully parsed, pop them off of the read stack. If
+  // the regions were isolated from above, we also pop the last value scope.
+  regionStack.pop_back();
+  if (readState.isIsolatedFromAbove)
+    valueScopes.pop_back();
+  return success();
+}
+
+FailureOr<Operation *>
+BytecodeReader::parseOpWithoutRegions(EncodingReader &reader,
+                                      RegionReadState &readState,
+                                      bool &isIsolatedFromAbove) {
+  // Parse the name of the operation.
+  FailureOr<OperationName> opName = parseOpName(reader);
+  if (failed(opName))
+    return failure();
+
+  // Parse the operation mask, which indicates which components of the operation
+  // are present.
+  uint8_t opMask;
+  if (failed(reader.parseByte(opMask)))
+    return failure();
+
+  /// Parse the location.
+  LocationAttr opLoc = parseAttribute<LocationAttr>(reader);
+  if (!opLoc)
+    return failure();
+
+  // With the location and name resolved, we can start building the operation
+  // state.
+  OperationState opState(opLoc, *opName);
+
+  // Parse the attributes of the operation.
+  if (opMask & bytecode::OpEncodingMask::kHasAttrs) {
+    DictionaryAttr dictAttr = parseAttribute<DictionaryAttr>(reader);
+    if (!dictAttr)
+      return failure();
+    opState.attributes = dictAttr;
+  }
+
+  /// Parse the results of the operation.
+  if (opMask & bytecode::OpEncodingMask::kHasResults) {
+    uint64_t numResults;
+    if (failed(reader.parseVarInt(numResults)))
+      return failure();
+    opState.types.resize(numResults);
+    for (int i = 0, e = numResults; i < e; ++i)
+      if (!(opState.types[i] = parseType(reader)))
+        return failure();
+  }
+
+  /// Parse the operands of the operation.
+  if (opMask & bytecode::OpEncodingMask::kHasOperands) {
+    uint64_t numOperands;
+    if (failed(reader.parseVarInt(numOperands)))
+      return failure();
+    opState.operands.resize(numOperands);
+    for (int i = 0, e = numOperands; i < e; ++i)
+      if (!(opState.operands[i] = parseOperand(reader)))
+        return failure();
+  }
+
+  /// Parse the successors of the operation.
+  if (opMask & bytecode::OpEncodingMask::kHasSuccessors) {
+    uint64_t numSuccs;
+    if (failed(reader.parseVarInt(numSuccs)))
+      return failure();
+    opState.successors.resize(numSuccs);
+    for (int i = 0, e = numSuccs; i < e; ++i) {
+      if (failed(parseEntry(reader, readState.curBlocks, opState.successors[i],
+                            "successor")))
+        return failure();
+    }
+  }
+
+  /// Parse the regions of the operation.
+  if (opMask & bytecode::OpEncodingMask::kHasInlineRegions) {
+    uint64_t numRegions;
+    if (failed(reader.parseVarIntWithFlag(numRegions, isIsolatedFromAbove)))
+      return failure();
+
+    opState.regions.reserve(numRegions);
+    for (int i = 0, e = numRegions; i < e; ++i)
+      opState.regions.push_back(std::make_unique<Region>());
+  }
+
+  // Create the operation at the back of the current block.
+  Operation *op = Operation::create(opState);
+  readState.curBlock->push_back(op);
+
+  // If the operation had results, update the value references.
+  if (op->getNumResults() && failed(defineValues(reader, op->getResults())))
+    return failure();
+
+  return op;
+}
+
+LogicalResult BytecodeReader::parseRegion(EncodingReader &reader,
+                                          RegionReadState &readState) {
+  // Parse the number of blocks in the region.
+  uint64_t numBlocks;
+  if (failed(reader.parseVarInt(numBlocks)))
+    return failure();
+
+  // If the region is empty, there is nothing else to do.
+  if (numBlocks == 0)
+    return success();
+
+  // Parse the number of values defined in this region.
+  uint64_t numValues;
+  if (failed(reader.parseVarInt(numValues)))
+    return failure();
+  readState.numValues = numValues;
+
+  // Create the blocks within this region. We do this before processing so that
+  // we can rely on the blocks existing when creating operations.
+  readState.curBlocks.clear();
+  readState.curBlocks.reserve(numBlocks);
+  for (uint64_t i = 0; i < numBlocks; ++i) {
+    readState.curBlocks.push_back(new Block());
+    readState.curRegion->push_back(readState.curBlocks.back());
+  }
+
+  // Prepare the current value scope for this region.
+  valueScopes.back().push(readState);
+
+  // Parse the entry block of the region.
+  readState.curBlock = readState.curRegion->begin();
+  return parseBlock(reader, readState);
+}
+
+LogicalResult BytecodeReader::parseBlock(EncodingReader &reader,
+                                         RegionReadState &readState) {
+  bool hasArgs;
+  if (failed(reader.parseVarIntWithFlag(readState.numOpsRemaining, hasArgs)))
+    return failure();
+
+  // Parse the arguments of the block.
+  if (hasArgs && failed(parseBlockArguments(reader, &*readState.curBlock)))
+    return failure();
+
+  // We don't parse the operations of the block here, that's done elsewhere.
+  return success();
+}
+
+LogicalResult BytecodeReader::parseBlockArguments(EncodingReader &reader,
+                                                  Block *block) {
+  // Parse the value ID for the first argument, and the number of arguments.
+  uint64_t numArgs;
+  if (failed(reader.parseVarInt(numArgs)))
+    return failure();
+
+  SmallVector<Type> argTypes;
+  SmallVector<Location> argLocs;
+  argTypes.reserve(numArgs);
+  argLocs.reserve(numArgs);
+
+  while (numArgs--) {
+    Type argType = parseType(reader);
+    if (!argType)
+      return failure();
+    LocationAttr argLoc = parseAttribute<LocationAttr>(reader);
+    if (!argLoc)
+      return failure();
+
+    argTypes.push_back(argType);
+    argLocs.push_back(argLoc);
+  }
+  block->addArguments(argTypes, argLocs);
+  return defineValues(reader, block->getArguments());
+}
+
+//===----------------------------------------------------------------------===//
+// String Section
+
+LogicalResult
+BytecodeReader::parseStringSection(ArrayRef<uint8_t> sectionData) {
+  EncodingReader stringReader(sectionData, fileLoc);
+
+  // Parse the number of strings in the section.
+  uint64_t numStrings;
+  if (failed(stringReader.parseVarInt(numStrings)))
+    return failure();
+  strings.resize(numStrings);
+
+  // Parse each of the strings. The sizes of the strings are encoded in reverse
+  // order, so that's the order we populate the table.
+  size_t stringDataEndOffset = sectionData.size();
+  size_t totalStringDataSize = 0;
+  for (StringRef &string : llvm::reverse(strings)) {
+    uint64_t stringSize;
+    if (failed(stringReader.parseVarInt(stringSize)))
+      return failure();
+    if (stringDataEndOffset < stringSize) {
+      return stringReader.emitError(
+          "string size exceeds the available data size");
+    }
+
+    // Extract the string from the data, dropping the null character.
+    size_t stringOffset = stringDataEndOffset - stringSize;
+    string = StringRef(
+        reinterpret_cast<const char *>(sectionData.data() + stringOffset),
+        stringSize - 1);
+    stringDataEndOffset = stringOffset;
+
+    // Update the total string data size.
+    totalStringDataSize += stringSize;
+  }
+
+  // Check that the only remaining data was for the strings
+  if (stringReader.size() != totalStringDataSize) {
+    return stringReader.emitError("unexpected trailing data between the "
+                                  "offsets for strings and their data");
+  }
+  return success();
+}
+
+//===----------------------------------------------------------------------===//
+// Value Processing
+
+Value BytecodeReader::parseOperand(EncodingReader &reader) {
+  std::vector<Value> &values = valueScopes.back().values;
+  Value *value = nullptr;
+  if (failed(parseEntry(reader, values, value, "value")))
+    return Value();
+
+  // Create a new forward reference if necessary.
+  if (!*value)
+    *value = createForwardRef();
+  return *value;
+}
+
+LogicalResult BytecodeReader::defineValues(EncodingReader &reader,
+                                           ValueRange newValues) {
+  ValueScope &valueScope = valueScopes.back();
+  std::vector<Value> &values = valueScope.values;
+
+  unsigned &valueID = valueScope.nextValueIDs.back();
+  unsigned valueIDEnd = valueID + newValues.size();
+  if (valueIDEnd > values.size()) {
+    return reader.emitError(
+        "value index range was outside of the expected range for "
+        "the parent region, got [",
+        valueID, ", ", valueIDEnd, "), but the maximum index was ",
+        values.size() - 1);
+  }
+
+  // Assign the values and update any forward references.
+  for (unsigned i = 0, e = newValues.size(); i != e; ++i, ++valueID) {
+    Value newValue = newValues[i];
+
+    // Check to see if a definition for this value already exists.
+    if (Value oldValue = std::exchange(values[valueID], newValue)) {
+      Operation *forwardRefOp = oldValue.getDefiningOp();
+
+      // Assert that this is a forward reference operation. Given how we compute
+      // definition ids (incrementally as we parse), it shouldn't be possible
+      // for the value to be defined any other way.
+      assert(forwardRefOp && forwardRefOp->getBlock() == &forwardRefOps &&
+             "value index was already defined?");
+
+      oldValue.replaceAllUsesWith(newValue);
+      forwardRefOp->moveBefore(&openForwardRefOps, openForwardRefOps.end());
+    }
+  }
+  return success();
+}
+
+Value BytecodeReader::createForwardRef() {
+  // Check for an avaliable existing operation to use. Otherwise, create a new
+  // fake operation to use for the reference.
+  if (!openForwardRefOps.empty()) {
+    Operation *op = &openForwardRefOps.back();
+    op->moveBefore(&forwardRefOps, forwardRefOps.end());
+  } else {
+    forwardRefOps.push_back(Operation::create(forwardRefOpState));
+  }
+  return forwardRefOps.back().getResult(0);
+}
+
+//===----------------------------------------------------------------------===//
+// Entry Points
+//===----------------------------------------------------------------------===//
+
+bool mlir::isBytecode(llvm::MemoryBufferRef buffer) {
+  return buffer.getBuffer().startswith("ML\xefR");
+}
+
+LogicalResult mlir::readBytecodeFile(llvm::MemoryBufferRef buffer, Block *block,
+                                     const ParserConfig &config) {
+  Location sourceFileLoc =
+      FileLineColLoc::get(config.getContext(), buffer.getBufferIdentifier(),
+                          /*line=*/0, /*column=*/0);
+  if (!isBytecode(buffer)) {
+    return emitError(sourceFileLoc,
+                     "input buffer is not an MLIR bytecode file");
+  }
+
+  BytecodeReader reader(sourceFileLoc, config);
+  return reader.read(buffer, block);
+}

diff  --git a/mlir/lib/Bytecode/Reader/CMakeLists.txt b/mlir/lib/Bytecode/Reader/CMakeLists.txt
new file mode 100644
index 0000000000000..3386cee2acfa9
--- /dev/null
+++ b/mlir/lib/Bytecode/Reader/CMakeLists.txt
@@ -0,0 +1,11 @@
+add_mlir_library(MLIRBytecodeReader
+  BytecodeReader.cpp
+
+  ADDITIONAL_HEADER_DIRS
+  ${MLIR_MAIN_INCLUDE_DIR}/mlir/Bytecode
+
+  LINK_LIBS PUBLIC
+  MLIRAsmParser
+  MLIRIR
+  MLIRSupport
+  )

diff  --git a/mlir/lib/Bytecode/Writer/BytecodeWriter.cpp b/mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
new file mode 100644
index 0000000000000..8bf37c18030bf
--- /dev/null
+++ b/mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
@@ -0,0 +1,520 @@
+//===- BytecodeWriter.cpp - MLIR Bytecode Writer --------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Bytecode/BytecodeWriter.h"
+#include "../Encoding.h"
+#include "IRNumbering.h"
+#include "mlir/IR/BuiltinDialect.h"
+#include "mlir/IR/OpImplementation.h"
+#include "llvm/ADT/CachedHashString.h"
+#include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/Support/Debug.h"
+#include <random>
+
+#define DEBUG_TYPE "mlir-bytecode-writer"
+
+using namespace mlir;
+using namespace mlir::bytecode::detail;
+
+//===----------------------------------------------------------------------===//
+// EncodingEmitter
+//===----------------------------------------------------------------------===//
+
+namespace {
+/// This class functions as the underlying encoding emitter for the bytecode
+/// writer. This class is a bit 
diff erent compared to other types of encoders;
+/// it does not use a single buffer, but instead may contain several buffers
+/// (some owned by the writer, and some not) that get concatted during the final
+/// emission.
+class EncodingEmitter {
+public:
+  EncodingEmitter() = default;
+  EncodingEmitter(const EncodingEmitter &) = delete;
+  EncodingEmitter &operator=(const EncodingEmitter &) = delete;
+
+  /// Write the current contents to the provided stream.
+  void writeTo(raw_ostream &os) const;
+
+  /// Return the current size of the encoded buffer.
+  size_t size() const { return prevResultSize + currentResult.size(); }
+
+  //===--------------------------------------------------------------------===//
+  // Emission
+  //===--------------------------------------------------------------------===//
+
+  /// Backpatch a byte in the result buffer at the given offset.
+  void patchByte(uint64_t offset, uint8_t value) {
+    assert(offset < size() && offset >= prevResultSize &&
+           "cannot patch previously emitted data");
+    currentResult[offset - prevResultSize] = value;
+  }
+
+  //===--------------------------------------------------------------------===//
+  // Integer Emission
+
+  /// Emit a single byte.
+  template <typename T>
+  void emitByte(T byte) {
+    currentResult.push_back(static_cast<uint8_t>(byte));
+  }
+
+  /// Emit a range of bytes.
+  void emitBytes(ArrayRef<uint8_t> bytes) {
+    llvm::append_range(currentResult, bytes);
+  }
+
+  /// Emit a variable length integer. The first encoded byte contains a prefix
+  /// in the low bits indicating the encoded length of the value. This length
+  /// prefix is a bit sequence of '0's followed by a '1'. The number of '0' bits
+  /// indicate the number of _additional_ bytes (not including the prefix byte).
+  /// All remaining bits in the first byte, along with all of the bits in
+  /// additional bytes, provide the value of the integer encoded in
+  /// little-endian order.
+  void emitVarInt(uint64_t value) {
+    // In the most common case, the value can be represented in a single byte.
+    // Given how hot this case is, explicitly handle that here.
+    if ((value >> 7) == 0)
+      return emitByte((value << 1) | 0x1);
+    emitMultiByteVarInt(value);
+  }
+
+  /// Emit a variable length integer whose low bit is used to encode the
+  /// provided flag, i.e. encoded as: (value << 1) | (flag ? 1 : 0).
+  void emitVarIntWithFlag(uint64_t value, bool flag) {
+    emitVarInt((value << 1) | (flag ? 1 : 0));
+  }
+
+  //===--------------------------------------------------------------------===//
+  // String Emission
+
+  /// Emit the given string as a nul terminated string.
+  void emitNulTerminatedString(StringRef str) {
+    emitString(str);
+    emitByte(0);
+  }
+
+  /// Emit the given string without a nul terminator.
+  void emitString(StringRef str) {
+    emitBytes({reinterpret_cast<const uint8_t *>(str.data()), str.size()});
+  }
+
+  //===--------------------------------------------------------------------===//
+  // Section Emission
+
+  /// Emit a nested section of the given code, whose contents are encoded in the
+  /// provided emitter.
+  void emitSection(bytecode::Section::ID code, EncodingEmitter &&emitter) {
+    // Emit the section code and length.
+    emitByte(code);
+    emitVarInt(emitter.size());
+
+    // Push our current buffer and then merge the provided section body into
+    // ours.
+    appendResult(std::move(currentResult));
+    for (std::vector<uint8_t> &result : emitter.prevResultStorage)
+      appendResult(std::move(result));
+    appendResult(std::move(emitter.currentResult));
+  }
+
+private:
+  /// Emit the given value using a variable width encoding. This method is a
+  /// fallback when the number of bytes needed to encode the value is greater
+  /// than 1. We mark it noinline here so that the single byte hot path isn't
+  /// pessimized.
+  LLVM_ATTRIBUTE_NOINLINE void emitMultiByteVarInt(uint64_t value);
+
+  /// Append a new result buffer to the current contents.
+  void appendResult(std::vector<uint8_t> &&result) {
+    prevResultSize += result.size();
+    prevResultStorage.emplace_back(std::move(result));
+    prevResultList.emplace_back(prevResultStorage.back());
+  }
+
+  /// The result of the emitter currently being built. We refrain from building
+  /// a single buffer to simplify emitting sections, large data, and more. The
+  /// result is thus represented using multiple distinct buffers, some of which
+  /// we own (via prevResultStorage), and some of which are just pointers into
+  /// externally owned buffers.
+  std::vector<uint8_t> currentResult;
+  std::vector<ArrayRef<uint8_t>> prevResultList;
+  std::vector<std::vector<uint8_t>> prevResultStorage;
+
+  /// An up-to-date total size of all of the buffers within `prevResultList`.
+  /// This enables O(1) size checks of the current encoding.
+  size_t prevResultSize = 0;
+};
+
+/// A simple raw_ostream wrapper around a EncodingEmitter. This removes the need
+/// to go through an intermediate buffer when interacting with code that wants a
+/// raw_ostream.
+class raw_emitter_ostream : public raw_ostream {
+public:
+  explicit raw_emitter_ostream(EncodingEmitter &emitter) : emitter(emitter) {
+    SetUnbuffered();
+  }
+
+private:
+  void write_impl(const char *ptr, size_t size) override {
+    emitter.emitBytes({reinterpret_cast<const uint8_t *>(ptr), size});
+  }
+  uint64_t current_pos() const override { return emitter.size(); }
+
+  /// The section being emitted to.
+  EncodingEmitter &emitter;
+};
+} // namespace
+
+void EncodingEmitter::writeTo(raw_ostream &os) const {
+  for (auto &prevResult : prevResultList)
+    os.write((const char *)prevResult.data(), prevResult.size());
+  os.write((const char *)currentResult.data(), currentResult.size());
+}
+
+void EncodingEmitter::emitMultiByteVarInt(uint64_t value) {
+  // Compute the number of bytes needed to encode the value. Each byte can hold
+  // up to 7-bits of data. We only check up to the number of bits we can encode
+  // in the first byte (8).
+  uint64_t it = value >> 7;
+  for (size_t numBytes = 2; numBytes < 9; ++numBytes) {
+    if (LLVM_LIKELY(it >>= 7) == 0) {
+      uint64_t encodedValue = (value << 1) | 0x1;
+      encodedValue <<= (numBytes - 1);
+      emitBytes({reinterpret_cast<uint8_t *>(&encodedValue), numBytes});
+      return;
+    }
+  }
+
+  // If the value is too large to encode in a single byte, emit a special all
+  // zero marker byte and splat the value directly.
+  emitByte(0);
+  emitBytes({reinterpret_cast<uint8_t *>(&value), sizeof(value)});
+}
+
+//===----------------------------------------------------------------------===//
+// Bytecode Writer
+//===----------------------------------------------------------------------===//
+
+namespace {
+class BytecodeWriter {
+public:
+  BytecodeWriter(Operation *op) : numberingState(op) {}
+
+  /// Write the bytecode for the given root operation.
+  void write(Operation *rootOp, raw_ostream &os, StringRef producer);
+
+private:
+  //===--------------------------------------------------------------------===//
+  // Dialects
+
+  void writeDialectSection(EncodingEmitter &emitter);
+
+  //===--------------------------------------------------------------------===//
+  // Attributes and Types
+
+  void writeAttrTypeSection(EncodingEmitter &emitter);
+
+  //===--------------------------------------------------------------------===//
+  // Operations
+
+  void writeBlock(EncodingEmitter &emitter, Block *block);
+  void writeOp(EncodingEmitter &emitter, Operation *op);
+  void writeRegion(EncodingEmitter &emitter, Region *region);
+  void writeIRSection(EncodingEmitter &emitter, Operation *op);
+
+  //===--------------------------------------------------------------------===//
+  // Strings
+
+  void writeStringSection(EncodingEmitter &emitter);
+
+  /// Get the number for the given shared string, that is contained within the
+  /// string section.
+  size_t getSharedStringNumber(StringRef str);
+
+  //===--------------------------------------------------------------------===//
+  // Fields
+
+  /// The IR numbering state generated for the root operation.
+  IRNumberingState numberingState;
+
+  /// A set of strings referenced within the bytecode. The value of the map is
+  /// unused.
+  llvm::MapVector<llvm::CachedHashStringRef, size_t> strings;
+};
+} // namespace
+
+void BytecodeWriter::write(Operation *rootOp, raw_ostream &os,
+                           StringRef producer) {
+  EncodingEmitter emitter;
+
+  // Emit the bytecode file header. This is how we identify the output as a
+  // bytecode file.
+  emitter.emitString("ML\xefR");
+
+  // Emit the bytecode version.
+  emitter.emitVarInt(bytecode::kVersion);
+
+  // Emit the producer.
+  emitter.emitNulTerminatedString(producer);
+
+  // Emit the dialect section.
+  writeDialectSection(emitter);
+
+  // Emit the attributes and types section.
+  writeAttrTypeSection(emitter);
+
+  // Emit the IR section.
+  writeIRSection(emitter, rootOp);
+
+  // Emit the string section.
+  writeStringSection(emitter);
+
+  // Write the generated bytecode to the provided output stream.
+  emitter.writeTo(os);
+}
+
+//===----------------------------------------------------------------------===//
+// Dialects
+
+/// Write the given entries in contiguous groups with the same parent dialect.
+/// Each dialect sub-group is encoded with the parent dialect and number of
+/// elements, followed by the encoding for the entries. The given callback is
+/// invoked to encode each individual entry.
+template <typename EntriesT, typename EntryCallbackT>
+static void writeDialectGrouping(EncodingEmitter &emitter, EntriesT &&entries,
+                                 EntryCallbackT &&callback) {
+  for (auto it = entries.begin(), e = entries.end(); it != e;) {
+    auto groupStart = it++;
+
+    // Find the end of the group that shares the same parent dialect.
+    DialectNumbering *currentDialect = groupStart->dialect;
+    it = std::find_if(it, e, [&](const auto &entry) {
+      return entry.dialect != currentDialect;
+    });
+
+    // Emit the dialect and number of elements.
+    emitter.emitVarInt(currentDialect->number);
+    emitter.emitVarInt(std::distance(groupStart, it));
+
+    // Emit the entries within the group.
+    for (auto &entry : llvm::make_range(groupStart, it))
+      callback(entry);
+  }
+}
+
+void BytecodeWriter::writeDialectSection(EncodingEmitter &emitter) {
+  EncodingEmitter dialectEmitter;
+
+  // Emit the referenced dialects.
+  auto dialects = numberingState.getDialects();
+  dialectEmitter.emitVarInt(llvm::size(dialects));
+  for (DialectNumbering &dialect : dialects)
+    dialectEmitter.emitVarInt(getSharedStringNumber(dialect.name));
+
+  // Emit the referenced operation names grouped by dialect.
+  auto emitOpName = [&](OpNameNumbering &name) {
+    dialectEmitter.emitVarInt(getSharedStringNumber(name.name.stripDialect()));
+  };
+  writeDialectGrouping(dialectEmitter, numberingState.getOpNames(), emitOpName);
+
+  emitter.emitSection(bytecode::Section::kDialect, std::move(dialectEmitter));
+}
+
+//===----------------------------------------------------------------------===//
+// Attributes and Types
+
+void BytecodeWriter::writeAttrTypeSection(EncodingEmitter &emitter) {
+  EncodingEmitter attrTypeEmitter;
+  EncodingEmitter offsetEmitter;
+  offsetEmitter.emitVarInt(llvm::size(numberingState.getAttributes()));
+  offsetEmitter.emitVarInt(llvm::size(numberingState.getTypes()));
+
+  // A functor used to emit an attribute or type entry.
+  uint64_t prevOffset = 0;
+  auto emitAttrOrType = [&](auto &entry) {
+    // TODO: Allow dialects to provide more optimal implementations of attribute
+    // and type encodings.
+    bool hasCustomEncoding = false;
+
+    // Emit the entry using the textual format.
+    raw_emitter_ostream(attrTypeEmitter) << entry.getValue();
+    attrTypeEmitter.emitByte(0);
+
+    // Record the offset of this entry.
+    uint64_t curOffset = attrTypeEmitter.size();
+    offsetEmitter.emitVarIntWithFlag(curOffset - prevOffset, hasCustomEncoding);
+    prevOffset = curOffset;
+  };
+
+  // Emit the attribute and type entries for each dialect.
+  writeDialectGrouping(offsetEmitter, numberingState.getAttributes(),
+                       emitAttrOrType);
+  writeDialectGrouping(offsetEmitter, numberingState.getTypes(),
+                       emitAttrOrType);
+
+  // Emit the sections to the stream.
+  emitter.emitSection(bytecode::Section::kAttrTypeOffset,
+                      std::move(offsetEmitter));
+  emitter.emitSection(bytecode::Section::kAttrType, std::move(attrTypeEmitter));
+}
+
+//===----------------------------------------------------------------------===//
+// Operations
+
+void BytecodeWriter::writeBlock(EncodingEmitter &emitter, Block *block) {
+  ArrayRef<BlockArgument> args = block->getArguments();
+  bool hasArgs = !args.empty();
+
+  // Emit the number of operations in this block, and if it has arguments. We
+  // use the low bit of the operation count to indicate if the block has
+  // arguments.
+  unsigned numOps = numberingState.getOperationCount(block);
+  emitter.emitVarIntWithFlag(numOps, hasArgs);
+
+  // Emit the arguments of the block.
+  if (hasArgs) {
+    emitter.emitVarInt(args.size());
+    for (BlockArgument arg : args) {
+      emitter.emitVarInt(numberingState.getNumber(arg.getType()));
+      emitter.emitVarInt(numberingState.getNumber(arg.getLoc()));
+    }
+  }
+
+  // Emit the operations within the block.
+  for (Operation &op : *block)
+    writeOp(emitter, &op);
+}
+
+void BytecodeWriter::writeOp(EncodingEmitter &emitter, Operation *op) {
+  emitter.emitVarInt(numberingState.getNumber(op->getName()));
+
+  // Emit a mask for the operation components. We need to fill this in later
+  // (when we actually know what needs to be emitted), so emit a placeholder for
+  // now.
+  uint64_t maskOffset = emitter.size();
+  uint8_t opEncodingMask = 0;
+  emitter.emitByte(0);
+
+  // Emit the location for this operation.
+  emitter.emitVarInt(numberingState.getNumber(op->getLoc()));
+
+  // Emit the attributes of this operation.
+  DictionaryAttr attrs = op->getAttrDictionary();
+  if (!attrs.empty()) {
+    opEncodingMask |= bytecode::OpEncodingMask::kHasAttrs;
+    emitter.emitVarInt(numberingState.getNumber(op->getAttrDictionary()));
+  }
+
+  // Emit the result types of the operation.
+  if (unsigned numResults = op->getNumResults()) {
+    opEncodingMask |= bytecode::OpEncodingMask::kHasResults;
+    emitter.emitVarInt(numResults);
+    for (Type type : op->getResultTypes())
+      emitter.emitVarInt(numberingState.getNumber(type));
+  }
+
+  // Emit the operands of the operation.
+  if (unsigned numOperands = op->getNumOperands()) {
+    opEncodingMask |= bytecode::OpEncodingMask::kHasOperands;
+    emitter.emitVarInt(numOperands);
+    for (Value operand : op->getOperands())
+      emitter.emitVarInt(numberingState.getNumber(operand));
+  }
+
+  // Emit the successors of the operation.
+  if (unsigned numSuccessors = op->getNumSuccessors()) {
+    opEncodingMask |= bytecode::OpEncodingMask::kHasSuccessors;
+    emitter.emitVarInt(numSuccessors);
+    for (Block *successor : op->getSuccessors())
+      emitter.emitVarInt(numberingState.getNumber(successor));
+  }
+
+  // Check for regions.
+  unsigned numRegions = op->getNumRegions();
+  if (numRegions)
+    opEncodingMask |= bytecode::OpEncodingMask::kHasInlineRegions;
+
+  // Update the mask for the operation.
+  emitter.patchByte(maskOffset, opEncodingMask);
+
+  // With the mask emitted, we can now emit the regions of the operation. We do
+  // this after mask emission to avoid offset complications that may arise by
+  // emitting the regions first (e.g. if the regions are huge, backpatching the
+  // op encoding mask is more annoying).
+  if (numRegions) {
+    bool isIsolatedFromAbove = op->hasTrait<OpTrait::IsIsolatedFromAbove>();
+    emitter.emitVarIntWithFlag(numRegions, isIsolatedFromAbove);
+
+    for (Region &region : op->getRegions())
+      writeRegion(emitter, &region);
+  }
+}
+
+void BytecodeWriter::writeRegion(EncodingEmitter &emitter, Region *region) {
+  // If the region is empty, we only need to emit the number of blocks (which is
+  // zero).
+  if (region->empty())
+    return emitter.emitVarInt(/*numBlocks*/ 0);
+
+  // Emit the number of blocks and values within the region.
+  unsigned numBlocks, numValues;
+  std::tie(numBlocks, numValues) = numberingState.getBlockValueCount(region);
+  emitter.emitVarInt(numBlocks);
+  emitter.emitVarInt(numValues);
+
+  // Emit the blocks within the region.
+  for (Block &block : *region)
+    writeBlock(emitter, &block);
+}
+
+void BytecodeWriter::writeIRSection(EncodingEmitter &emitter, Operation *op) {
+  EncodingEmitter irEmitter;
+
+  // Write the IR section the same way as a block with no arguments. Note that
+  // the low-bit of the operation count for a block is used to indicate if the
+  // block has arguments, which in this case is always false.
+  irEmitter.emitVarIntWithFlag(/*numOps*/ 1, /*hasArgs*/ false);
+
+  // Emit the operations.
+  writeOp(irEmitter, op);
+
+  emitter.emitSection(bytecode::Section::kIR, std::move(irEmitter));
+}
+
+//===----------------------------------------------------------------------===//
+// Strings
+
+void BytecodeWriter::writeStringSection(EncodingEmitter &emitter) {
+  EncodingEmitter stringEmitter;
+  stringEmitter.emitVarInt(strings.size());
+
+  // Emit the sizes in reverse order, so that we don't need to backpatch an
+  // offset to the string data or have a separate section.
+  for (const auto &it : llvm::reverse(strings))
+    stringEmitter.emitVarInt(it.first.size() + 1);
+  // Emit the string data itself.
+  for (const auto &it : strings)
+    stringEmitter.emitNulTerminatedString(it.first.val());
+
+  emitter.emitSection(bytecode::Section::kString, std::move(stringEmitter));
+}
+
+size_t BytecodeWriter::getSharedStringNumber(StringRef str) {
+  auto it = strings.insert({llvm::CachedHashStringRef(str), strings.size()});
+  return it.first->second;
+}
+
+//===----------------------------------------------------------------------===//
+// Entry Points
+//===----------------------------------------------------------------------===//
+
+void mlir::writeBytecodeToFile(Operation *op, raw_ostream &os,
+                               StringRef producer) {
+  BytecodeWriter writer(op);
+  writer.write(op, os, producer);
+}

diff  --git a/mlir/lib/Bytecode/Writer/CMakeLists.txt b/mlir/lib/Bytecode/Writer/CMakeLists.txt
new file mode 100644
index 0000000000000..7d260568249a0
--- /dev/null
+++ b/mlir/lib/Bytecode/Writer/CMakeLists.txt
@@ -0,0 +1,11 @@
+add_mlir_library(MLIRBytecodeWriter
+  BytecodeWriter.cpp
+  IRNumbering.cpp
+
+  ADDITIONAL_HEADER_DIRS
+  ${MLIR_MAIN_INCLUDE_DIR}/mlir/Bytecode
+
+  LINK_LIBS PUBLIC
+  MLIRIR
+  MLIRSupport
+  )

diff  --git a/mlir/lib/Bytecode/Writer/IRNumbering.cpp b/mlir/lib/Bytecode/Writer/IRNumbering.cpp
new file mode 100644
index 0000000000000..4424dc70a73a7
--- /dev/null
+++ b/mlir/lib/Bytecode/Writer/IRNumbering.cpp
@@ -0,0 +1,251 @@
+//===- IRNumbering.cpp - MLIR Bytecode IR numbering -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "IRNumbering.h"
+#include "mlir/Bytecode/BytecodeWriter.h"
+#include "mlir/IR/BuiltinTypes.h"
+#include "mlir/IR/OpDefinition.h"
+
+using namespace mlir;
+using namespace mlir::bytecode::detail;
+
+//===----------------------------------------------------------------------===//
+// IR Numbering
+//===----------------------------------------------------------------------===//
+
+/// Group and sort the elements of the given range by their parent dialect. This
+/// grouping is applied to sub-sections of the ranged defined by how many bytes
+/// it takes to encode a varint index to that sub-section.
+template <typename T>
+static void groupByDialectPerByte(T range) {
+  if (range.empty())
+    return;
+
+  // A functor used to sort by a given dialect, with a desired dialect to be
+  // ordered first (to better enable sharing of dialects across byte groups).
+  auto sortByDialect = [](unsigned dialectToOrderFirst, const auto &lhs,
+                          const auto &rhs) {
+    if (lhs->dialect->number == dialectToOrderFirst)
+      return rhs->dialect->number != dialectToOrderFirst;
+    return lhs->dialect->number < rhs->dialect->number;
+  };
+
+  unsigned dialectToOrderFirst = 0;
+  size_t elementsInByteGroup = 0;
+  auto iterRange = range;
+  for (unsigned i = 1; i < 9; ++i) {
+    // Update the number of elements in the current byte grouping. Reminder
+    // that varint encodes 7-bits per byte, so that's how we compute the
+    // number of elements in each byte grouping.
+    elementsInByteGroup = (1 << (7 * i)) - elementsInByteGroup;
+
+    // Slice out the sub-set of elements that are in the current byte grouping
+    // to be sorted.
+    auto byteSubRange = iterRange.take_front(elementsInByteGroup);
+    iterRange = iterRange.drop_front(byteSubRange.size());
+
+    // Sort the sub range for this byte.
+    llvm::stable_sort(byteSubRange, [&](const auto &lhs, const auto &rhs) {
+      return sortByDialect(dialectToOrderFirst, lhs, rhs);
+    });
+
+    // Update the dialect to order first to be the dialect at the end of the
+    // current grouping. This seeks to allow larger dialect groupings across
+    // byte boundaries.
+    dialectToOrderFirst = byteSubRange.back()->dialect->number;
+
+    // If the data range is now empty, we are done.
+    if (iterRange.empty())
+      break;
+  }
+
+  // Assign the entry numbers based on the sort order.
+  for (auto &entry : llvm::enumerate(range))
+    entry.value()->number = entry.index();
+}
+
+IRNumberingState::IRNumberingState(Operation *op) {
+  // Number the root operation.
+  number(*op);
+
+  // Push all of the regions of the root operation onto the worklist.
+  SmallVector<std::pair<Region *, unsigned>, 8> numberContext;
+  for (Region &region : op->getRegions())
+    numberContext.emplace_back(&region, nextValueID);
+
+  // Iteratively process each of the nested regions.
+  while (!numberContext.empty()) {
+    Region *region;
+    std::tie(region, nextValueID) = numberContext.pop_back_val();
+    number(*region);
+
+    // Traverse into nested regions.
+    for (Operation &op : region->getOps()) {
+      // Isolated regions don't share value numbers with their parent, so we can
+      // start numbering these regions at zero.
+      unsigned opFirstValueID =
+          op.hasTrait<OpTrait::IsIsolatedFromAbove>() ? 0 : nextValueID;
+      for (Region &region : op.getRegions())
+        numberContext.emplace_back(&region, opFirstValueID);
+    }
+  }
+
+  // Number each of the dialects. For now this is just in the order they were
+  // found, given that the number of dialects on average is small enough to fit
+  // within a singly byte (128). If we ever have real world use cases that have
+  // a huge number of dialects, this could be made more intelligent.
+  for (auto &it : llvm::enumerate(dialects))
+    it.value().second->number = it.index();
+
+  // Number each of the recorded components within each dialect.
+
+  // First sort by ref count so that the most referenced elements are first. We
+  // try to bias more heavily used elements to the front. This allows for more
+  // frequently referenced things to be encoded using smaller varints.
+  auto sortByRefCountFn = [](const auto &lhs, const auto &rhs) {
+    return lhs->refCount > rhs->refCount;
+  };
+  llvm::stable_sort(orderedAttrs, sortByRefCountFn);
+  llvm::stable_sort(orderedOpNames, sortByRefCountFn);
+  llvm::stable_sort(orderedTypes, sortByRefCountFn);
+
+  // After that, we apply a secondary ordering based on the parent dialect. This
+  // ordering is applied to sub-sections of the element list defined by how many
+  // bytes it takes to encode a varint index to that sub-section. This allows
+  // for more efficiently encoding components of the same dialect (e.g. we only
+  // have to encode the dialect reference once).
+  groupByDialectPerByte(llvm::makeMutableArrayRef(orderedAttrs));
+  groupByDialectPerByte(llvm::makeMutableArrayRef(orderedOpNames));
+  groupByDialectPerByte(llvm::makeMutableArrayRef(orderedTypes));
+}
+
+void IRNumberingState::number(Attribute attr) {
+  auto it = attrs.insert({attr, nullptr});
+  if (!it.second) {
+    ++it.first->second->refCount;
+    return;
+  }
+  auto *numbering = new (attrAllocator.Allocate()) AttributeNumbering(attr);
+  it.first->second = numbering;
+  orderedAttrs.push_back(numbering);
+
+  // Check for OpaqueAttr, which is a dialect-specific attribute that didn't
+  // have a registered dialect when it got created. We don't want to encode this
+  // as the builtin OpaqueAttr, we want to encode it as if the dialect was
+  // actually loaded.
+  if (OpaqueAttr opaqueAttr = attr.dyn_cast<OpaqueAttr>())
+    numbering->dialect = &numberDialect(opaqueAttr.getDialectNamespace());
+  else
+    numbering->dialect = &numberDialect(&attr.getDialect());
+}
+
+void IRNumberingState::number(Block &block) {
+  // Number the arguments of the block.
+  for (BlockArgument arg : block.getArguments()) {
+    valueIDs.try_emplace(arg, nextValueID++);
+    number(arg.getLoc());
+    number(arg.getType());
+  }
+
+  // Number the operations in this block.
+  unsigned &numOps = blockOperationCounts[&block];
+  for (Operation &op : block) {
+    number(op);
+    ++numOps;
+  }
+}
+
+auto IRNumberingState::numberDialect(Dialect *dialect) -> DialectNumbering & {
+  DialectNumbering *&numbering = registeredDialects[dialect];
+  if (!numbering) {
+    numbering = &numberDialect(dialect->getNamespace());
+    numbering->dialect = dialect;
+  }
+  return *numbering;
+}
+
+auto IRNumberingState::numberDialect(StringRef dialect) -> DialectNumbering & {
+  DialectNumbering *&numbering = dialects[dialect];
+  if (!numbering) {
+    numbering = new (dialectAllocator.Allocate())
+        DialectNumbering(dialect, dialects.size() - 1);
+  }
+  return *numbering;
+}
+
+void IRNumberingState::number(Region &region) {
+  if (region.empty())
+    return;
+  size_t firstValueID = nextValueID;
+
+  // Number the blocks within this region.
+  size_t blockCount = 0;
+  for (auto &it : llvm::enumerate(region)) {
+    blockIDs.try_emplace(&it.value(), it.index());
+    number(it.value());
+    ++blockCount;
+  }
+
+  // Remember the number of blocks and values in this region.
+  regionBlockValueCounts.try_emplace(&region, blockCount,
+                                     nextValueID - firstValueID);
+}
+
+void IRNumberingState::number(Operation &op) {
+  // Number the components of an operation that won't be numbered elsewhere
+  // (e.g. we don't number operands, regions, or successors here).
+  number(op.getName());
+  for (OpResult result : op.getResults()) {
+    valueIDs.try_emplace(result, nextValueID++);
+    number(result.getType());
+  }
+
+  // Only number the operation's dictionary if it isn't empty.
+  DictionaryAttr dictAttr = op.getAttrDictionary();
+  if (!dictAttr.empty())
+    number(dictAttr);
+
+  number(op.getLoc());
+}
+
+void IRNumberingState::number(OperationName opName) {
+  OpNameNumbering *&numbering = opNames[opName];
+  if (numbering) {
+    ++numbering->refCount;
+    return;
+  }
+  DialectNumbering *dialectNumber = nullptr;
+  if (Dialect *dialect = opName.getDialect())
+    dialectNumber = &numberDialect(dialect);
+  else
+    dialectNumber = &numberDialect(opName.getDialectNamespace());
+
+  numbering =
+      new (opNameAllocator.Allocate()) OpNameNumbering(dialectNumber, opName);
+  orderedOpNames.push_back(numbering);
+}
+
+void IRNumberingState::number(Type type) {
+  auto it = types.insert({type, nullptr});
+  if (!it.second) {
+    ++it.first->second->refCount;
+    return;
+  }
+  auto *numbering = new (typeAllocator.Allocate()) TypeNumbering(type);
+  it.first->second = numbering;
+  orderedTypes.push_back(numbering);
+
+  // Check for OpaqueType, which is a dialect-specific type that didn't have a
+  // registered dialect when it got created. We don't want to encode this as the
+  // builtin OpaqueType, we want to encode it as if the dialect was actually
+  // loaded.
+  if (OpaqueType opaqueType = type.dyn_cast<OpaqueType>())
+    numbering->dialect = &numberDialect(opaqueType.getDialectNamespace());
+  else
+    numbering->dialect = &numberDialect(&type.getDialect());
+}

diff  --git a/mlir/lib/Bytecode/Writer/IRNumbering.h b/mlir/lib/Bytecode/Writer/IRNumbering.h
new file mode 100644
index 0000000000000..fd8e3b14c62e5
--- /dev/null
+++ b/mlir/lib/Bytecode/Writer/IRNumbering.h
@@ -0,0 +1,193 @@
+//===- IRNumbering.h - MLIR bytecode IR numbering ---------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains various utilities that number IR structures in preparation
+// for bytecode emission.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LIB_MLIR_BYTECODE_WRITER_IRNUMBERING_H
+#define LIB_MLIR_BYTECODE_WRITER_IRNUMBERING_H
+
+#include "mlir/IR/OperationSupport.h"
+#include "llvm/ADT/MapVector.h"
+
+namespace mlir {
+class BytecodeWriterConfig;
+
+namespace bytecode {
+namespace detail {
+struct DialectNumbering;
+
+//===----------------------------------------------------------------------===//
+// Attribute and Type Numbering
+//===----------------------------------------------------------------------===//
+
+/// This class represents a numbering entry for an Attribute or Type.
+struct AttrTypeNumbering {
+  AttrTypeNumbering(PointerUnion<Attribute, Type> value) : value(value) {}
+
+  /// The concrete value.
+  PointerUnion<Attribute, Type> value;
+
+  /// The number assigned to this value.
+  unsigned number = 0;
+
+  /// The number of references to this value.
+  unsigned refCount = 1;
+
+  /// The dialect of this value.
+  DialectNumbering *dialect = nullptr;
+};
+struct AttributeNumbering : public AttrTypeNumbering {
+  AttributeNumbering(Attribute value) : AttrTypeNumbering(value) {}
+  Attribute getValue() const { return value.get<Attribute>(); }
+};
+struct TypeNumbering : public AttrTypeNumbering {
+  TypeNumbering(Type value) : AttrTypeNumbering(value) {}
+  Type getValue() const { return value.get<Type>(); }
+};
+
+//===----------------------------------------------------------------------===//
+// OpName Numbering
+//===----------------------------------------------------------------------===//
+
+/// This class represents the numbering entry of an operation name.
+struct OpNameNumbering {
+  OpNameNumbering(DialectNumbering *dialect, OperationName name)
+      : dialect(dialect), name(name) {}
+
+  /// The dialect of this value.
+  DialectNumbering *dialect;
+
+  /// The concrete name.
+  OperationName name;
+
+  /// The number assigned to this name.
+  unsigned number = 0;
+
+  /// The number of references to this name.
+  unsigned refCount = 1;
+};
+
+//===----------------------------------------------------------------------===//
+// Dialect Numbering
+//===----------------------------------------------------------------------===//
+
+/// This class represents a numbering entry for an Dialect.
+struct DialectNumbering {
+  DialectNumbering(StringRef name, unsigned number)
+      : name(name), number(number) {}
+
+  /// The namespace of the dialect.
+  StringRef name;
+
+  /// The number assigned to the dialect.
+  unsigned number;
+
+  /// The loaded dialect, or nullptr if the dialect isn't loaded.
+  Dialect *dialect = nullptr;
+};
+
+//===----------------------------------------------------------------------===//
+// IRNumberingState
+//===----------------------------------------------------------------------===//
+
+/// This class manages numbering IR entities in preparation of bytecode
+/// emission.
+class IRNumberingState {
+public:
+  IRNumberingState(Operation *op);
+
+  /// Return the numbered dialects.
+  auto getDialects() {
+    return llvm::make_pointee_range(llvm::make_second_range(dialects));
+  }
+  auto getAttributes() { return llvm::make_pointee_range(orderedAttrs); }
+  auto getOpNames() { return llvm::make_pointee_range(orderedOpNames); }
+  auto getTypes() { return llvm::make_pointee_range(orderedTypes); }
+
+  /// Return the number for the given IR unit.
+  unsigned getNumber(Attribute attr) {
+    assert(attrs.count(attr) && "attribute not numbered");
+    return attrs[attr]->number;
+  }
+  unsigned getNumber(Block *block) {
+    assert(blockIDs.count(block) && "block not numbered");
+    return blockIDs[block];
+  }
+  unsigned getNumber(OperationName opName) {
+    assert(opNames.count(opName) && "opName not numbered");
+    return opNames[opName]->number;
+  }
+  unsigned getNumber(Type type) {
+    assert(types.count(type) && "type not numbered");
+    return types[type]->number;
+  }
+  unsigned getNumber(Value value) {
+    assert(valueIDs.count(value) && "value not numbered");
+    return valueIDs[value];
+  }
+
+  /// Return the block and value counts of the given region.
+  std::pair<unsigned, unsigned> getBlockValueCount(Region *region) {
+    assert(regionBlockValueCounts.count(region) && "value not numbered");
+    return regionBlockValueCounts[region];
+  }
+
+  /// Return the number of operations in the given block.
+  unsigned getOperationCount(Block *block) {
+    assert(blockOperationCounts.count(block) && "block not numbered");
+    return blockOperationCounts[block];
+  }
+
+private:
+  /// Number the given IR unit for bytecode emission.
+  void number(Attribute attr);
+  void number(Block &block);
+  DialectNumbering &numberDialect(Dialect *dialect);
+  DialectNumbering &numberDialect(StringRef dialect);
+  void number(Operation &op);
+  void number(OperationName opName);
+  void number(Region &region);
+  void number(Type type);
+
+  /// Mapping from IR to the respective numbering entries.
+  DenseMap<Attribute, AttributeNumbering *> attrs;
+  DenseMap<OperationName, OpNameNumbering *> opNames;
+  DenseMap<Type, TypeNumbering *> types;
+  DenseMap<Dialect *, DialectNumbering *> registeredDialects;
+  llvm::MapVector<StringRef, DialectNumbering *> dialects;
+  std::vector<AttributeNumbering *> orderedAttrs;
+  std::vector<OpNameNumbering *> orderedOpNames;
+  std::vector<TypeNumbering *> orderedTypes;
+
+  /// Allocators used for the various numbering entries.
+  llvm::SpecificBumpPtrAllocator<AttributeNumbering> attrAllocator;
+  llvm::SpecificBumpPtrAllocator<DialectNumbering> dialectAllocator;
+  llvm::SpecificBumpPtrAllocator<OpNameNumbering> opNameAllocator;
+  llvm::SpecificBumpPtrAllocator<TypeNumbering> typeAllocator;
+
+  /// The value ID for each Block and Value.
+  DenseMap<Block *, unsigned> blockIDs;
+  DenseMap<Value, unsigned> valueIDs;
+
+  /// The number of operations in each block.
+  DenseMap<Block *, unsigned> blockOperationCounts;
+
+  /// A map from region to the number of blocks and values within that region.
+  DenseMap<Region *, std::pair<unsigned, unsigned>> regionBlockValueCounts;
+
+  /// The next value ID to assign when numbering.
+  unsigned nextValueID = 0;
+};
+} // namespace detail
+} // namespace bytecode
+} // namespace mlir
+
+#endif

diff  --git a/mlir/lib/CMakeLists.txt b/mlir/lib/CMakeLists.txt
index 0da0d544cc657..3fb13a2c20021 100644
--- a/mlir/lib/CMakeLists.txt
+++ b/mlir/lib/CMakeLists.txt
@@ -3,6 +3,7 @@ add_flag_if_supported("-Werror=global-constructors" WERROR_GLOBAL_CONSTRUCTOR)
 
 add_subdirectory(Analysis)
 add_subdirectory(AsmParser)
+add_subdirectory(Bytecode)
 add_subdirectory(Conversion)
 add_subdirectory(Dialect)
 add_subdirectory(IR)

diff  --git a/mlir/lib/Parser/CMakeLists.txt b/mlir/lib/Parser/CMakeLists.txt
index 7875b0fe5e744..9dcbb63dfdf72 100644
--- a/mlir/lib/Parser/CMakeLists.txt
+++ b/mlir/lib/Parser/CMakeLists.txt
@@ -6,5 +6,6 @@ add_mlir_library(MLIRParser
 
   LINK_LIBS PUBLIC
   MLIRAsmParser
+  MLIRBytecodeReader
   MLIRIR
   )

diff  --git a/mlir/lib/Parser/Parser.cpp b/mlir/lib/Parser/Parser.cpp
index 270e1d6f513f6..4c0742895b608 100644
--- a/mlir/lib/Parser/Parser.cpp
+++ b/mlir/lib/Parser/Parser.cpp
@@ -12,6 +12,7 @@
 
 #include "mlir/Parser/Parser.h"
 #include "mlir/AsmParser/AsmParser.h"
+#include "mlir/Bytecode/BytecodeReader.h"
 #include "llvm/Support/SourceMgr.h"
 
 using namespace mlir;
@@ -25,6 +26,8 @@ LogicalResult mlir::parseSourceFile(const llvm::SourceMgr &sourceMgr,
                                          sourceBuf->getBufferIdentifier(),
                                          /*line=*/0, /*column=*/0);
   }
+  if (isBytecode(*sourceBuf))
+    return readBytecodeFile(*sourceBuf, block, config);
   return parseAsmSourceFile(sourceMgr, block, config);
 }
 

diff  --git a/mlir/lib/Tools/mlir-opt/CMakeLists.txt b/mlir/lib/Tools/mlir-opt/CMakeLists.txt
index 318eb0a383fd0..dfcff9d034364 100644
--- a/mlir/lib/Tools/mlir-opt/CMakeLists.txt
+++ b/mlir/lib/Tools/mlir-opt/CMakeLists.txt
@@ -5,6 +5,7 @@ add_mlir_library(MLIROptLib
   ${MLIR_MAIN_INCLUDE_DIR}/mlir/Tools/mlir-opt
 
   LINK_LIBS PUBLIC
+  MLIRBytecodeWriter
   MLIRPass
   MLIRParser
   MLIRSupport

diff  --git a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
index bde939fab6525..519da97af0c70 100644
--- a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
+++ b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
@@ -12,6 +12,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "mlir/Tools/mlir-opt/MlirOptMain.h"
+#include "mlir/Bytecode/BytecodeWriter.h"
 #include "mlir/IR/AsmState.h"
 #include "mlir/IR/Attributes.h"
 #include "mlir/IR/BuiltinOps.h"
@@ -47,7 +48,8 @@ using namespace llvm;
 static LogicalResult performActions(raw_ostream &os, bool verifyDiagnostics,
                                     bool verifyPasses, SourceMgr &sourceMgr,
                                     MLIRContext *context,
-                                    PassPipelineFn passManagerSetupFn) {
+                                    PassPipelineFn passManagerSetupFn,
+                                    bool emitBytecode) {
   DefaultTimingManager tm;
   applyDefaultTimingManagerCLOptions(tm);
   TimingScope timing = tm.getRootScope();
@@ -86,8 +88,12 @@ static LogicalResult performActions(raw_ostream &os, bool verifyDiagnostics,
 
   // Print the output.
   TimingScope outputTiming = timing.nest("Output");
-  module->print(os);
-  os << '\n';
+  if (emitBytecode) {
+    writeBytecodeToFile(module->getOperation(), os);
+  } else {
+    module->print(os);
+    os << '\n';
+  }
   return success();
 }
 
@@ -97,8 +103,8 @@ static LogicalResult
 processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
               bool verifyDiagnostics, bool verifyPasses,
               bool allowUnregisteredDialects, bool preloadDialectsInContext,
-              PassPipelineFn passManagerSetupFn, DialectRegistry &registry,
-              llvm::ThreadPool *threadPool) {
+              bool emitBytecode, PassPipelineFn passManagerSetupFn,
+              DialectRegistry &registry, llvm::ThreadPool *threadPool) {
   // Tell sourceMgr about this buffer, which is what the parser will pick up.
   SourceMgr sourceMgr;
   sourceMgr.AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());
@@ -122,7 +128,7 @@ processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
   if (!verifyDiagnostics) {
     SourceMgrDiagnosticHandler sourceMgrHandler(sourceMgr, &context);
     return performActions(os, verifyDiagnostics, verifyPasses, sourceMgr,
-                          &context, passManagerSetupFn);
+                          &context, passManagerSetupFn, emitBytecode);
   }
 
   SourceMgrDiagnosticVerifierHandler sourceMgrHandler(sourceMgr, &context);
@@ -131,7 +137,7 @@ processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
   // these actions succeed or fail, we only care what diagnostics they produce
   // and whether they match our expectations.
   (void)performActions(os, verifyDiagnostics, verifyPasses, sourceMgr, &context,
-                       passManagerSetupFn);
+                       passManagerSetupFn, emitBytecode);
 
   // Verify the diagnostic handler to make sure that each of the diagnostics
   // matched.
@@ -144,7 +150,8 @@ LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
                                 DialectRegistry &registry, bool splitInputFile,
                                 bool verifyDiagnostics, bool verifyPasses,
                                 bool allowUnregisteredDialects,
-                                bool preloadDialectsInContext) {
+                                bool preloadDialectsInContext,
+                                bool emitBytecode) {
   // The split-input-file mode is a very specific mode that slices the file
   // up into small pieces and checks each independently.
   // We use an explicit threadpool to avoid creating and joining/destroying
@@ -163,8 +170,8 @@ LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
                      raw_ostream &os) {
     return processBuffer(os, std::move(chunkBuffer), verifyDiagnostics,
                          verifyPasses, allowUnregisteredDialects,
-                         preloadDialectsInContext, passManagerSetupFn, registry,
-                         threadPool);
+                         preloadDialectsInContext, emitBytecode,
+                         passManagerSetupFn, registry, threadPool);
   };
   return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,
                                splitInputFile, /*insertMarkerInOutput=*/true);
@@ -176,7 +183,8 @@ LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
                                 DialectRegistry &registry, bool splitInputFile,
                                 bool verifyDiagnostics, bool verifyPasses,
                                 bool allowUnregisteredDialects,
-                                bool preloadDialectsInContext) {
+                                bool preloadDialectsInContext,
+                                bool emitBytecode) {
   auto passManagerSetupFn = [&](PassManager &pm) {
     auto errorHandler = [&](const Twine &msg) {
       emitError(UnknownLoc::get(pm.getContext())) << msg;
@@ -186,7 +194,8 @@ LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
   };
   return MlirOptMain(outputStream, std::move(buffer), passManagerSetupFn,
                      registry, splitInputFile, verifyDiagnostics, verifyPasses,
-                     allowUnregisteredDialects, preloadDialectsInContext);
+                     allowUnregisteredDialects, preloadDialectsInContext,
+                     emitBytecode);
 }
 
 LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,
@@ -224,6 +233,10 @@ LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,
       "show-dialects", cl::desc("Print the list of registered dialects"),
       cl::init(false));
 
+  static cl::opt<bool> emitBytecode(
+      "emit-bytecode", cl::desc("Emit bytecode when generating output"),
+      cl::init(false));
+
   InitLLVM y(argc, argv);
 
   // Register any command line options.
@@ -268,7 +281,8 @@ LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,
 
   if (failed(MlirOptMain(output->os(), std::move(file), passPipeline, registry,
                          splitInputFile, verifyDiagnostics, verifyPasses,
-                         allowUnregisteredDialects, preloadDialectsInContext)))
+                         allowUnregisteredDialects, preloadDialectsInContext,
+                         emitBytecode)))
     return failure();
 
   // Keep the output file if the invocation of MlirOptMain was successful.

diff  --git a/mlir/test/Bytecode/general.mlir b/mlir/test/Bytecode/general.mlir
new file mode 100644
index 0000000000000..2aa30e3de269f
--- /dev/null
+++ b/mlir/test/Bytecode/general.mlir
@@ -0,0 +1,34 @@
+// RUN: mlir-opt -allow-unregistered-dialect -emit-bytecode %s | mlir-opt -allow-unregistered-dialect | FileCheck %s
+
+// CHECK-LABEL: "bytecode.test1"
+// CHECK-NEXT:    "bytecode.empty"() : () -> ()
+// CHECK-NEXT:    "bytecode.attributes"() {attra = 10 : i64, attrb = #bytecode.attr} : () -> ()
+// CHECK-NEXT:    test.graph_region {
+// CHECK-NEXT:      "bytecode.operands"(%[[RESULTS:.*]]#0, %[[RESULTS]]#1, %[[RESULTS]]#2) : (i32, i64, i32) -> ()
+// CHECK-NEXT:      %[[RESULTS]]:3 = "bytecode.results"() : () -> (i32, i64, i32)
+// CHECK-NEXT:    }
+// CHECK-NEXT:    "bytecode.branch"()[^[[BLOCK:.*]]] : () -> ()
+// CHECK-NEXT:  ^[[BLOCK]](%[[ARG0:.*]]: i32, %[[ARG1:.*]]: !bytecode.int, %[[ARG2:.*]]: !pdl.operation):
+// CHECK-NEXT:    "bytecode.regions"() ({
+// CHECK-NEXT:      "bytecode.operands"(%[[ARG0]], %[[ARG1]], %[[ARG2]]) : (i32, !bytecode.int, !pdl.operation) -> ()
+// CHECK-NEXT:      "bytecode.return"() : () -> ()
+// CHECK-NEXT:    }) : () -> ()
+// CHECK-NEXT:    "bytecode.return"() : () -> ()
+// CHECK-NEXT:  }) : () -> ()
+
+"bytecode.test1"() ({
+  "bytecode.empty"() : () -> ()
+  "bytecode.attributes"() {attra = 10, attrb = #bytecode.attr} : () -> ()
+  test.graph_region {
+    "bytecode.operands"(%results#0, %results#1, %results#2) : (i32, i64, i32) -> ()
+    %results:3 = "bytecode.results"() : () -> (i32, i64, i32)
+  }
+  "bytecode.branch"()[^secondBlock] : () -> ()
+
+^secondBlock(%arg1: i32, %arg2: !bytecode.int, %arg3: !pdl.operation):
+  "bytecode.regions"() ({
+    "bytecode.operands"(%arg1, %arg2, %arg3) : (i32, !bytecode.int, !pdl.operation) -> ()
+    "bytecode.return"() : () -> ()
+  }) : () -> ()
+  "bytecode.return"() : () -> ()
+}) : () -> ()

diff  --git a/mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-large_offset.mlirbc b/mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-large_offset.mlirbc
new file mode 100644
index 0000000000000..84bd1cc3aa073
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-large_offset.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-trailing_data.mlirbc b/mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-trailing_data.mlirbc
new file mode 100644
index 0000000000000..ae7af203e2750
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-trailing_data.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-attr_type_section-index.mlirbc b/mlir/test/Bytecode/invalid/invalid-attr_type_section-index.mlirbc
new file mode 100644
index 0000000000000..e630416d404af
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-attr_type_section-index.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-attr_type_section-trailing_data.mlirbc b/mlir/test/Bytecode/invalid/invalid-attr_type_section-trailing_data.mlirbc
new file mode 100644
index 0000000000000..28b161b797d79
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-attr_type_section-trailing_data.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-dialect_section-dialect_string.mlirbc b/mlir/test/Bytecode/invalid/invalid-dialect_section-dialect_string.mlirbc
new file mode 100644
index 0000000000000..39d899217f256
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-dialect_section-dialect_string.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-dialect_section-opname_dialect.mlirbc b/mlir/test/Bytecode/invalid/invalid-dialect_section-opname_dialect.mlirbc
new file mode 100644
index 0000000000000..192d1ddcee441
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-dialect_section-opname_dialect.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-dialect_section-opname_string.mlirbc b/mlir/test/Bytecode/invalid/invalid-dialect_section-opname_string.mlirbc
new file mode 100644
index 0000000000000..49ef041768f4a
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-dialect_section-opname_string.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-dialect_section.mlir b/mlir/test/Bytecode/invalid/invalid-dialect_section.mlir
new file mode 100644
index 0000000000000..0508c0966710e
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid-dialect_section.mlir
@@ -0,0 +1,19 @@
+// This file contains various failure test cases related to the structure of
+// the dialect section.
+
+//===--------------------------------------------------------------------===//
+// Dialect Name
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-dialect_section-dialect_string.mlirbc 2>&1 | FileCheck %s --check-prefix=DIALECT_STR
+// DIALECT_STR: invalid string index: 15
+
+//===--------------------------------------------------------------------===//
+// OpName
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-dialect_section-opname_dialect.mlirbc 2>&1 | FileCheck %s --check-prefix=OPNAME_DIALECT
+// OPNAME_DIALECT: invalid dialect index: 7
+
+// RUN: not mlir-opt %S/invalid-dialect_section-opname_string.mlirbc 2>&1 | FileCheck %s --check-prefix=OPNAME_STR
+// OPNAME_STR: invalid string index: 31

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-attr.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-attr.mlirbc
new file mode 100644
index 0000000000000..8e13750fa088f
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-attr.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-forwardref.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-forwardref.mlirbc
new file mode 100644
index 0000000000000..deecc35a2b49b
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-forwardref.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-loc.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-loc.mlirbc
new file mode 100644
index 0000000000000..8bc5331a3d611
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-loc.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-operands.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-operands.mlirbc
new file mode 100644
index 0000000000000..b8fc9de63b76a
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-operands.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-opname.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-opname.mlirbc
new file mode 100644
index 0000000000000..e6197bdf3523f
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-opname.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-results.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-results.mlirbc
new file mode 100644
index 0000000000000..004b1e9d92a30
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-results.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section-successors.mlirbc b/mlir/test/Bytecode/invalid/invalid-ir_section-successors.mlirbc
new file mode 100644
index 0000000000000..73e7832b4f022
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-ir_section-successors.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-ir_section.mlir b/mlir/test/Bytecode/invalid/invalid-ir_section.mlir
new file mode 100644
index 0000000000000..be87e81bdd5f3
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid-ir_section.mlir
@@ -0,0 +1,45 @@
+// This file contains various failure test cases related to the structure of
+// the IR section.
+
+//===--------------------------------------------------------------------===//
+// Operations
+//===--------------------------------------------------------------------===//
+
+//===--------------------------------------------------------------------===//
+// Name
+
+// RUN: not mlir-opt %S/invalid-ir_section-opname.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_NAME
+// OP_NAME: invalid operation name index: 14
+
+//===--------------------------------------------------------------------===//
+// Loc
+
+// RUN: not mlir-opt %S/invalid-ir_section-loc.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_LOC
+// OP_LOC: expected attribute of type: {{.*}}, but got: {attra = 10 : i64, attrb = #bytecode.attr}
+
+//===--------------------------------------------------------------------===//
+// Attr
+
+// RUN: not mlir-opt %S/invalid-ir_section-attr.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_ATTR
+// OP_ATTR: expected attribute of type: {{.*}}, but got: loc(unknown)
+
+//===--------------------------------------------------------------------===//
+// Operands
+
+// RUN: not mlir-opt %S/invalid-ir_section-operands.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_OPERANDS
+// OP_OPERANDS: invalid value index: 6
+
+// RUN: not mlir-opt %S/invalid-ir_section-forwardref.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=FORWARD_REF
+// FORWARD_REF: not all forward unresolved forward operand references
+
+//===--------------------------------------------------------------------===//
+// Results
+
+// RUN: not mlir-opt %S/invalid-ir_section-results.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_RESULTS
+// OP_RESULTS: value index range was outside of the expected range for the parent region, got [3, 6), but the maximum index was 2
+
+//===--------------------------------------------------------------------===//
+// Successors
+
+// RUN: not mlir-opt %S/invalid-ir_section-successors.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_SUCCESSORS
+// OP_SUCCESSORS: invalid successor index: 3

diff  --git a/mlir/test/Bytecode/invalid/invalid-string_section-count.mlirbc b/mlir/test/Bytecode/invalid/invalid-string_section-count.mlirbc
new file mode 100644
index 0000000000000..13e0f39f88f1d
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-string_section-count.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-string_section-large_string.mlirbc b/mlir/test/Bytecode/invalid/invalid-string_section-large_string.mlirbc
new file mode 100644
index 0000000000000..ecd4c53b38fc1
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-string_section-large_string.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-string_section-no_string.mlirbc b/mlir/test/Bytecode/invalid/invalid-string_section-no_string.mlirbc
new file mode 100644
index 0000000000000..237d482b82afd
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-string_section-no_string.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-string_section-trailing_data.mlirbc b/mlir/test/Bytecode/invalid/invalid-string_section-trailing_data.mlirbc
new file mode 100644
index 0000000000000..2e38790873815
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-string_section-trailing_data.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-string_section.mlir b/mlir/test/Bytecode/invalid/invalid-string_section.mlir
new file mode 100644
index 0000000000000..a39017c3375e5
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid-string_section.mlir
@@ -0,0 +1,26 @@
+// This file contains various failure test cases related to the structure of
+// the string section.
+
+//===--------------------------------------------------------------------===//
+// Count
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-string_section-count.mlirbc 2>&1 | FileCheck %s --check-prefix=COUNT
+// COUNT: attempting to parse a byte at the end of the bytecode
+
+//===--------------------------------------------------------------------===//
+// Invalid String
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-string_section-no_string.mlirbc 2>&1 | FileCheck %s --check-prefix=NO_STRING
+// NO_STRING: attempting to parse a byte at the end of the bytecode
+
+// RUN: not mlir-opt %S/invalid-string_section-large_string.mlirbc 2>&1 | FileCheck %s --check-prefix=LARGE_STRING
+// LARGE_STRING: string size exceeds the available data size
+
+//===--------------------------------------------------------------------===//
+// Trailing data
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-string_section-trailing_data.mlirbc 2>&1 | FileCheck %s --check-prefix=TRAILING_DATA
+// TRAILING_DATA: unexpected trailing data between the offsets for strings and their data

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure-producer.mlirbc b/mlir/test/Bytecode/invalid/invalid-structure-producer.mlirbc
new file mode 100644
index 0000000000000..91d523653dc4d
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid-structure-producer.mlirbc
@@ -0,0 +1 @@
+MLïRÿ
\ No newline at end of file

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure-section-duplicate.mlirbc b/mlir/test/Bytecode/invalid/invalid-structure-section-duplicate.mlirbc
new file mode 100644
index 0000000000000..c27d214baa736
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-structure-section-duplicate.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure-section-id-unknown.mlirbc b/mlir/test/Bytecode/invalid/invalid-structure-section-id-unknown.mlirbc
new file mode 100644
index 0000000000000..f59dd3ccdf9c7
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-structure-section-id-unknown.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure-section-length.mlirbc b/mlir/test/Bytecode/invalid/invalid-structure-section-length.mlirbc
new file mode 100644
index 0000000000000..e2e928c19f88a
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-structure-section-length.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure-section-missing.mlirbc b/mlir/test/Bytecode/invalid/invalid-structure-section-missing.mlirbc
new file mode 100644
index 0000000000000..8cc405783a683
Binary files /dev/null and b/mlir/test/Bytecode/invalid/invalid-structure-section-missing.mlirbc 
diff er

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure-version.mlirbc b/mlir/test/Bytecode/invalid/invalid-structure-version.mlirbc
new file mode 100644
index 0000000000000..9cbf7f8a8e38c
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid-structure-version.mlirbc
@@ -0,0 +1 @@
+MLïRÿ
\ No newline at end of file

diff  --git a/mlir/test/Bytecode/invalid/invalid-structure.mlir b/mlir/test/Bytecode/invalid/invalid-structure.mlir
new file mode 100644
index 0000000000000..95d9efdfcb63d
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid-structure.mlir
@@ -0,0 +1,44 @@
+// This file contains various failure test cases related to the structure of
+// a bytecode file.
+
+//===--------------------------------------------------------------------===//
+// Version
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-structure-version.mlirbc 2>&1 | FileCheck %s --check-prefix=VERSION
+// VERSION: bytecode version 127 is newer than the current version 0
+
+//===--------------------------------------------------------------------===//
+// Producer
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-structure-producer.mlirbc 2>&1 | FileCheck %s --check-prefix=PRODUCER
+// PRODUCER: malformed null-terminated string, no null character found
+
+//===--------------------------------------------------------------------===//
+// Section
+//===--------------------------------------------------------------------===//
+
+//===--------------------------------------------------------------------===//
+// Missing
+
+// RUN: not mlir-opt %S/invalid-structure-section-missing.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_MISSING
+// SECTION_MISSING: missing data for top-level section: String (0)
+
+//===--------------------------------------------------------------------===//
+// ID
+
+// RUN: not mlir-opt %S/invalid-structure-section-id-unknown.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_ID_UNKNOWN
+// SECTION_ID_UNKNOWN: invalid section ID: 255
+
+//===--------------------------------------------------------------------===//
+// Length
+
+// RUN: not mlir-opt %S/invalid-structure-section-length.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_LENGTH
+// SECTION_LENGTH: attempting to parse a byte at the end of the bytecode
+
+//===--------------------------------------------------------------------===//
+// Duplicate
+
+// RUN: not mlir-opt %S/invalid-structure-section-duplicate.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_DUPLICATE
+// SECTION_DUPLICATE: duplicate top-level section: String (0)

diff  --git a/mlir/test/Bytecode/invalid/invalid_attr_type_offset_section.mlir b/mlir/test/Bytecode/invalid/invalid_attr_type_offset_section.mlir
new file mode 100644
index 0000000000000..9a8bc08e6c25b
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid_attr_type_offset_section.mlir
@@ -0,0 +1,16 @@
+// This file contains various failure test cases related to the structure of
+// the attribute/type offset section.
+
+//===--------------------------------------------------------------------===//
+// Offset
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-attr_type_offset_section-large_offset.mlirbc 2>&1 | FileCheck %s --check-prefix=LARGE_OFFSET
+// LARGE_OFFSET: Attribute or Type entry offset points past the end of section
+
+//===--------------------------------------------------------------------===//
+// Trailing Data
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-attr_type_offset_section-trailing_data.mlirbc 2>&1 | FileCheck %s --check-prefix=TRAILING_DATA
+// TRAILING_DATA: unexpected trailing data in the Attribute/Type offset section

diff  --git a/mlir/test/Bytecode/invalid/invalid_attr_type_section.mlir b/mlir/test/Bytecode/invalid/invalid_attr_type_section.mlir
new file mode 100644
index 0000000000000..aba6b3fd1a34a
--- /dev/null
+++ b/mlir/test/Bytecode/invalid/invalid_attr_type_section.mlir
@@ -0,0 +1,16 @@
+// This file contains various failure test cases related to the structure of
+// the attribute/type offset section.
+
+//===--------------------------------------------------------------------===//
+// Index
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-attr_type_section-index.mlirbc 2>&1 | FileCheck %s --check-prefix=INDEX
+// INDEX: invalid Attribute index: 3
+
+//===--------------------------------------------------------------------===//
+// Trailing Data
+//===--------------------------------------------------------------------===//
+
+// RUN: not mlir-opt %S/invalid-attr_type_section-trailing_data.mlirbc 2>&1 | FileCheck %s --check-prefix=TRAILING_DATA
+// TRAILING_DATA: trailing characters found after Attribute assembly format: trailing