[PATCH] D11072: Implement tool to convert bitcode to text.

Jan Voung jvoung at chromium.org
Thu Jul 9 13:48:07 PDT 2015


jvoung added a comment.

Made a quick scan over (not a full review).

Would be nice if folks who are more involved with fuzzing could chime in about whether or not they agree if this tool will be useful to them =)


================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:1
@@ +1,2 @@
+//===- BitcodeConvert.h -----------------------------------------*- C++ -*-===//
+//
----------------
Short description next to the file name?

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:1
@@ +1,2 @@
+//===- BitcodeConvert.h -----------------------------------------*- C++ -*-===//
+//
----------------
jvoung wrote:
> Short description next to the file name?
BitcodeConvert might be a bit too generic of a name. What does it convert to-from?

I'm sure more experienced reviewers would have a better suggestion. TextualBitcodeConvert ?

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:13
@@ +12,3 @@
+// Each bitcode record is a list of 64-bit integers, separated by commas, and
+// terminated with a semicolin.
+//
----------------
semicolin -> semicolon (here and below)

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:16
@@ +15,3 @@
+// The goal of textual bitcode records is to provide a simple API to the
+// contents of bitcode files that is human readable, easy to fuzz, and easy to
+// write test cases. To do this, the textual form removes all notions of
----------------
For "easy to fuzz", I don't know if you mentioned the issue with binary form. The binary form isn't byte aligned, etc...

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:58
@@ +57,3 @@
+
+/// Writes the textual representation fo Records into the buffer. If
+/// ErrorRecover is true, it will apply repairs and continue when
----------------
fo -> for

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:59
@@ +58,3 @@
+/// Writes the textual representation fo Records into the buffer. If
+/// ErrorRecover is true, it will apply repairs and continue when
+/// possible. Returns true when successful.
----------------
Do you have any experience for supporting ErrorRecovery mode?

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:1
@@ +1,2 @@
+//===- BitcodeRecordReader.cpp --------------------------------------------===//
+//
----------------
Wonder if these various fils should be TextualBitcode...

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:18
@@ +17,3 @@
+// If comments are allowed, the comment begins with a semicolin. Any text after
+// a semicolin, to the end of the line is considered a comment. Empty lines are
+// also allowed.
----------------
semicolin -> semicolon in this file too

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:62
@@ +61,3 @@
+  const char *name() const LLVM_NOEXCEPT override {
+    return "pnacl.text_bitcode";
+  }
----------------
Don't need to reference "pnacl" here...

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:133
@@ +132,3 @@
+  // Tries to read a character in the given set of Characters. Returns
+  // the character found, or 0 if not found.
+  char readChar(const char *Characters);
----------------
Clarify that if not found, it will not advance the cursor.

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:142
@@ +141,3 @@
+  // Skip empty lines, and lines beginning with a semicolin.
+  bool skipWhitespace();
+
----------------
How about skipWhitespaceLines() as an alternate name? This doesn't skip whitespace in the middle of a line.

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:262
@@ +261,3 @@
+
+  /// read in a module block.
+  std::error_code readModule();
----------------
read -> Read

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:271
@@ +270,3 @@
+
+  // read block with ID.
+  std::error_code readBlock(unsigned ID);
----------------
read -> Read

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:288
@@ +287,3 @@
+
+  // Sniff for the signature.
+  if (Cursor->Read(8) != 'B' || Cursor->Read(8) != 'C' ||
----------------
Could this be shared with the "other" bitcode reader? (binary to IR reader, which has two instances of this already?).

The refactoring can be a separate patch too...

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:369
@@ +368,3 @@
+                  std::vector<std::unique_ptr<BitcodeRecord>> &Records) {
+  // TODO(kschimpf) Handle bitcode wrappers.
+  BinaryRecordParser Parser(std::move(Input), Records);
----------------
Doesn't this already check isBitcodeWrapper ?

================
Comment at: lib/Bitcode/Writer/BitcodeRecordWriter.cpp:1
@@ +1,2 @@
+//===- BitcodeRecordwriter.cpp --------------------------------------------===//
+//
----------------
BitcodeRecordWriter.cpp (capital W)

Wonder if this file should also be prefixed with Textual...

================
Comment at: lib/Bitcode/Writer/BitcodeRecordWriter.cpp:12
@@ +11,3 @@
+//
+// The Textual bitcode writer takes a sequence of bitcode records, and writes
+// out textually to a buffer.  Each bitcode record is a sequence of 64-bit,
----------------
Capital T for Textual seems a bit inconsistent (binary doesn't get a capital B) in the docs. So, perhaps just lower case it.

================
Comment at: lib/Bitcode/Writer/BitcodeRecordWriter.cpp:243
@@ +242,3 @@
+
+  void writeHeader() {
+    Writer.Emit((unsigned)'B', 8);
----------------
If you are refactoring the reader to share code, in a separate patch, you could share this writeHeader also.

================
Comment at: tools/Makefile:30
@@ -29,4 +29,3 @@
 DIRS := llvm-config
-PARALLEL_DIRS := opt llvm-as llvm-dis llc llvm-ar llvm-nm llvm-link \
-                 lli llvm-extract llvm-mc bugpoint llvm-bcanalyzer llvm-diff \
-                 macho-dump llvm-objdump llvm-readobj llvm-rtdyld \
+ARALLEL_DIRS := opt llvm-as llvm-dis llc llvm-ar llvm-nm llvm-link \
+                 lli llvm-extract llvm-mc bugpoint llvm-bcanalyzer llvm-bcconv \
----------------
let's keep the P in PARALLEL

================
Comment at: tools/llvm-bcconv/Makefile:12
@@ +11,3 @@
+TOOLNAME := llvm-bcconv
+LINK_COMPONENTS := bitreader bitwriter Support
+
----------------
BitReader BitWriter

================
Comment at: tools/llvm-bcconv/llvm-bcconv.cpp:1
@@ +1,2 @@
+//===-- llvm-bcanalyzer.cpp - Bitcode Analyzer --------------------------===//
+//
----------------
Update the header (it's not llvm-bcanalyzer)

================
Comment at: tools/llvm-bcconv/llvm-bcconv.cpp:23
@@ +22,3 @@
+// Each bitcode record is a sequence of (unsigned) integers, separated by
+// commas, and terminated with a semicolin (followed by a newline).
+//
----------------
semicolon 

================
Comment at: tools/llvm-bcconv/llvm-bcconv.cpp:30
@@ +29,3 @@
+//
+// Note that the '-allow-comments' feature allows one to use textual bitcode
+// files as input to FileCheck.
----------------
Is the motivation for making it optional, so that a fuzzer doesn't go off and start filling up the file with comments which don't affect anything?

I guess I would have hoped that a fuzzer would be smart enough to detect that such changes aren't useful (no increase in coverage), but maybe it's still helpful to make it optional...


http://reviews.llvm.org/D11072







More information about the llvm-commits mailing list