[PATCH] D11072: Implement tool to convert bitcode to text.

Wed Jul 29 15:13:51 PDT 2015

kschimpf added a comment.

I have added the new form "simplified" bitcode to this, based on conversations with Kostya and libFuzzer. I also moved the code into a separate library as suggested by Rafael.

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:1
@@ +1,2 @@
+//===- BitcodeConvert.h -----------------------------------------*- C++ -*-===//
+//
----------------
jvoung wrote:
> jvoung wrote:
> > Short description next to the file name?
> BitcodeConvert might be a bit too generic of a name. What does it convert to-from?
> 
> I'm sure more experienced reviewers would have a better suggestion. TextualBitcodeConvert ?
I don't like TextualBitcodeConvert because it converts back and forth between the text, binary, and (new) simplified forms.

The simplified form is based on discussions with Kostya and the way that libFuzzer works. It is intended to improve the ability of that fuzzer to mutate simplified forms.

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:16
@@ +15,3 @@
+// The goal of textual bitcode records is to provide a simple API to the
+// contents of bitcode files that is human readable, easy to fuzz, and easy to
+// write test cases. To do this, the textual form removes all notions of
----------------
jvoung wrote:
> For "easy to fuzz", I don't know if you mentioned the issue with binary form. The binary form isn't byte aligned, etc...
Changed the discussion to try and make this more clear.

================
Comment at: include/llvm/Bitcode/BitcodeConvert.h:59
@@ +58,3 @@
+/// Writes the textual representation fo Records into the buffer. If
+/// ErrorRecover is true, it will apply repairs and continue when
+/// possible. Returns true when successful.
----------------
jvoung wrote:
> Do you have any experience for supporting ErrorRecovery mode?
The main purpose for supporting error recovery is to be able to convert fuzzed (i.e. mutated) bitcode files into something that is (bitstream) readable. It is not clear if they are needed for the other forms (i.e. textual and the new variant "simplified"). However, since all three forms use the same driver, it is easier to assume that we can recover when writing all forms.

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:1
@@ +1,2 @@
+//===- BitcodeRecordReader.cpp --------------------------------------------===//
+//
----------------
jvoung wrote:
> Wonder if these various fils should be TextualBitcode...
Decided to separate out each of the reader/writers into separate files, and moved then to a separate subdirectory (and library).

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:288
@@ +287,3 @@
+
+  // Sniff for the signature.
+  if (Cursor->Read(8) != 'B' || Cursor->Read(8) != 'C' ||
----------------
jvoung wrote:
> Could this be shared with the "other" bitcode reader? (binary to IR reader, which has two instances of this already?).
> 
> The refactoring can be a separate patch too...
I agree that this should be factored out, however, I will do it in a separate patch.

================
Comment at: lib/Bitcode/Reader/BitcodeRecordReader.cpp:369
@@ +368,3 @@
+                  std::vector<std::unique_ptr<BitcodeRecord>> &Records) {
+  // TODO(kschimpf) Handle bitcode wrappers.
+  BinaryRecordParser Parser(std::move(Input), Records);
----------------
jvoung wrote:
> Doesn't this already check isBitcodeWrapper ?
Yes. Forgot to remove TODO after fix.

================
Comment at: lib/Bitcode/Writer/BitcodeRecordWriter.cpp:1
@@ +1,2 @@
+//===- BitcodeRecordwriter.cpp --------------------------------------------===//
+//
----------------
jvoung wrote:
> BitcodeRecordWriter.cpp (capital W)
> 
> Wonder if this file should also be prefixed with Textual...
Refactored the code so that each bitcode writer is in a separate source file.

================
Comment at: lib/Bitcode/Writer/BitcodeRecordWriter.cpp:243
@@ +242,3 @@
+
+  void writeHeader() {
+    Writer.Emit((unsigned)'B', 8);
----------------
jvoung wrote:
> If you are refactoring the reader to share code, in a separate patch, you could share this writeHeader also.
Agreed. Will create separate patch for this.

================
Comment at: tools/llvm-bcconv/llvm-bcconv.cpp:30
@@ +29,3 @@
+//
+// Note that the '-allow-comments' feature allows one to use textual bitcode
+// files as input to FileCheck.
----------------
jvoung wrote:
> Is the motivation for making it optional, so that a fuzzer doesn't go off and start filling up the file with comments which don't affect anything?
> 
> I guess I would have hoped that a fuzzer would be smart enough to detect that such changes aren't useful (no increase in coverage), but maybe it's still helpful to make it optional...
Yes. I'm not sure that it will guarantee that the fuzzer will notice this, so I was being safe.

http://reviews.llvm.org/D11072