[PATCH] D81258: [yaml2obj] - Introduce a 10 Mb limit of the output by default and a --max-size option.

Tue Jun 9 00:31:02 PDT 2020

jhenderson added a comment.

Another idea I had to prevent accidental forgetting to check the limit at the end of the function is to actually store an `Error` inside the `ContiguousBlobAccumulator`, similar to the `Cursor` sub-class in the `DataExtractor` class. The error would be set instead of `hasReachedLimit` and any writes would be skipped if it was already set. Then, if the class was destroyed without the error being checked, an assertion would fire. If you wanted, you could even configure the error to say where the limit was breached (e.g. "error: ... when writing 0x12345678 bytes for section .baz"). Not sure we necessarily need to go that far (i.e. with the additional context), but it might help when trying to figure out what exactly went wrong.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:42-43
+
   SmallVector<char, 128> Buf;
   raw_svector_ostream OS;
+  bool ReachedLimit;
----------------
grimar wrote:
> jhenderson wrote:
> > Somewhat unrelated, but I can't help but feel that a `SmallVector` of 128 chars is probably not appropriate for many cases. This wouldn't be able to represent anything more than an ELF header and single section header or program header before spilling over the 128 limit.
> Yeah. It comes from constructor of `raw_svector_ostream` which takes `SmallVectorImpl<char>`.
> I think it could be just `SmallVector<char, 1>` to avoid a confusion.
Or just a `raw_string_ostream`?

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:37-40
+// The blob might be limited to an arbitrary size. All attempts to write a
+// data are ignored and the error condition is remembered when the limit is
+// reached. Such an approach allows us to simplify the code by delaying errors
+// reporting and doing it at a convenient time.
----------------
write a data -> write data
when the limit -> once the limit
errors reporting -> error reporting

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:60
+      : InitialOffset(BaseOffset), MaxSize(SizeLimit), OS(Buf),
+        ReachedLimit(false) {}

----------------
The inline initaliser @MaskRay is suggesting will allow you to get rid of this line.

Also, you should probably do `checkLimit` in the constructor - that way, if the `InitialOffset` is too great, it will set the error state immediately. This is probably not necessary if you adopt my suggestion of `hasReachedLimit` calling the `checkLimit` function.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:904-906
+  // We are unable to predict the size of a debug data, so we request 0 bytes to
+  // write. This should always return us an output stream unless CBA is already
+  // in a error state.
----------------
of a debug data -> of debug data
request 0 bytes to write -> request to write 0 bytes
a error state -> an error state

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:920-921
     llvm_unreachable("unexpected emitDWARF() call");

-  return OS.tell() - BeginOffset;
+  // Now, when we wrote the debug data, we request an output stream again.
+  // This forces CBA to recheck the available data size and triggers
----------------
Now, when we wrote the debug data, -> Now that we have written the debug data,

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:924
+  // a corresponding internal error state if the limit was reached.
+  CBA.getRawOS(0);
+
----------------
Could you maybe change `hasReachedLimit` to actually check against the limit? I think that would be a more natural function to use based on names. Same probably goes above for the initial call to `getRawOS` in this function.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:1773-1774
+  // than intended, for example, due to an issue in the YAML description.
+  // We limit the maximum allowed output size, but also provide a command line
+  // option to drop this limitation.
+  ContiguousBlobAccumulator CBA(SectionContentBeginOffset, MaxSize);
----------------
to drop -> to change

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:1784
+  // Align the start of the section header table, which is written after all
+  // other sections to the end of the file.
+  uint64_t SHOff =
----------------
after all other sections -> after all section data

(I think that would be a little clearer - the section header table isn't a section itself)

I think you can delete "to the end of the file"

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:1789-1790
+      SHOff + arrayDataSize(makeArrayRef(SHeaders)) > MaxSize)
+    State.reportError("the output size produced reached the limit. Consider "
+                      "using the --max-size option");
+
----------------
I'd change this message slightly: "Consider using ..." -> "Use the --max-size option to change the limit"

================
Comment at: llvm/test/tools/yaml2obj/ELF/output-limit.yaml:1
+## Check that yaml2obj limits the output size by default to 10 Mb.
+## Check it is possible to change this limit using the
----------------
Here and everywhere: Mb -> MB

(I think commonly Mb == megabits, rather than megabytes)

================
Comment at: llvm/test/tools/yaml2obj/ELF/output-limit.yaml:10
+# RUN: yaml2obj %s -DSIZE=0x9FFEC0 --docnum=1 -o /dev/null 2>&1
+# RUN: not yaml2obj %s -DSIZE=0x9FFEC1 --docnum=1 -o /dev/null 2>&1 | \
+# RUN:  FileCheck %s --check-prefix=ERROR
----------------
I think it would be worthwhile testing a case where there is no section header table at all. This is because the `CBA` is not used to write the elf header and program header table.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81258/new/

https://reviews.llvm.org/D81258