[PATCH] D81258: [yaml2obj] - Introduce a ~10 Mb limit of the output by default and a --nolimit option.

Mon Jun 8 00:30:05 PDT 2020

jhenderson added a subscriber: Higuoxing.
jhenderson added inline comments.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:34-36
 // This class is used to build up a contiguous binary blob while keeping
 // track of an offset in the output (which notionally begins at
 // `InitialOffset`).
----------------
The new limitation behaviour probably wants documenting in the comments somewhere, like here and/or by the methods that have been added/changed.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:42-43
+
   SmallVector<char, 128> Buf;
   raw_svector_ostream OS;
+  bool ReachedLimit;
----------------
Somewhat unrelated, but I can't help but feel that a `SmallVector` of 128 chars is probably not appropriate for many cases. This wouldn't be able to represent anything more than an ELF header and single section header or program header before spilling over the 128 limit.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:55
+  ContiguousBlobAccumulator(uint64_t BaseOffset, uint64_t SizeLimit)
+      : InitialOffset(BaseOffset), MaxSize(SizeLimit), Buf(), OS(Buf),
+        ReachedLimit(false) {}
----------------
Presumably, you can get rid of the `Buf` constructor call here.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:908-909
+  // --nolimit yaml2obj command line option.
+  raw_ostream &OS = *CBA.getRawOS(0x100000 * 8); // 8Mb.
+  uint64_t BeginOffset = CBA.tell();
   if (Name == ".debug_str")
----------------
This feels like a bit of a hack to get this to work with the DWARFYAML interface. @Higuoxing, how straightforward do you think it would be to modify the DWARFYAML interface to take a `ContiguousBlobAccumulator`? I'm guessing not particularly, since we'd need to move it into a shared header etc, but thought I'd ask.

@grimar, as an alternative, maybe you could create a new CBA here, with a size limit equal to the remaining size available in the original CBA. You'd then append its buffer to the original, once you'd checked the size limit hadn't been breached.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:1762-1763
       sizeof(Elf_Ehdr) + sizeof(Elf_Phdr) * Doc.ProgramHeaders.size();
-  ContiguousBlobAccumulator CBA(SectionContentBeginOffset);
+  // Often it is an issue when the output produced by yaml2obj is too large,
+  // because it is slower and might we caused by an issue in a YAML description.
+  // We limit the maximum allowed content size, but also provide a command line
----------------
There are a few grammar issues with this sentence, and it feels a bit hard for me to follow. How about changing this sentence to: "It is quite easy to accidentally create output with yaml2obj that is larger than intended, for example, due to an issue in the YAML description."

================
Comment at: llvm/test/tools/yaml2obj/ELF/output-limit.yaml:11
+# RUN: yaml2obj %s -DSIZE=0xA00000 --docnum=1 -o /dev/null 2>&1
+# RUN: not yaml2obj %s -DSIZE=0xA00001 --docnum=1 -o /dev/null 2>&1 | \
+# RUN:  FileCheck %s --check-prefix=ERROR
----------------
You should also show the behaviour with the switch specified.

================
Comment at: llvm/tools/yaml2obj/yaml2obj.cpp:47-49
+static cl::opt<bool>
+    NoLimit("nolimit", cl::init(false),
+            cl::desc("Do not limit the output size (ELF only)"));
----------------
Rather than have this a boolean, I'd make it configurable, so that users who want a large object but have some upper bound in mind can do so. It might also be useful for testing yaml2obj itself - if we know that an object should be some specific size, but might end up larger due to a bug (e.g. file space usage of nobits), we could use the option to test it doesn't exceed that limit.

If you do that, I'd rename the option to `max-size`. `--nolimit` could be achieved by simply specifying a value of 0xffffffffffffffff, or we could special-case `--max-size=0` to mean unlimited. If you go down the route of 0 means unlimited, I'd make the size limit an `Optional` internally, and translate the 0 to `None` early (you could also just do the 0 -> uint64_tmax early instead).

Finally, this option actually only applies to section sizes currently. I think it would make sense to either a) apply it to the whole output, or b) rename to the option and change the help text further to indicate it only applies to sections (e.g. `--max-sections-size`).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81258/new/

https://reviews.llvm.org/D81258