[PATCH] D81258: [yaml2obj] - Introduce a 10 Mb limit of the output by default and a --max-size option.

Mon Jun 8 08:11:31 PDT 2020

grimar added inline comments.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:42-43
+
   SmallVector<char, 128> Buf;
   raw_svector_ostream OS;
+  bool ReachedLimit;
----------------
jhenderson wrote:
> Somewhat unrelated, but I can't help but feel that a `SmallVector` of 128 chars is probably not appropriate for many cases. This wouldn't be able to represent anything more than an ELF header and single section header or program header before spilling over the 128 limit.
Yeah. It comes from constructor of `raw_svector_ostream` which takes `SmallVectorImpl<char>`.
I think it could be just `SmallVector<char, 1>` to avoid a confusion.

================
Comment at: llvm/lib/ObjectYAML/ELFEmitter.cpp:908-909
+  // --nolimit yaml2obj command line option.
+  raw_ostream &OS = *CBA.getRawOS(0x100000 * 8); // 8Mb.
+  uint64_t BeginOffset = CBA.tell();
   if (Name == ".debug_str")
----------------
jhenderson wrote:
> grimar wrote:
> > grimar wrote:
> > > Higuoxing wrote:
> > > > jhenderson wrote:
> > > > > This feels like a bit of a hack to get this to work with the DWARFYAML interface. @Higuoxing, how straightforward do you think it would be to modify the DWARFYAML interface to take a `ContiguousBlobAccumulator`? I'm guessing not particularly, since we'd need to move it into a shared header etc, but thought I'd ask.
> > > > > 
> > > > > @grimar, as an alternative, maybe you could create a new CBA here, with a size limit equal to the remaining size available in the original CBA. You'd then append its buffer to the original, once you'd checked the size limit hadn't been breached.
> > > > I think it's not easy to change the interface. `MachOEmitter` doesn't use the `ContiguousBlobAccumulator`.
> > > > since we'd need to move it into a shared header etc
> > > 
> > > Sharing `ContiguousBlobAccumulator` is a reasonable move I think. I thought about it when wrote this, as doing this should allow to reuse it for non-ELF emitters.
> > > 
> > > > as an alternative, maybe you could create a new CBA here, with a size limit equal to the remaining size available in the original CBA.
> > > 
> > > Ok, I'll try. But it is still not ideal. While we use `raw_ostream` in the `DWARFYAML` directly, it is possible to break everything. Imagine I create `ContiguousBlobAccumulator`. with a limit of 10 Mb here, but `DWARFYAML` tries to write 10 Gb.
> > > We might face a OOM inside `DWARFYAML` logic before having a chance to check that the size limit is breached here.
> > > 
> > > So the only right way is changing the DWARFYAML interface I believe. I think for now we can go with your suggestion and leave `DWARFYAML` changes for a follow-up?
> > > 
> > > I think it's not easy to change the interface. MachOEmitter doesn't use the ContiguousBlobAccumulator.
> > 
> > Yeah. I'd do it later. We can try to change all non-ELF emitters to take `ContiguousBlobAccumulator` to make the new limit option work for them too.
> Thanks @grimar. That sounds reasonable to me.
I've came up to a slightly different solution here, which allows to avoid creating a new `ContiguousBlobAccumulator` instance and hence safes a bit of memory and time.

================
Comment at: llvm/tools/yaml2obj/yaml2obj.cpp:47-49
+static cl::opt<bool>
+    NoLimit("nolimit", cl::init(false),
+            cl::desc("Do not limit the output size (ELF only)"));
----------------
jhenderson wrote:
> Rather than have this a boolean, I'd make it configurable, so that users who want a large object but have some upper bound in mind can do so. It might also be useful for testing yaml2obj itself - if we know that an object should be some specific size, but might end up larger due to a bug (e.g. file space usage of nobits), we could use the option to test it doesn't exceed that limit.
> 
> If you do that, I'd rename the option to `max-size`. `--nolimit` could be achieved by simply specifying a value of 0xffffffffffffffff, or we could special-case `--max-size=0` to mean unlimited. If you go down the route of 0 means unlimited, I'd make the size limit an `Optional` internally, and translate the 0 to `None` early (you could also just do the 0 -> uint64_tmax early instead).
> 
> Finally, this option actually only applies to section sizes currently. I think it would make sense to either a) apply it to the whole output, or b) rename to the option and change the help text further to indicate it only applies to sections (e.g. `--max-sections-size`).
I've implemented `--max-size`, I think it is better because it is more generic than `--max-sections-size` which looks like a something specific.

I am also have an idea how `--max-size` can be used in tests now, so thanks for the suggestion!
(Looking at the code, we always write all section headers now, even when some of/all of them are omitted with the use of `SectionHeaders` key.
seems it might be convenient to use the `--max-size` to verify this bit).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81258/new/

https://reviews.llvm.org/D81258