[LLVMdev] object construction patterns and unique_ptr

Tue Jun 19 14:25:10 PDT 2012

On Tue, Jun 19, 2012 at 1:51 PM, Nick Kledzik <kledzik at apple.com> wrote:
> On Jun 18, 2012, at 4:36 PM, Michael Spencer wrote:
>>>>>
>>>>> In asking around for clean models on how to do this better, I've heard two interesting ideas:
>>>>>
>>>>> 1) Incorporate the error_code and string message into the File class.  That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not.  If they are not, there is a method to get the error_code and/or error string.
>>
>> I like this with some modifications.
>>
>>>>> 2) A variation on this is to have a new subclass of File called FileError.  If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that.  The FileError object has methods to get the error_code and/or error string.
>>
> My intuition is the opposite here.  If you add the error information to the lld::File base class then leads you to write the constructor for FileFoo() which does all the parsing, and if it runs into a problem, just sets the error ivar and returns (your FileCOFF is an example of that).  The problem this leads to is half constructed objects.  That can trip up a client which might accidentally call methods on the half constructed object  other than the error getting methods.  It also means the destructor has to be careful when tearing down the half constructed object.
>

Makes sense. You can't parse atoms as they are requested anyway, so
it's fine to do it all up front.

>
> On the other hand making a new subclass of lld::File called lld::FileError leads you to a different style of programming.  You have to open and parse the file past the point that any error can occur before you construct the FileFoo object.  If anything goes wrong up to the point, you instead construct a FileError object and return that.
>
>> I dislike this variant because a FileError is not a File.
> I'm not sure what you mean by "not a File".  It is a subclass of lld::File.  Its path() method returns the file system path that tried to be used.
>
> Think of it as a proxy.  The real file could not parsed.  It represents that state.

Yes it is a subclass of File, but it does not really represent one.
You cannot get its atoms, and you have to do a special check of the
run-time type, just like the special check for failed() above.

>
>>
>>>>> With both these ideas the Reader method is:
>>>>>
>>>>>   virtual std::unique_ptr<lld::File> readFile(StringRef path);
>>>>>
>>>>> which is much easier to use.
>>>>>
>>>>>
>>>>> Any recommendations on a good way to design this?
>>>>
>>>> Would an API like:
>>>>
>>>>   virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);
>>>
>>> That is a common pattern, used for instance in the bit code reader.  If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.
>>>
>>> One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker.  When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error.  With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error.  If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.
>>>
>>> -Nick
>>
>> I believe that specifically for object file parsing errors a
>> pair<error_code, vector<string> values> (or something similar) as a
>> member of File would work. This would be used as:
>>
>> // In the error_category stuff
>> class blah_reader_error_category : public llvm::_do_message {
>> public:
>>  virtual std::string message(int ev) const {
>>    ...
>>    case uknown_reloc:
>>      return "uknown relocation % at %";
>>    ...
>>  }
>> };
>>
>> // In the reader.
>>
>> setError(make_pair(blah_error::unkown_reloc,
>> vector<string>{relocationumer, loc})); // loc could be calculated by
>> backtracking, or just the current file address. (we could even pass
>> the address as a stop address to obj2yaml).
>>
>> // Someplace else.
>>
>> if (File->error())
>>  o << File->getErrorMessage();
>>
>> Which would do the string substitution.
> What is the advantage of this delayed substitution?  The site creating the error needs to convert all values to strings. Your example has two numbers (reloc value and address) which would need to be converted to strings.  It would be simpler all around to just generate the complete string right there at the error site.  Are you trying to support localization? or error numbers?

The point is supporting making decisions based on enum values while
still having dynamic strings. For instance, in the case of a valid
structure but unknown value, the linker could use obj2yaml to dump the
object file in human readable format up to the point where the unknown
value occurred. In the case of the yaml reader we actually have real
diagnostics we can give.

> Also, are these enum values per file format (i.e. does ELF and mach-o each define their own set)?  Or is there one big set of enums for use by all Readers?

A general set plus per format.

>
>>
>> This lets us not spend any time building up a representation for
>> diagnostics when none will be printed, and still get decent
>> diagnostics with well defined enum values when needed.
>
> What is the rule of thumb of when a enum value is needed (as opposed to success/failure)?  My thinking is that an enum only makes sense if the client can do something about a specific enum value.  For instance in some future IDE, maybe file_not_found error can prod the IDE to recompile something to regenerate the object file.  Another reason for enums is if you want a way to localize error strings to different languages.  The enums are an index into a per-localization string table.  Given that these error (e.g. unknown relocation) are a bug in the compiler or linker and not a problem the user of the tools can remedy, I'm not sure they need localizing.

Yes, an additional enum value is only needed when a distinction can be
made by higher level code. Localization is not a goal of this.

Also, by full representation I meant enough to actually go tell the
user how to edit the binary file to fix the parse error. We clearly
shouldn't be doing that. But even in the case of compiler bugs we
should still give good diagnostics for the people who get the bug
reports, and so the user doesn't feel totally lost.

>
>
> -Nick
>
>
>

I like the error_or<T> idea.

- Michael Spencer