[LLVMdev] object construction patterns and unique_ptr

Tue Jun 19 13:51:09 PDT 2012

On Jun 18, 2012, at 4:36 PM, Michael Spencer wrote:
>>>> 
>>>> In asking around for clean models on how to do this better, I've heard two interesting ideas:
>>>> 
>>>> 1) Incorporate the error_code and string message into the File class.  That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not.  If they are not, there is a method to get the error_code and/or error string.
> 
> I like this with some modifications.
> 
>>>> 2) A variation on this is to have a new subclass of File called FileError.  If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that.  The FileError object has methods to get the error_code and/or error string.
> 
My intuition is the opposite here.  If you add the error information to the lld::File base class then leads you to write the constructor for FileFoo() which does all the parsing, and if it runs into a problem, just sets the error ivar and returns (your FileCOFF is an example of that).  The problem this leads to is half constructed objects.  That can trip up a client which might accidentally call methods on the half constructed object  other than the error getting methods.  It also means the destructor has to be careful when tearing down the half constructed object.

On the other hand making a new subclass of lld::File called lld::FileError leads you to a different style of programming.  You have to open and parse the file past the point that any error can occur before you construct the FileFoo object.  If anything goes wrong up to the point, you instead construct a FileError object and return that.  

> I dislike this variant because a FileError is not a File.
I'm not sure what you mean by "not a File".  It is a subclass of lld::File.  Its path() method returns the file system path that tried to be used.  

Think of it as a proxy.  The real file could not parsed.  It represents that state.

> 
>>>> With both these ideas the Reader method is:
>>>> 
>>>>   virtual std::unique_ptr<lld::File> readFile(StringRef path);
>>>> 
>>>> which is much easier to use.
>>>> 
>>>> 
>>>> Any recommendations on a good way to design this?
>>> 
>>> Would an API like:
>>> 
>>>   virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);
>> 
>> That is a common pattern, used for instance in the bit code reader.  If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.
>> 
>> One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker.  When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error.  With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error.  If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.
>> 
>> -Nick
> 
> I believe that specifically for object file parsing errors a
> pair<error_code, vector<string> values> (or something similar) as a
> member of File would work. This would be used as:
> 
> // In the error_category stuff
> class blah_reader_error_category : public llvm::_do_message {
> public:
>  virtual std::string message(int ev) const {
>    ...
>    case uknown_reloc:
>      return "uknown relocation % at %";
>    ...
>  }
> };
> 
> // In the reader.
> 
> setError(make_pair(blah_error::unkown_reloc,
> vector<string>{relocationumer, loc})); // loc could be calculated by
> backtracking, or just the current file address. (we could even pass
> the address as a stop address to obj2yaml).
> 
> // Someplace else.
> 
> if (File->error())
>  o << File->getErrorMessage();
> 
> Which would do the string substitution.
What is the advantage of this delayed substitution?  The site creating the error needs to convert all values to strings. Your example has two numbers (reloc value and address) which would need to be converted to strings.  It would be simpler all around to just generate the complete string right there at the error site.  Are you trying to support localization? or error numbers?

Also, are these enum values per file format (i.e. does ELF and mach-o each define their own set)?  Or is there one big set of enums for use by all Readers?

> 
> This lets us not spend any time building up a representation for
> diagnostics when none will be printed, and still get decent
> diagnostics with well defined enum values when needed.

What is the rule of thumb of when a enum value is needed (as opposed to success/failure)?  My thinking is that an enum only makes sense if the client can do something about a specific enum value.  For instance in some future IDE, maybe file_not_found error can prod the IDE to recompile something to regenerate the object file.  Another reason for enums is if you want a way to localize error strings to different languages.  The enums are an index into a per-localization string table.  Given that these error (e.g. unknown relocation) are a bug in the compiler or linker and not a problem the user of the tools can remedy, I'm not sure they need localizing. 

-Nick