[LLVMdev] object construction patterns and unique_ptr

Wed Jun 20 12:36:59 PDT 2012

On Jun 20, 2012, at 12:28 AM, Chandler Carruth wrote:
> On Tue, Jun 19, 2012 at 11:18 AM, Jeffrey Yasskin <jyasskin at google.com> wrote:
> On Jun 18, 2012 1:24 PM, "Nick Kledzik" <kledzik at apple.com> wrote:
> >
> > On Jun 16, 2012, at 3:51 PM, Chris Lattner wrote:
> > > On Jun 15, 2012, at 3:48 PM, Nick Kledzik wrote:
> > >
> > >> In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s).  But in doing so, there could be I/O errors (file not readable) or file may be malformed.  We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.
> > >>
> > >> The Reader class currently has an interface that can be simplified down to:
> > >>
> > >>   virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);
> > >>
> > >> But this interface has become awkward to use. There are two "return" values.  This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it.   unique_ptr is much nicer to use as a return value.
> > >
> > > Not a c++'11 expert here, but does a returning an std::tuple work?
> > I talked to Howard about that.  It does work, but it is cumbersome.  Either:
> >
> > std::tuple<std::unique_ptr<lld::File>, error_code>  result = reader.readFile(path);
> >
> 
> I suggest an error_or<T> class that can be combined with any result type, not just File. It could be implemented with a fully generic variant class, but the interface would likely be better with some specialization to errors.

I too think that error_or<T> is better than std::pair<error_code, T>: 
* more compact
* can name fields (e.g. error, object instead of first, second)
* can embed asserts that object field is not accessed if there is an error.

But there are some drawbacks:
* lld uses C++11, so we thought we'd add explicit ownership by using std::unique_ptr.  That means we'd need to layer the templates (e.g. error_or<std::unique_ptr<lld::File>>), or forgo using unique_ptr, or cause error_or to always internally use unique_ptr.
* We still have not figured out how to encapsulate errors for object file parsing.  The last thread was an enum *and* and a string.

> I also really like the variant design. It has a few important advantages IMO:
> 
> 1) No pointers or heap allocations. These make code more bug prone and can needlessly hurt optimizers. (although the latter is hopefully not relevant in this specific case)
> 
> 2) No "error" state in the file object. Either you have a fully constructed object or you don't.
I do agree that either you successfully parsed the file and have a full File object or you failed and have some sort of error.  That means they are mutually exclusive, so having every File object carry an error_code is a waste and confusing.   But, that also make the case for the FileError subclass...

> 4) Doesn't require polymorphic types (but they can be supported if needed). Essentially, you can wrap a *specific* file type, rather than being forced to use some generic file type.
Yes, the template lets you do that.  But in the specific case of the linker, it will always be using the generic type.  You can pass the linker paths to object files, shared libraries, or static libraries.  The linker passes those paths onto the readFile() method which therefore must return the generic File type.

> 3) An explicit API to check and instrument whether an error has occured. For example, it is harder to "forget" to check the error, and carry on manipulating the file. The implementation can assert in debug builds to catch this early, etc.
The same assert functionality can be added to the FileError design too.  About the only thing you can do with a File is iterator its Atoms.  The FileError class can implement its iterators to assert.  Also, it is already the case that the linker handles libraries and object files differently, yet they both come back from readFile() as a generic File object.  The linker already has to sort all the File objects by kind, so noticing a FileError in the mix not a problem.

I've noticed resistance to the FileError design.  It seems natural to me having recently done some work on the Darwin linker to support threaded and pipelined linking.  The idea there is that you create an object immediately that will eventually become fully populated with Atoms.   But the object is first in an incomplete state until 1) the compiler is done producing the .o file, and 2) the Reader has parsed the .o file into atoms.   The linker main thread creates these incomplete objects and then waits for them to transition to the complete state (with atoms) or to an error state (if some parsing problem happened).  

Instead of thinking of FileError as a degenerate File with an error_code slapped on.  Think of FileError as an object encapsulating the attempt to read and parse the file.  There could be a rich interface to query it about what went wrong.  The error_code design attempts to capture everything about a failure in an 'int'.  The FileError class defines a rich object interface to the error.  Perhaps it could print out a range of bytes from the file and highlight which bits were bad.  Or perhaps it could dump some context showing all the sections or all the symbols and then why some bits are invalid within that context.  In other words, the error reporting is open ended with a FileError class. 

-Nick