[LLVMdev] object construction patterns and unique_ptr

Michael Spencer bigcheesegs at gmail.com
Mon Jun 18 16:36:33 PDT 2012


On Mon, Jun 18, 2012 at 1:22 PM, Nick Kledzik <kledzik at apple.com> wrote:
> On Jun 16, 2012, at 3:51 PM, Chris Lattner wrote:
>> On Jun 15, 2012, at 3:48 PM, Nick Kledzik wrote:
>>
>>> In lld we have a Reader class which is a factory that takes .o file(s) and produces an in memory lld::File object(s).  But in doing so, there could be I/O errors (file not readable) or file may be malformed.  We are also using C++11 in lld, so we use std::unique_ptr for managing object ownership.
>>>
>>> The Reader class currently has an interface that can be simplified down to:
>>>
>>>   virtual error_code readFile(StringRef path, std::unique_ptr<lld::File> &result);
>>>
>>> But this interface has become awkward to use. There are two "return" values.  This method is a unique_ptr "source" but the use of a by-refernce parameter means you have to start with an uninitialized unique_ptr and the readFile() has to std::move() into it.   unique_ptr is much nicer to use as a return value.
>>
>> Not a c++'11 expert here, but does a returning an std::tuple work?
> I talked to Howard about that.  It does work, but it is cumbersome.  Either:
>
> std::tuple<std::unique_ptr<lld::File>, error_code>  result = reader.readFile(path);
>
> Or you use std::tie() but then you still have to declare the two local variables and tie them together.  You could define a wrapper class for the tuple, but then you realize it would be simpler to push the error_code into the File and get rid of the wrapper class.
>
>>> An other issue is that since llvm::error_code was designed to return a fixed set of strings, there is no way to return a dynamic error message (e.g. "relocation 27 is invalid").
>>
>> I'm pretty sure that the linker can specify a custom error_category with whatever information you want, and then return an error_code put with that.  Michael would know for sure, but it is definitely in the design of error_code for it to be extensible.
>
> >From what I can tell error_code is about wrapping out OS specific "int" error codes, and how to map those to static strings.  I suppose you could abuse it by having a error_category which has a static vector or recently dynamically generated strings and the int "error code" is an index into that vector.
>
> Clang has this great Diagnostics infrastructure for showing where the error occurred in the source file.  Does developing something like that make sense for object files whose source is not human readable?

The chance that a user will get an invalid object file is incredibly
low, and if they do get one there's generally little they can do about
it. However when this does happen we need to be able to figure out
what's going on. So we do not need clang style diagnostic printing for
object file parsing, but we do still need to get useful diagnostics.

>>> In asking around for clean models on how to do this better, I've heard two interesting ideas:
>>>
>>> 1) Incorporate the error_code and string message into the File class.  That is the factory method always returns a File object. But there is a method on File objects to query if they are valid or not.  If they are not, there is a method to get the error_code and/or error string.

I like this with some modifications.

>>> 2) A variation on this is to have a new subclass of File called FileError.  If the Reader cannot construct a real FileFoo object because of some error, it instead constructs a FileError object and returns that.  The FileError object has methods to get the error_code and/or error string.

I dislike this variant because a FileError is not a File.

>>> With both these ideas the Reader method is:
>>>
>>>   virtual std::unique_ptr<lld::File> readFile(StringRef path);
>>>
>>> which is much easier to use.
>>>
>>>
>>> Any recommendations on a good way to design this?
>>
>> Would an API like:
>>
>>   virtual std::unique_ptr<lld::File> readFile(StringRef path, error_code &ErrorInfo);
>
> That is a common pattern, used for instance in the bit code reader.  If we can unify the OS level error and the syntax/semantic errors from parsing object files, this pattern might work.
>
> One other thought that leans me towards integrating the error code/msg into the File object is that it works cleaner when you start adding multithreading to the linker.  When you have multiple threads going, each reading an object file and one of them returns an error, you need to cleaning shut down the other threads, and return the error.  With the error status integrated into the File object, all you do is have each thread append it resulting File object onto a list, then after all threads are joined, run through the list once to see if any say they encountered an error.  If the error_code is returned separately from the File, you have to create some data structure to bind together the error_code with which file path could not be parsed.
>
> -Nick

I believe that specifically for object file parsing errors a
pair<error_code, vector<string> values> (or something similar) as a
member of File would work. This would be used as:

// In the error_category stuff
class blah_reader_error_category : public llvm::_do_message {
public:
  virtual std::string message(int ev) const {
    ...
    case uknown_reloc:
      return "uknown relocation % at %";
    ...
  }
};

// In the reader.

setError(make_pair(blah_error::unkown_reloc,
vector<string>{relocationumer, loc})); // loc could be calculated by
backtracking, or just the current file address. (we could even pass
the address as a stop address to obj2yaml).

// Someplace else.

if (File->error())
  o << File->getErrorMessage();

Which would do the string substitution.

This lets us not spend any time building up a representation for
diagnostics when none will be printed, and still get decent
diagnostics with well defined enum values when needed.

So:

virtual std::unique_ptr<lld::File> readFile(StringRef path);

class File {
public:
  bool error();
  error_code getError();
  string getErrorMessage();
protected:
  void setError(pair<error_code, vector<string>>);
private:
  pair<error_code, vector<string>> Error;
}

I believe we may also want to tablegen the different errors so it's
trivial to add a new one.

Note that for proper linker errors (undefined reference etc...) I
believe we need something more extensive.

- Michael Spencer




More information about the llvm-dev mailing list