[lld] registry design proposal

Nick Kledzik kledzik at apple.com
Wed Dec 4 14:31:29 PST 2013


On Dec 4, 2013, at 9:56 AM, Shankar Easwaran <shankare at codeaurora.org> wrote:

>>  1) LinkingContext has a bunch of oddball methods that have just been added because it was convenient (such as getDefaultHandler which is really only needed by the Archive Reader).
> Gnu flavor has a lot of targets(X86_64, Hexagon, ARM, MIPS) and so on, it would not be possible to have a Generic Reader that encompasses all the issues. Currently we have the ELFTargetHandler, which is really flaky and is not really flexible enough to handle ARM, Hexagon in a sane way.
> 
> Targets may use LinkingContext to really get to the DefaultReader thats applicable for that target.
> 
> I am not sure if the new design addresses this.
The design gets rid of the Reader class. But does not do anything to help or hinder ELF’s need for architecture specific parsing.  You can continue to use the delegate object (TargetHandler), but you may need to hang it off the ELFFile object being created instead of hanging it off the LinkingContext.


> 
>> 2) The native file format does not record the “universe” for the kind field of References.  That is, a mach-o file could be converted to native and an ELF file converted to native and both may have a reference of kind 42, but the same 42 has different semantics.
> Does having a separate magic for the native file format to encompass different architectures solve this issue ?
No.  Image an x86 mach-o file converted to native and an x86 ELF file converted to native.  The are both the same arch, but Reference kind values have different semantics.  


> 
>> 3) identify_magic() is called redundantly in lots of places.
> The Driver needs to know the correct node type that needs to be created for a input file, for example on ELF, we have a mix of object files and libraries(the libraries can refer to linker scripts!, for example libc.so which is a linker script). I assume Mach-O also would need this information to group nodes properly.
> 
> I am not sure if the new design addresses this.
> 
> Driver would need to call identify_magic (or) some equivalent to decide what kind of node that needs to be created.
Can the File object be created first, and then use file->kind() to see if it is an object, archive, or shared library?  

> 
> 
>> 4) We don’t have a way for yaml to hand off control if a yaml test case contains heterogenous yaml file types.
> Agree.
>> 5) Readers have been refactored into a trivial role.  They just instantiate some File subclass.
> Yes, I thought they need to just return a parsed file holding atoms.
> 
>> 
>> There is a design pattern that solves all these problems.  Have a global registry of “handlers”.  Require clients to call functions to which register the handlers they request.  Then any code that needs a handler calls the global registry to call the needed handler.
>> 
>> I see the need for three registries:  a file parser registry, a yaml document tag registry, and a kind string registry.
>> 
>> Here is one way to implement a registry for the file parser:
>> 
>> class InputParserRegistry {
>> public:
>>   // A function to quickly check if a file can be parsed.  Usually only the
>>   // magic parameter is checked.  But it can also check the file extension, or
>>   // even look at more of the buffer if needed.
>>   typedef bool (*InputFileChecker)(file_magic magic, StringRef fileExtension,
>>                                    const MemoryBuffer &mb);
>>                                        // A function to parse the supplied input file.
>>   typedef error_code (*InputFileParser)(std::unique_ptr<MemoryBuffer> &mb,
>>                                 std::vector<std::unique_ptr<File>> &result);
>>                                    // Registers a checker and parser with the global registry.
>>   // Usually, the checker and parser are static methods in some lld::File
>>   // subclass that will be instantiated.
>>   static void registerParser(InputFileChecker checker,
>>                              InputFileParser parser);
>>                                        // Walks registered checkers to parse the specified file.
>>   static error_code parseFile(std::unique_ptr<MemoryBuffer> &mb,
>>                               std::vector<std::unique_ptr<File>> &result);
>> };
>> 
>> 
>> The mach-o linker tool would call:
>>    registryAddSupportMachO();
>>    registryAddSupportArchives();
>>    registryAddSupportNative();
>> 
>> The implementation of registryAddSupportMachO would look like:
>>   void registryAddSupportMachO() {
>>     InputParserRegistry::registerParser(FileMachO::check, FileMachO::parse);
>>   }
> Questions :-
> 
> a) Whats the relationship with the Readers and the InputParserRegistry ?
The Reader class would be removed.

> b) Is there a linkingContext thats associated with the InputParserRegistry ? If there is none, how does parseFile convey that info to the Reader ?
Hopefully, parsing no longer requires a linkingContext.

> c) How does the InputParserRegistry deal with figuring out the input file type at the Driver ?
The Driver just uses the Registry to parse files, then examines the generated lld::File object to decide what to do.

> d) How does the InputParserRegistry deal with automatically figuring out the flavor and the target from the first file that appeared in the input ?
The Driver asks the Registry to parse (the first) file, and then inspect the result.  lld::File needs methods to return the file's architecture and Reference-kind-namespace.


> 
>> The code (in custom InputGraph nodes) no longer needs to call identify_magic, or look at the file extension, or handle archives.  All it does is:
>>   InputParserRegistry::parseFile(_buffer, _files);
>> 
>> I’ve looked at the llvm::Registry<> classes.  It does not seem to be a good match for what we need here:  It is designed around using static variables whose construction does the registration.  I prefer explicit calls to register to enable clients to *not* drag in support they don’t want.  The  Registry<> model also assume you want to instantiate the object with the default constructor, which we don’t.  And we have no need for listeners or name/description. Yes, by overriding SimpleRegistryEntry and RegistryTraits we could use  llvm::Registry<>, but I’m not sure it is worth the effort.
>> 
> Overall I like this design, but I would like to know how we deal with the missing features, and I am not really following the relationships of how the InputParserRegistry is associated with a lot of Readers.
Readers are gone.  You manually register the parsers you want to support, then ask the Registry to pick the right parser and use it.

-Nick








More information about the llvm-commits mailing list