[LLVMdev] Status of YAML IO?

Thu Nov 1 14:13:48 PDT 2012

On Oct 31, 2012, at 7:46 PM, Shankar Easwaran wrote:

> Hi Nick,
>>> The range of flags would be integers ranging from LOW_PROC .. HIGH_PROC.
>>> 
>>> The generic flags would lie outside that range, below LOW_PROC or above HIGH_PROC. Any value within LOW_PROC .. HIGH_PROC is OS/platform specific.
>>> 
>>> What I was thinking was that there could be a uint32_t flags() in DefinedAtom which returns the flags, and platforms can act on the meaning of the flags in their own pieces of code.
>>> 
>>> What do you think ?
>> You still have not given an example of what information is missing in the current Atom model that is driving the need for this.
>> 
>> It sounds like your flags() returns a single value, not a set of bits, which means it can only be used for one thing.  What if you need two or more kinds of information/attributes not in the Atom model?  I don't see why LOW_PROC and HIGH_PROC are needed.  If we decide there are new kinds of information/attributes that are general, we would just define new methods on Atom, rather than define a value to be returned by flags().
> There are two use cases that I can think of now:
> 
> 1) Flags: these indicate what the Atom carries in addition to its content. For example, the Atom may have
>    a) a follow-on reference
>    b) membership in a group that other atoms also belong to
> 
> The flags could be used to determine whether there is a follow-on atom or whether the atom is part of a group.
> Both would be more useful than iterating through the reference list to figure out whether there is a follow-on reference or whether the atom is part of a group.
I see layout constraints (follow-on) and grouping as a natural use for References.  It seems like your concern is just performance.  I would wait and see if the searching of References for special kinds is actually a bottleneck in practice; then we can talk about ways to improve the performance.
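
For concreteness, the lookup being discussed might look roughly like the sketch below. The types here (RefKind, Ref, SketchAtom, findFollowOn) are hypothetical stand-ins for illustration, not lld's actual Atom and Reference classes.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration only; not lld's real classes.
enum class RefKind { Fixup, FollowOn, InGroup };

struct SketchAtom;

struct Ref {
  RefKind kind;
  const SketchAtom *target;
};

struct SketchAtom {
  std::vector<Ref> refs;
};

// Linear scan over an atom's references for a follow-on (layout) constraint.
// Returns the atom that must follow, or nullptr if there is none.
inline const SketchAtom *findFollowOn(const SketchAtom &atom) {
  for (const Ref &r : atom.refs)
    if (r.kind == RefKind::FollowOn)
      return r.target;
  return nullptr;
}
```

The cost is proportional to the number of references on the atom, so whether it matters depends on how long those lists are in real object files.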

> 
> 2) Atom specific content types
> 
> This is where LOW_PROC and HIGH_PROC come in: there are content types that are architecture specific.
> 
> Currently there are many types defined within contentType that are operating system specific. As more environments start using lld, I feel that many architectures would want to add their own.
I've already added all the content types that Darwin needs.  I think it is fine for you to add any that ELF needs.

> 
> Example for GNU support would include
> 
> a) checksum
> b) hash
> c) gnu prelink library list

These are actually generic attributes that could be made into real attributes (methods) of DefinedAtom.  

I've been thinking about adding something like a checksum for use in coalescing by content (for instance, coalescing duplicate C strings or other constants).  For that, having a checksum would speed up comparisons.
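
A minimal sketch of that idea, under stated assumptions: the function names are hypothetical, content is modeled as std::string, and FNV-1a is used only as an example checksum (nothing here has been decided for lld). The checksum narrows the candidates, and a full byte comparison runs only when checksums match.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// FNV-1a over the content bytes; an arbitrary example checksum.
inline std::uint64_t contentChecksum(const std::string &bytes) {
  std::uint64_t h = 14695981039346656037ull; // FNV offset basis
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 1099511628211ull; // FNV prime
  }
  return h;
}

// For each input, returns the index of the first input with identical
// content. Inputs with no earlier duplicate map to their own index.
inline std::vector<std::size_t>
coalesceByContent(const std::vector<std::string> &contents) {
  std::unordered_multimap<std::uint64_t, std::size_t> seen;
  std::vector<std::size_t> canonical(contents.size());
  for (std::size_t i = 0; i < contents.size(); ++i) {
    std::uint64_t sum = contentChecksum(contents[i]);
    std::size_t match = i;
    auto range = seen.equal_range(sum);
    for (auto it = range.first; it != range.second; ++it) {
      // Checksums agree; confirm with a full comparison.
      if (contents[it->second] == contents[i]) {
        match = it->second;
        break;
      }
    }
    if (match == i)
      seen.emplace(sum, i);
    canonical[i] = match;
  }
  return canonical;
}
```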

I'm not sure if you mean a hash of the content or a hash of the name.  On the name side, I've had thoughts of reworking Atom::name() to return some new abstract type like SymbolName, instead of StringRef.  The idea is that SymbolName maintains a hash for the string so equality checks are fast.  It can also be used to help reduce the size of the new "native" format for object files of C++ code.  C++ (especially with namespaces) generates huge symbol names.  A more compact format would be to factor out all the common substrings and use a dictionary coder.  Thus, in the native object file, each symbol name is some data stream of chars and dictionary indices.
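
As a rough illustration of the hash part only (the dictionary coding is not shown), a SymbolName could look something like this. The class layout and method names are assumptions made for the sketch, and std::string stands in for whatever storage would really be used.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch: a name that caches its hash at construction.
class SymbolName {
public:
  explicit SymbolName(std::string text)
      : text_(std::move(text)), hash_(std::hash<std::string>{}(text_)) {}

  // Different hashes mean different names, so most mismatches are
  // rejected without touching the characters.
  bool operator==(const SymbolName &other) const {
    return hash_ == other.hash_ && text_ == other.text_;
  }
  bool operator!=(const SymbolName &other) const { return !(*this == other); }

  const std::string &str() const { return text_; }
  std::size_t hash() const { return hash_; }

private:
  std::string text_;  // declared first so it is initialized before hash_
  std::size_t hash_;
};
```

Equal hashes still fall through to a full string comparison, so collisions cannot produce a false match.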

I'm not sure what "gnu prelink library list" has to do with individual Atoms.

-Nick

> I believe both of them would be solvable by using a 64-bit unsigned integer, where the lower half is used by the content type and the upper half is used by flags. I don't think we would need more than 32 flags anytime soon, but at least there is a possibility of adding more flags.
> 
> I think flags should be supported only by lld Core.
> 
> Thanks
> 
> Shankar Easwaran
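
The 64-bit layout proposed in the quoted text (content type in the lower half, flags in the upper half) could be sketched as follows. All names and the two example flag bits are hypothetical; this only illustrates the packing, not an agreed design.

```cpp
#include <cstdint>

// Hypothetical flag bits, named after the two examples in the thread.
constexpr std::uint32_t kFlagHasFollowOn = 1u << 0;
constexpr std::uint32_t kFlagInGroup = 1u << 1;

// Low 32 bits: content type. High 32 bits: flags.
constexpr std::uint64_t packAtomInfo(std::uint32_t contentType,
                                     std::uint32_t flags) {
  return (static_cast<std::uint64_t>(flags) << 32) | contentType;
}

constexpr std::uint32_t contentTypeOf(std::uint64_t packed) {
  return static_cast<std::uint32_t>(packed & 0xFFFFFFFFull);
}

constexpr std::uint32_t flagsOf(std::uint64_t packed) {
  return static_cast<std::uint32_t>(packed >> 32);
}
```

Because the flags are individual bits, several can be set at once, unlike a single value drawn from a LOW_PROC .. HIGH_PROC range.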