[PATCH] D20541: [esan|cfrag] Create the cfrag variable for the runtime

Thu May 26 09:08:46 PDT 2016

filcab added a comment.

In http://reviews.llvm.org/D20541#440917, @zhaoqin wrote:

> In http://reviews.llvm.org/D20541#440870, @filcab wrote:
>
> > Re: this change:
> >  I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
> >  Having to track what's getting pushed onto the `SmallVector` seems harder to follow without much gain.
> >  (I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)
>
>
> I feel we are going circle here!!!
>  In my original CL, the full struct description and definition is at one place, i.e., in the tool specific createGV function.
>  I am asked to split it so the header (ToolType/UnitName) would be put at beginning of the struct for all different types of Tool GV.
>  Now it is hard to track what the struct is because of the split!!!

I don't think we're going full circle. But sorry if I'm hard to understand. Sometimes I might not explain what I would like to have in the best way. Feel free to ask for more details if what I'm asking for seems out of place or seems to go against what I (or others) have said before.

In your original CL you had `createCacheFragGV` which would:
 create a struct type
 create the string global for the module
 create the constant int and the constant expr to put in the struct
 create the struct itself
 create the GV

This is to create the GV argument, which tools need for getting information from the ESan pass, basically.

Out of those, some are not needed (create the struct type. We can use an anonymous struct), some are common to any tool that needs an extra argument (create the struct itself, create the GV), and others are, for now, common to every tool that gets an argument (create the string global for the module). The rest is tool-specific, and is about creating the values to put in the struct.

My previous comment was about extracting away the struct + GV creation (and not needing to have a struct type).
Those two steps are what I pointed to from clang's source: every check will create an `ArrayRef` with the data to be put on the struct, and then common code simply takes that and creates an anonymous structure.

Your change addressed those two points, and thanks for doing that. But the resulting code, for me, seems to be more confusing than it needs to be. One of the very good things about the clang example I posted was that you just need to look at the temporary array to figure out what exactly the structure that is passed to `__esan_init` contains. This also makes it easy to add/change fields to that structure (simply add a new `llvm::Constant *[]` element to the array creation).

My opinion is that it's much easier to create the array at a single place than to track down the `SmallVector`'s inserted elements.

> > How about:

> 

> >  The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);

> 

> >  The tool-specific function:

> 

> > 

> 

> >   Gets all the info/adds constants

> 

> >   Sets up an array with the information to be placed in the structure

> 

> >   Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)

> 

> >    

> 

> > 

> 

> > That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.

> 

> >  Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

> 

> > 

> 

> > What do you think?

> 

> 

> I am not sure what's the difference from my original CL, where the struct is declared in each tool-specific create GV?

>  The original CL had a static struct declaration and then filled the entries and created the GV at the end of the function.

> 

>   StructType *CacheFragType =

>     StructType::get(Int32Ty, IntptrTy, nullptr);

>   ...

>   GlobalVariable *CacheFragGV = new GlobalVariable(

>       M, CacheFragType, true, GlobalVariable::InternalLinkage,

>       ConstantStruct::get(CacheFragType,

>                           ConstantInt::get(Int32Ty, Options.ToolType),

>                           ConstantExpr::getPointerCast(UnitName, IntptrTy),

>                           nullptr));

> 

> It seems clearer than pushing a vector in a separate the function and then create the GV without knowing what's inside.

> 

> Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?

The rationale is that each tool knows what things to place in its structure. If it wants to get the `UnitName` and wants it to be the first thing in the structure, it will place it there.
This is what is being done, for example, in clang's `EmitCheck` Look at the `lib/CodeGen/CGExpr.cpp` file I mentioned and search for calls to `EmitCheck` (especially the third parameter, usually called `StaticData`).

http://reviews.llvm.org/D20541