[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic

Sun Aug 16 16:40:57 PDT 2015

On Fri, Aug 14, 2015 at 10:05:12AM +0000, Wang Nan via llvm-dev wrote:
> This is for BPF output. BPF programs output bytes to perf through a
> tracepoint. To decode such data, we need a way to describe the format
> of the buffer. This patch is an attempt to give each variable a unique
> number by introducing a new intrinsic, 'llvm.typeid.for'.
> 
> At the bottom is an example of using that intrinsic, and the result of
>  $ clang -target bpf -O2 -c -S ./test_typeid.c
> 
> One limitation of the newly introduced intrinsic is that I can't find
> a way to make it accept all types without name mangling. Therefore,
> we have to define a different intrinsic for each type. In the example
> below, a macro trick defines llvm.typeid.for.p0struct.mystr and
> llvm.typeid.for.p0struct.mystr2, as well as the corresponding output
> functions.
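To make the per-type naming concrete: the intrinsic names in the example encode the argument type as a suffix, so every struct needs its own declaration. The helper below is a hypothetical sketch that only reproduces the pattern visible in the example ("p" + address space + clang's IR struct name "struct.<tag>"); it is not LLVM's own mangling code, which covers many more types.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a per-type intrinsic name in the shape used by
// the example (llvm.typeid.for.p0struct.<tag>). It models only the
// pointer-to-named-struct case and is not LLVM's real mangling routine.
std::string typeidIntrinsicName(unsigned addrSpace, const std::string &structTag) {
  // clang names the IR type of `struct <tag>` as "struct.<tag>".
  return "llvm.typeid.for.p" + std::to_string(addrSpace) + "struct." + structTag;
}
```

Because the type is baked into the symbol name, a C header has to declare one function per struct, which is what the macro trick in the example does.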
> 
> Another problem is that I'm still unable to find a way to insert DWARF
> information at this stage. After clang, debug information is already
> isolated, and debug information entries are linked together. Adjusting
> debug information requires me to create new metadata and new debug info
> entries, link them properly, and then insert them into the correct
> place. That is possible, but makes the code ugly.
> 
> Because of the above two problems, I decided to try a clang builtin
> again. I think that should be the last try. If it still doesn't work,
> I'd like to stop working on it until I have a better idea (the BCC
> rewriter could be a viable solution), and let the patch series
> 'Make eBPF programs output data to perf' be merged upstream without
> the 'typeid' change. Until the decoding problem is solved, users will
> have to decode the BPF output manually themselves, or use a perf
> script or babeltrace script.
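For reference, "decoding manually" amounts to the consumer hard-coding a mirror of the struct the BPF program emitted. A minimal sketch, assuming a hypothetical `MyStr` record (the field names and types are invented for illustration) and that producer and consumer agree on sizes, alignment, and byte order:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical record layout; it must match, field for field, the struct the
// BPF program wrote (same sizes, alignment and byte order are assumed).
struct MyStr {
  uint64_t ts;
  uint32_t pid;
  uint32_t cpu;
};

// Decode one raw sample. Returns std::nullopt if the buffer is too short
// to hold a full record.
std::optional<MyStr> decodeMyStr(const std::vector<uint8_t> &raw) {
  if (raw.size() < sizeof(MyStr))
    return std::nullopt;
  MyStr rec;
  // memcpy avoids unaligned access and strict-aliasing problems.
  std::memcpy(&rec, raw.data(), sizeof(MyStr));
  return rec;
}
```

The weakness is that this mirror must be kept in sync by hand, which is exactly what a type ID resolvable through debug info would avoid.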
> 
> Thank you.
>  
> @@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
>    FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
>  }
>  
> +void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
> +  SDValue Res;
> +  static std::vector<const StructType *> StructTypes;

'static' is obviously a short-term hack for illustration purposes, right?

> +  int ID = -1;
> +  Value *PtrArg = CI.getArgOperand(0);
> +  // dyn_cast returns null on a type mismatch; cast would assert instead.
> +  if (auto *PTy = dyn_cast<PointerType>(PtrArg->getType())) {
> +    if (auto *STy = dyn_cast<StructType>(PTy->getElementType())) {
> +      for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
> +        if (StructTypes[i] == STy) {
> +          ID = i + 1;
> +          break;
> +        }
> +      if (ID == -1) {
> +        StructTypes.push_back(STy);
> +        ID = StructTypes.size();
> +      }
> +    }
> +  }
> [...]
> unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \

The macro hack and the loop are quite ugly.
Also, how do you plan to correlate such an ID with the DWARF info?
Instead of StructType we need to look up DICompositeType,
but it looks like there is no clear connection between the call
arguments and the metadata provided by clang.
Maybe it would indeed be easier to add a clang intrinsic
that adds the metadata number as an explicit constant.
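As an aside on the 'static' vector and the linear search: a minimal sketch of an owned, hash-based ID table, assuming only pointer identity of the type object matters. `TypeIdTable` is a hypothetical name, and the template parameter `T` stands in for StructType or DICompositeType; this is not existing LLVM API.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical replacement for a function-local `static std::vector` plus a
// linear search. The table is an ordinary object, so its lifetime can be tied
// to its owner (e.g. one per module) instead of persisting across
// compilations, and lookup is a hash probe instead of a loop.
template <typename T>
class TypeIdTable {
public:
  // Returns the existing ID for Ty, or assigns the next one. IDs start at 1,
  // as in the patch; 0 is left unused so a caller could use it as a sentinel.
  unsigned getOrAssign(const T *Ty) {
    assert(Ty && "null type has no ID");
    auto It = Ids.find(Ty);
    if (It != Ids.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(Ids.size()) + 1;
    Ids.emplace(Ty, Id);
    return Id;
  }

  unsigned size() const { return static_cast<unsigned>(Ids.size()); }

private:
  std::unordered_map<const T *, unsigned> Ids;
};
```

Keying on the DICompositeType rather than the IR StructType would be what makes the number meaningful to a DWARF consumer, but that still needs the call-argument-to-metadata link discussed above.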

I didn't really have time to explore this problem in depth.
Maybe we can write a clear problem statement, and someone
on the LLVM list who is familiar with debug info can help
design a solution.
Let me state what I think we're trying to do.
For the program:
void foo(void * ptr);
void bar(...)
{
   struct S s;
   ...
   foo(&s);
}
We want to be able to scan the .o file and, for the callsite of
foo, find the id of the DICompositeType by looking at the binary
code in the .o, so we can look up this id in the DWARF info (which
is also part of the .o) and figure out the layout of the struct
passed into foo.
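The consumer side of that problem statement could look roughly like the sketch below: once an id has been read from the callsite, it selects a struct layout recovered from DWARF, and the layout drives a generic decoder. Everything here is hypothetical and heavily simplified (only unsigned little-endian integer members of up to 8 bytes); the comments name the DWARF attributes the values would come from.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for a DW_TAG_structure_type entry and its members.
struct FieldInfo {
  std::string name;
  uint32_t offset; // from DW_AT_data_member_location
  uint32_t size;   // byte size of the member's type
};

struct StructLayout {
  std::string name;
  uint32_t byteSize; // from DW_AT_byte_size
  std::vector<FieldInfo> fields;
};

// id (found at the callsite of foo in the .o) -> layout (from the .o's DWARF).
using LayoutTable = std::map<uint64_t, StructLayout>;

// Decode a raw record with the layout selected by id. Members are read as
// little-endian unsigned integers; wider or out-of-range members are skipped.
std::map<std::string, uint64_t> decodeById(const LayoutTable &table, uint64_t id,
                                           const std::vector<uint8_t> &raw) {
  std::map<std::string, uint64_t> out;
  auto it = table.find(id);
  if (it == table.end() || raw.size() < it->second.byteSize)
    return out; // unknown id or truncated record
  for (const FieldInfo &f : it->second.fields) {
    if (f.size > 8 || f.offset + f.size > raw.size())
      continue;
    uint64_t v = 0;
    for (uint32_t i = 0; i < f.size; ++i)
      v |= static_cast<uint64_t>(raw[f.offset + i]) << (8 * i);
    out[f.name] = v;
  }
  return out;
}
```

The point of the design is that nothing in the decoder is specific to struct S; all type knowledge arrives through the id and the debug info.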