[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic

Sun Aug 16 18:54:04 PDT 2015

On 2015/8/17 7:40, Alexei Starovoitov wrote:
> On Fri, Aug 14, 2015 at 10:05:12AM +0000, Wang Nan via llvm-dev wrote:
>> This is for BPF output. A BPF program outputs bytes to perf through a
>> tracepoint. To decode such data, we need a way to describe the format
>> of the buffer. This patch is an attempt that assigns a unique number to
>> the type of each variable by introducing a new intrinsic, 'llvm.typeid.for'.
>>
>> At the bottom is an example of using that intrinsic and the result
>> of
>>   $ clang -target bpf  -O2 -c  -S ./test_typeid.c
>>
>> One limitation of the newly introduced intrinsic is that I can't
>> find a way to make it accept all types without name mangling.
>> Therefore, we have to define a different intrinsic for each type.
>> See the example below: using a macro trick, we define
>> llvm.typeid.for.p0struct.mystr and llvm.typeid.for.p0struct.mystr2,
>> along with the corresponding output functions.
>>
>> Another problem is that I still can't find a way to insert DWARF
>> information at this stage. After clang, the debug information is
>> already isolated, and its entries are linked together. Adjusting it
>> requires me to create new metadata and new debug info entries, link
>> them properly, and insert them at the correct place. That is
>> possible, but it makes the code ugly.
>>
>> Because of the above two problems, I decided to try a clang builtin
>> again. I think that should be the last try. If it still doesn't work,
>> I'd like to stop working on this until I have a better idea (the BCC
>> rewriter would be a solution worth considering). Let the patch series
>> 'Make eBPF programs output data to perf' be merged upstream
>> without the 'typeid' change. Until the decoding problem is solved,
>> users have to decode the BPF output manually or use a
>> perf script or babeltrace script.
>>
>> Thank you.
>>   
>> @@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
>>     FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
>>   }
>>   
>> +void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
>> +  SDValue Res;
>> +  static std::vector<const StructType *> StructTypes;
> 'static' is obviously short term hack for illustration purpose, right?

Of course. Actually I don't like this solution. Please see my commit 
message.
>
>> +  int ID = -1;
>> +  Value *PtrArg = CI.getArgOperand(0);
>> +  PointerType *PTy = dyn_cast<PointerType>(PtrArg->getType());
>> +  if (PTy) {
>> +	  StructType *STy = dyn_cast<StructType>(PTy->getElementType());
>> +	  if (STy) {
>> +		for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
>> +			if (StructTypes[i] == STy)
>> +				ID = i + 1;
>> +		if (ID == -1) {
>> +			StructTypes.push_back(STy);
>> +			ID = StructTypes.size();
>> +		}
>> +	  }
>> +  }
>> unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \
> the macro hack and the loop are quite ugly.

Agreed. This is a hard limitation if we implement this as an LLVM
intrinsic.
In clang, we can instead use varargs:

BUILTIN(__builtin_bpf_typeid, "Wi.", "nc")

> Also how do you plan to correlate such ID to dwarf info?
> Instead of StructType we need to lookup DICompositeType,
> but looks like there is no clear connection between call
> arguments to metadata provided by clang.

Not sure yet. I'd like to try the clang intrinsic again.

> Maybe it would indeed be easier to add a clang intrinsic
> that will add the metadata number as an explicit constant.
>
> I didn't really have time to explore this problem in depth.
> Maybe we can make a clear problem statement and someone
> on the llvm list who is familiar with debug info can help design
> a solution.
> Let me state what I think we're trying to do.
> For the program:
> void foo(void * ptr);
> void bar(...)
> {
>     struct S s;
>     ...
>     foo(&s);
> }
> We want to be able to scan .o file and for the callsite of
> foo, we want to be able to find an id of DICompositeType
> looking at binary code of .o, so we can lookup this id in
> dwarf info (that is also part of .o) and figure out the layout
> of the struct passed into the function foo.
>

Yes.

I think we solve this problem if we can generate a program like this:

struct structure1 {
   int ID;
   int x;
   int y;
};
struct structure2 {
   int ID;
   int a;
   int b;
};

enum bpf_types {
   BPF_TYPE_structure1 = 1,
   BPF_TYPE_structure2 = 2,
};

int func(void)
{
   struct structure1 var1;
   struct structure2 var2;

   var1.ID = BPF_TYPE_structure1;
   var2.ID = BPF_TYPE_structure2;
   foo(&var1);
   foo(&var2);
   return 0;
}

The key is the enum type. The values of BPF_TYPE_structure{1,2} will be
recorded in the DWARF info like this:

  <1><2a>: Abbrev Number: 2 (DW_TAG_enumeration_type)
     <2b>   DW_AT_name        : (indirect string, offset: 0xf4): bpf_types
     <2f>   DW_AT_byte_size   : 4
     <30>   DW_AT_decl_file   : 1
     <31>   DW_AT_decl_line   : 12
  <2><32>: Abbrev Number: 3 (DW_TAG_enumerator)
     <33>   DW_AT_name        : (indirect string, offset: 0xcc): BPF_TYPE_structure1
     <37>   DW_AT_const_value : 1
  <2><38>: Abbrev Number: 3 (DW_TAG_enumerator)
     <39>   DW_AT_name        : (indirect string, offset: 0xe0): BPF_TYPE_structure2
     <3d>   DW_AT_const_value : 2

So we can use these enumerators to connect the value of the ID field with
the type of the structure that carries it.
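
As a rough sketch of the consumer side (not part of any patch), a decoder
could read the leading ID from each record and pick the layout from it.
The struct layouts and enum values are the ones from the example above;
decode_record is a made-up name, and a real tool would take the ID-to-type
mapping and the field layout from DWARF instead of compiling them in:

```c
#include <stdio.h>
#include <string.h>

/* Layouts and IDs copied from the example program. */
struct structure1 { int ID; int x; int y; };
struct structure2 { int ID; int a; int b; };

enum bpf_types {
	BPF_TYPE_structure1 = 1,
	BPF_TYPE_structure2 = 2,
};

/*
 * Hypothetical decoder: formats one record into 'out'.
 * Returns 0 on success, -1 if the buffer is too short or the ID is unknown.
 */
static int decode_record(const void *buf, size_t len, char *out, size_t outlen)
{
	int id;

	if (len < sizeof(id))
		return -1;
	/* The ID is the first field of every record. */
	memcpy(&id, buf, sizeof(id));

	switch (id) {
	case BPF_TYPE_structure1: {
		struct structure1 s;

		if (len < sizeof(s))
			return -1;
		memcpy(&s, buf, sizeof(s));
		snprintf(out, outlen, "structure1 x=%d y=%d", s.x, s.y);
		return 0;
	}
	case BPF_TYPE_structure2: {
		struct structure2 s;

		if (len < sizeof(s))
			return -1;
		memcpy(&s, buf, sizeof(s));
		snprintf(out, outlen, "structure2 a=%d b=%d", s.a, s.b);
		return 0;
	}
	default:
		/* No enumerator with this value: the layout is unknown. */
		return -1;
	}
}
```

The switch here stands in for the DWARF lookup: each case corresponds to one
DW_TAG_enumerator whose DW_AT_const_value matches the ID in the buffer.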

DW_AT_const_value can also be attached to const variables, so maybe the
enum can be replaced.

Thank you.