[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic
Wangnan (F) via llvm-dev
llvm-dev at lists.llvm.org
Sun Aug 16 18:54:04 PDT 2015
On 2015/8/17 7:40, Alexei Starovoitov wrote:
> On Fri, Aug 14, 2015 at 10:05:12AM +0000, Wang Nan via llvm-dev wrote:
>> This is for BPF output. A BPF program outputs bytes to perf through a
>> tracepoint. To decode such data, we need a way to describe the format
>> of the buffer. This patch is an attempt that gives each variable a unique
>> number by introducing a new intrinsic 'llvm.typeid.for'.
>>
>> At the bottom is an example of using that intrinsic and the result
>> of
>> $ clang -target bpf -O2 -c -S ./test_typeid.c
>>
>> The newly introduced intrinsic has a limitation: I can't find a way
>> to make it accept all types without name mangling. Therefore, we
>> have to define a different intrinsic for each type. See the example
>> below: using a macro trick, we define
>> llvm.typeid.for.p0struct.mystr and llvm.typeid.for.p0struct.mystr2, and
>> also the different output functions.
>>
>> Another problem is that I'm still unable to find a way to insert DWARF
>> information at this stage. After clang, debug information is already
>> isolated, and debug information entries are linked together. Adjusting
>> debug information requires me to create new metadata and new debug info
>> entries, link them properly, then insert them into the correct place.
>> This is possible, but makes the code ugly.
>>
>> Because of the above two problems, I decided to try a clang builtin
>> again. I think that should be the last try. If it still doesn't work,
>> I'd like to stop working on it until I have a better idea (the BCC
>> rewriter should be a solution worth considering). Let the patch series
>> 'Make eBPF programs output data to perf' be merged upstream
>> without the 'typeid' change. Until the decoding problem is solved, we
>> have to let users decode the BPF output themselves, manually or with
>> a perf script or babeltrace script.
>>
>> Thank you.
>>
>> @@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
>> FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
>> }
>>
>> +void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
>> + SDValue Res;
>> + static std::vector<const StructType *> StructTypes;
> 'static' is obviously short term hack for illustration purpose, right?
Of course. Actually I don't like this solution. Please see my commit
message.
>
>> + int ID = -1;
>> + Value *PtrArg = CI.getArgOperand(0);
>> + PointerType *PTy = cast<PointerType>(PtrArg->getType());
>> + if (PTy) {
>> + StructType *STy = cast<StructType>(PTy->getElementType());
>> + if (STy) {
>> + for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
>> + if (StructTypes[i] == STy)
>> + ID = i + 1;
>> + if (ID == -1) {
>> + StructTypes.push_back(STy);
>> + ID = StructTypes.size();
>> + }
>> + }
>> + }
>> unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \
> the macro hack and the loop are quite ugly.
Agreed. This is a hard limitation if we implement this as an LLVM
intrinsic.
In clang, by contrast, the builtin can take varargs:
BUILTIN(__builtin_bpf_typeid, "Wi.", "nc")
> Also how do you plan to correlate such ID to dwarf info?
> Instead of StructType we need to lookup DICompositeType,
> but looks like there is no clear connection between call
> arguments to metadata provided by clang.
Not sure. I'd like to try a clang intrinsic again.
> Maybe indeed it would be easier to add a clang intrinsic
> that will add the metadata number as an explicit constant.
>
> I didn't really have time to explore this problem in depth.
> Maybe we can make a clear problem statement and someone
> on the llvm list who is familiar with debug info can help design
> a solution.
> Let me state what I think we're trying to do.
> For the program:
> void foo(void * ptr);
> void bar(...)
> {
> struct S s;
> ...
> foo(&s);
> }
> We want to be able to scan .o file and for the callsite of
> foo, we want to be able to find an id of DICompositeType
> looking at binary code of .o, so we can lookup this id in
> dwarf info (that is also part of .o) and figure out the layout
> of the struct passed into the function foo.
>
Yes.
I think if we can generate a program like this, we solve this problem:
struct structure1 {
    int ID;
    int x;
    int y;
};
struct structure2 {
    int ID;
    int a;
    int b;
};
enum bpf_types {
    BPF_TYPE_structure1 = 1,
    BPF_TYPE_structure2 = 2,
};
int func(void)
{
    struct structure1 var1;
    struct structure2 var2;
    var1.ID = BPF_TYPE_structure1;
    var2.ID = BPF_TYPE_structure2;
    foo(&var1);
    foo(&var2);
    return 0;
}
The key is the enum type. The value of BPF_TYPE_structure{1,2} will be
recorded in DWARF info like:
 <1><2a>: Abbrev Number: 2 (DW_TAG_enumeration_type)
    <2b>   DW_AT_name        : (indirect string, offset: 0xf4): bpf_types
    <2f>   DW_AT_byte_size   : 4
    <30>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 12
 <2><32>: Abbrev Number: 3 (DW_TAG_enumerator)
    <33>   DW_AT_name        : (indirect string, offset: 0xcc): BPF_TYPE_structure1
    <37>   DW_AT_const_value : 1
 <2><38>: Abbrev Number: 3 (DW_TAG_enumerator)
    <39>   DW_AT_name        : (indirect string, offset: 0xe0): BPF_TYPE_structure2
    <3d>   DW_AT_const_value : 2
So we can connect the ID field and the type through these enumerators.
DW_AT_const_value can also be emitted for const variables, so the enum
may be replaceable by consts.
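
To make the decoding side concrete, here is a minimal sketch of how a
consumer might map the leading ID field of a buffer back to a type. It is
only an illustration under assumptions: type_table, struct type_entry and
lookup_type() are hypothetical names, not part of the patch or of perf, and
a real decoder would fill the table from the DW_TAG_enumerator entries
(DW_AT_name, DW_AT_const_value) instead of hard-coding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the example above: the first field carries the type ID. */
struct structure1 { int ID; int x; int y; };
struct structure2 { int ID; int a; int b; };

enum bpf_types {
    BPF_TYPE_structure1 = 1,
    BPF_TYPE_structure2 = 2,
};

/* Hypothetical table entry. In a real tool, id and name would come from
 * the DWARF enumerators and size from the struct's DW_AT_byte_size. */
struct type_entry {
    int id;
    const char *name;
    size_t size;
};

static const struct type_entry type_table[] = {
    { BPF_TYPE_structure1, "structure1", sizeof(struct structure1) },
    { BPF_TYPE_structure2, "structure2", sizeof(struct structure2) },
};

/* Hypothetical helper: read the leading ID from a raw buffer and return
 * the matching table entry, or NULL if the ID is unknown or the buffer
 * is too short to hold that type. */
static const struct type_entry *lookup_type(const void *buf, size_t len)
{
    int id;
    size_t i;

    assert(buf != NULL);
    if (len < sizeof(id))
        return NULL;
    memcpy(&id, buf, sizeof(id));   /* buffer may not be int-aligned */

    for (i = 0; i < sizeof(type_table) / sizeof(type_table[0]); i++)
        if (type_table[i].id == id && len >= type_table[i].size)
            return &type_table[i];
    return NULL;
}
```

A caller would pass the raw bytes received from perf to lookup_type() and,
on a non-NULL result, interpret the buffer according to the layout that
DWARF records for that type name.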
Thank you.