[PATCH] D61809: [BPF] Preserve original struct/union type name/access index and array subscripts

Fri May 10 16:27:49 PDT 2019

rsmith added a comment.

If I understand correctly, you want to be able to compile a program against some `struct` and `union` layouts, and then at load time "update" the program to cope with the actual layouts for those types being something else, but containing (at least) the set of members you know about. And your proposed approach is to add intrinsics into the IR that identify the GEPs that we emitted to model struct field accesses, but to otherwise not change the emitted IR. If so, I don't see how this approach to that problem can work. There seem to be a couple of problems:

>   %2 = alloca %struct.sk_buff*, align 8

The size you allocate here will presumably need to vary as the struct layout changes, and you have no way of knowing which `alloca`s will need their sizes to be changed.

>   %6 = getelementptr inbounds %struct.sk_buff, %struct.sk_buff* %5, i32 0, i32 2, !dbg !54
>   %7 = call [10 x %union.anon]*
>       @llvm.preserve.di.access.index.p0a10s_union.anons.p0a10s_union.anons.p0s_struct.sk_buffs(
>       [10 x %union.anon]* %6, %struct.sk_buff* %5,
>       i8* getelementptr inbounds ([8 x i8], [8 x i8]* @0, i32 0, i32 0), i32 3), !dbg !54

If this call is supposed to somehow represent that the `%6` GEP is computed from `%5` by way of a struct field access, this will not be robust against any kind of optimization: optimizations will happily rewrite uses of `%6` to use `%5` or some other form directly, and you will have lost track of the pointers you need to update.

It would seem better to me to represent a relocateable GEP directly: that is, instead of emitting a regular GEP and some fixup intrinsic, you could emit an intrinsic call //instead of// the GEP, and have the intrinsic compute the field offset. (And you'll need to do something about your `alloca`s, such as using an intrinsic to get the struct size and emitting a dynamic alloca of that size.) For example, instead of the above, you could emit

>   %6 = call i8* @llvm.preserve.di.access.gep.2(i8* %5, i32 0, i32 2, !typeinfo)

(where `!typeinfo` is whatever information you need to identify `struct sk_buff` later on, when generating your relocations, and the indices you pass in are whichever ones you need to identify the subobject within that type information). Does that make sense?

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61809/new/

https://reviews.llvm.org/D61809