[cfe-dev] Function pointer type becomes empty struct

David Wiberg via cfe-dev cfe-dev at lists.llvm.org
Thu Nov 17 14:55:46 PST 2016


Hi Chris,

2016-11-13 12:49 GMT+01:00 Christian Dehnert via cfe-dev
<cfe-dev at lists.llvm.org>:
> Hi cfe-dev,
>
> I am using clang to compile C programs to LLVM IR. I have the following
> program:
>
> struct A {
>    int (*f)(int, struct A*);
>    int (*g)(char, struct A*);
> };
>
> int h(int a, struct A* b) {
>    return 0;
> }
>
> int i(char a, struct A* b) {
>    return 1;
> }
>
> int main() {
>    struct A a;
>    a.f = h;
>    a.g = 0;
>    a.f(0, &a);
>    return 0;
> }
>
> Now, if I compile this to LLVM IR, the struct A becomes:
>
> %struct.A = type { {}*, i32 (i8, %struct.A*)* }
>
> So the type of function pointer f in A is {}* and the type of g (in A) is
> (just as I would expect) of type i32 (i8, %struct.A*)*. Now, why is the type
> for f abbreviated like this? If I change the order of the functions in the C
> program by swapping the order of the definitions of h(...) and i(…), the
> type of the struct A becomes:
>
> %struct.A = type { i32 (i32, %struct.A*)*, {}* }
>
> Now the type of f is spelt out completely as I would expect, but g’s type is
> given by {}*. If I slightly change the type of the function pointers whose
> type becomes {}* (e.g. changing “int” to “long” or something like this), I
> get their full types back again.
>

TL;DR - Clang tries to determine the LLVM IR types needed for the
first function but bails out on the struct field whose type refers
back to that function's own type, because the type hierarchy is
recursive.

I thought this was an interesting question, so I spent some time
trying to understand what happens. I can't say, however, whether the
behavior is correct. Note that I haven't looked at this code
previously, so please correct me if I've misunderstood something.

- Clang handles top-level declarations first. In your example the
first one is the function "h". To be able to emit LLVM IR, Clang
collects information about the function, which includes iterating
over its arguments (to determine IR types?). The first interesting
argument is the pointer to struct A.
- To compute the layout of the struct, Clang iterates over the struct
fields, which eventually triggers an attempt to get the LLVM type for
each field.
  - The first field is a function pointer, so an attempt is made to
gather information about its function type. But since this is the
same function type as the one we are currently processing (that of
"h"), the attempt is aborted (to avoid recursion?) and an empty struct
is used as the type instead. This is the unexpected {} you see in
your output.
  - The second field is also a function pointer, but its function type
isn't currently being processed, so it can be handled. (The handling
actually leads into the same call stack as for the previous function.
But in this case, once the pointer to struct A argument is found, it
is determined that the struct is already being processed and further
handling is deferred.) Since the function type could be handled, the
expected type is returned.

> From what I can see, the function pointer types within struct A become {}*
> if they coincide with the type of the first function following the structure
> definition that involves the struct itself. In other words, if I put a
> function
>
> int j() {
>    return 0;
> }
>
> between struct A and h, then everything stays as it is (i.e. one of the
> types is going to be {}*),

If you first insert a function which doesn't use struct A, there's no
need to determine the types for the struct at that point. The struct
is still laid out for the first time while "h" is being processed, so
the behavior stays the same.

> but if I make it
>
> int j(struct A* a) {
>    return 0;
> }
>
> the IR for %struct.A changes (and I suddenly get the full types for both
> function pointers).

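If you insert a function which uses struct A but doesn't cause the
recursive behavior when looking up the types of struct A, you get the
expected behavior: the struct is laid out while "j" is being
processed, and since the type of "j" matches neither field, both
function pointer types can be handled.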
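
The steps above can be mimicked with a small toy model in C. This is
only a sketch of my reading of the behavior, not Clang's code: all
names (convert_fn_pointer, convert_record, simulate, FN_H, FN_I, FN_J,
struct model) are made up for illustration, and the model assumes that
only two guards matter, an "in progress" flag per function type and a
"being laid out" flag for struct A.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical names, not Clang's API. */
enum fn_type { FN_H, FN_I, FN_J, FN_COUNT };

/* IR spelling of a pointer to each modelled function type:
   int(int, struct A*), int(char, struct A*), int(struct A*). */
static const char *const full_name[FN_COUNT] = {
    "i32 (i32, %struct.A*)*",
    "i32 (i8, %struct.A*)*",
    "i32 (%struct.A*)*",
};

struct model {
    int fn_in_progress[FN_COUNT]; /* function types being converted */
    int record_in_progress;       /* struct A is being laid out */
    int record_done;              /* struct A's layout is final */
    const char *field[2];         /* resulting IR types of A.f and A.g */
};

static void convert_record(struct model *m);

/* Returns the IR spelling of a pointer to function type t. */
static const char *convert_fn_pointer(struct model *m, enum fn_type t)
{
    assert(t < FN_COUNT);
    if (m->fn_in_progress[t])
        return "{}*";             /* recursion detected: placeholder */
    m->fn_in_progress[t] = 1;
    convert_record(m);            /* every modelled type takes a struct A* */
    m->fn_in_progress[t] = 0;
    return full_name[t];
}

/* Lays out struct A once by converting the types of its two fields. */
static void convert_record(struct model *m)
{
    if (m->record_done || m->record_in_progress)
        return;                   /* already laid out, or deferred */
    m->record_in_progress = 1;
    m->field[0] = convert_fn_pointer(m, FN_H); /* A.f has the type of h */
    m->field[1] = convert_fn_pointer(m, FN_I); /* A.g has the type of i */
    m->record_in_progress = 0;
    m->record_done = 1;
}

/* Simulates processing a function of type `first` as the first
   top-level declaration that mentions struct A. */
static struct model simulate(enum fn_type first)
{
    struct model m = {0};
    convert_fn_pointer(&m, first);
    return m;
}

/* Returns nonzero if field idx (0 = f, 1 = g) ended up as {}*. */
static int field_is_placeholder(const struct model *m, int idx)
{
    return strcmp(m->field[idx], "{}*") == 0;
}
```

Under these assumptions, calling simulate(FN_H) from a main() leaves
the fields as {}* and i32 (i8, %struct.A*)*, simulate(FN_I) gives
i32 (i32, %struct.A*)* and {}*, and simulate(FN_J) gives the full
types for both fields, which matches the three outputs you describe.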

One possible way of fixing this (if it is considered an issue) might
be to defer the handling of structs if they contain fields with
function pointers. However, I'm not sure whether this is correct or
whether it has unwanted side effects.

Best regards
David

>
> How is the type {}* supposed to be interpreted here? Why is it, that
> depending on what follows, the type is “abbreviated” as {}* and sometimes
> not?
>
> I am using AppleClang on Mac OS 10.12, but I get the same behavior with
> pre-built clang 3.9 downloaded from the LLVM website.
>
> Best wishes,
> Chris


