[LLVMdev] Named metadata to represent language specific logic

Renato Golin rengolin at systemcall.org
Fri Mar 25 08:27:16 PDT 2011


Hi all,

I was wondering if we could use named metadata to store some of the C++
logic without changing the IR. This is primarily meant to make building
front-ends easier; the resulting IR (with or without metadata) should be
the same as it is today (or better).

I say this because of the number of global variables front-ends need
to keep, since LLVM IR cannot represent all the information about
types, variables and functions (sizes, offsets, alignment, linkage
semantics, etc.). So, if we could generate generic IR with
annotations, and run a pass before validation that converts all
those annotations into another, lower-level IR, writing front-ends
would be much simpler.
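To make the "lower" step concrete, here is a minimal sketch of the layout arithmetic such a pass would have to perform, written as standalone C++ rather than against the LLVM API. All names (AnnotatedField, lowerStruct) are hypothetical; it assumes power-of-two alignments and lowers to a packed struct with explicit [N x i8] padding, which is only one possible choice of "lower" IR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical annotation for one field: the IR element type plus the
// "size"/"align" facts the metadata would carry.
struct AnnotatedField {
    std::string irType;  // e.g. "i8", "i32"
    std::size_t size;    // size in bytes
    std::size_t align;   // alignment in bytes (must be a power of two)
};

static std::size_t roundUp(std::size_t n, std::size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);  // power-of-two alignment only
    return (n + a - 1) / a * a;
}

// Lower the annotations into an explicit packed struct ("<{ ... }>") in
// which every alignment gap, including tail padding, is an [N x i8] field.
std::string lowerStruct(const std::vector<AnnotatedField>& fields) {
    if (fields.empty()) return "<{}>";
    std::string out = "<{ ";
    std::size_t offset = 0, maxAlign = 1;
    bool first = true;
    auto emit = [&](const std::string& t) {
        if (!first) out += ", ";
        out += t;
        first = false;
    };
    for (const auto& f : fields) {
        std::size_t aligned = roundUp(offset, f.align);
        if (aligned > offset)  // gap before this field
            emit("[" + std::to_string(aligned - offset) + " x i8]");
        emit(f.irType);
        offset = aligned + f.size;
        if (f.align > maxAlign) maxAlign = f.align;
    }
    std::size_t total = roundUp(offset, maxAlign);
    if (total > offset)  // tail padding up to the struct's alignment
        emit("[" + std::to_string(total - offset) + " x i8]");
    return out + " }>";
}
```

For example, fields i8, i32, i8 with natural alignment lower to <{ i8, [3 x i8], i32, i8, [3 x i8] }> (12 bytes). The front-end would only emit the annotated fields; the padding decisions would live in one shared pass.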

That would also allow back-ends to understand those named metadata and
possibly generate correct code without the final pass, but I gather
that some people find the idea of meaningful metadata in IR repugnant,
so I won't go as far as to suggest that... ;)

Some examples below. Don't pay too much attention to the syntax or the
contents; I'm just brainstorming...

;====================================
; Unions & bitfields
; union U { int a; int b:3; int c:3; char d; }
%union.U = type { i32 }, !union;

!union = metadata { metadata !U.a, metadata !U.bc, metadata !U.d };
!U.a = metadata { metadata !intID, metadata !"align", i8 4 };
!U.bc = metadata { metadata !U.b, metadata !U.c };
!U.b = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
!U.c = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
!U.d = metadata { metadata !charID, metadata !"align", i8 4 };
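As a rough sketch of how the final pass might turn such union annotations back into today's %union.U = type { i32 }, here is standalone C++ (not the LLVM API; UnionMember, UnionLayout and lowerUnion are hypothetical names). It relies on the usual C/C++ rule that every union member, bit-fields included, starts at offset 0, so only the largest size and the strictest alignment matter; picking the most-aligned member as the storage type is an assumption about one reasonable lowering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical annotation for one union member. A bit-field such as
// "int b:3" records its declared storage type plus its width in bits.
struct UnionMember {
    std::string irType;  // e.g. "i32", "i8"
    std::size_t size;    // storage size in bytes
    std::size_t align;   // alignment in bytes (power of two)
    unsigned bitWidth;   // 0 for ordinary (non-bit-field) members
};

struct UnionLayout {
    std::string irType;  // explicit LLVM-style type for the storage
    std::size_t size;    // total size in bytes
    std::size_t align;   // overall alignment in bytes
};

UnionLayout lowerUnion(const std::vector<UnionMember>& members) {
    assert(!members.empty());
    const UnionMember* storage = &members.front();
    std::size_t maxSize = 0, maxAlign = 1;
    for (const auto& m : members) {
        // A bit-field must fit inside its declared storage unit.
        assert(m.bitWidth <= 8 * m.size);
        // All members start at offset 0: only size and alignment matter.
        if (m.size > maxSize) maxSize = m.size;
        if (m.align > maxAlign) maxAlign = m.align;
        // Represent the storage with the most-aligned member (largest on
        // ties) so the lowered type carries the right alignment.
        if (m.align > storage->align ||
            (m.align == storage->align && m.size > storage->size))
            storage = &m;
    }
    std::size_t size = (maxSize + maxAlign - 1) / maxAlign * maxAlign;
    std::string type = "{ " + storage->irType;
    if (size > storage->size)  // pad up to the full union size
        type += ", [" + std::to_string(size - storage->size) + " x i8]";
    return {type + " }", size, maxAlign};
}
```

Feeding it the four members of U (three 4-byte, 4-aligned ints, two of them 3-bit bit-fields, plus a char) yields { i32 }, size 4, alignment 4, which matches %union.U above.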

;====================================
; Linkage information on a function
; extern inline const char* f_() { return "const string"; } // "const string" HAS to be common to ALL compilation units
define linkonce_odr i8* @_Z2f_f() nounwind inlinehint, !extern {
entry:
  ret i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0)
}

!extern = metadata { metadata !"common group", metadata !"_Z2f_f" };

-> so, if a function carries the "extern" metadata, a constant string
it returns should be placed into a common group, even though the string
itself is not declared as such.
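A minimal sketch of that lowering rule over a toy module model, again in standalone C++ rather than the LLVM API. GlobalString, Function, Module and lowerExternReturns are hypothetical; the "common group" is modelled as a named, COMDAT-like group keyed on the function's name, and the linkonce_odr linkage given to the string is an assumption, not something the proposal specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified module model, for illustration only.
struct GlobalString {
    std::string name;     // e.g. ".str"
    std::string linkage;  // e.g. "private"
    std::string group;    // common (COMDAT-like) group; empty if none
};

struct Function {
    std::string name;                   // mangled name, e.g. "_Z2f_f"
    std::set<std::string> metadata;     // tags such as "extern"
    std::vector<std::size_t> returned;  // indices of returned constants
};

struct Module {
    std::vector<GlobalString> globals;
    std::vector<Function> functions;
};

// Constants returned from a function tagged "extern" must be the same
// object in every compilation unit, so they join a common group named
// after the function. The linkage chosen here is an assumption.
void lowerExternReturns(Module& m) {
    for (const Function& f : m.functions) {
        if (f.metadata.count("extern") == 0) continue;
        for (std::size_t i : f.returned) {
            assert(i < m.globals.size());  // returned constant must exist
            GlobalString& g = m.globals[i];
            g.linkage = "linkonce_odr";
            g.group = f.name;
        }
    }
}
```

Strings returned from untagged functions are left alone. A real pass would also need to give the string a name that is stable across compilation units (".str" is module-local), which this sketch does not attempt.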

;====================================
; Class size
; struct Base { char a[3]; Base() {} };
; struct Derived : Base { char b; }
%struct.Base = type { [3 x i8] }, !BasePadding;
%struct.Derived = type { %struct.Base, i8 }, !DerivedPadding;

!BasePadding = metadata { metadata !"size", i8 1 };
!DerivedPadding = metadata { metadata !"size", i8 3 };

So, Base's padding is only applied when it is embedded in Derived, and
GEPs can still address the element directly. Sizes could be relative
to the word size, if one wanted truly generic IR, but that would raise
a lot of questions... Ignore that for now.


The final pass would turn all GEPs into those classes and unions, as
well as the constant returns, into the confusing IR we have today.

I know each front-end could do that on its own, but if there is
interest among other front-end developers (especially C++) in such a
feature, we could take a more generic approach and extend support for
specific languages without drastically changing the LangRef. (As a
matter of fact, is that something we want in the long run?)

Would that benefit other languages that cannot be properly represented
in IR? OpenCL?

Thoughts welcome, even harsh ones. ;)

-- 
cheers,
--renato

http://systemcall.org/

