<div dir="ltr"><div>Hi all,</div><div><br></div>The bitset metadata currently used in LLVM has a few problems:<div><br></div><div>1. It has the wrong name. The name "bitset" refers to an implementation detail of one use of the metadata (i.e. its original use case, CFI). This makes the metadata harder to understand, as the name makes no sense in the context of virtual call optimization.</div><div>2. It is represented using a global named metadata node, rather than being directly associated with a global. This makes it harder to manipulate the metadata when rebuilding global variables, to summarise it as part of ThinLTO, and to drop unused metadata when the associated globals are dropped. For this reason, CFI does not currently work correctly when both CFI and virtual call optimization (vcall opt) are enabled: vcall opt needs to rebuild vtable globals, and fails to associate the metadata with the rebuilt globals. As I understand it, the same problem could also affect ASan, which rebuilds globals with a red zone.</div><div><br></div><div>I would like to solve both of these problems in the following way:</div><div><br></div><div>1. Rename the metadata to "type metadata". This new name reflects how the metadata is currently being used (i.e. to represent type information for CFI and vcall opt).</div><div>2. Attach metadata directly to the globals that it pertains to, rather than using the "llvm.bitsets" global metadata node as we are doing now. This would be done using the newly introduced capability to attach metadata to global variables (r271348 and r271358). 
Passes that manipulate globals can easily copy metadata between globals with the GlobalObject::copyMetadata function, which would be taught to understand type metadata.</div><div><br></div><div>To give an example of how this would look, suppose that we have the following declarations:</div><div><br></div><div>class A {</div><div> virtual void f() {}</div><div>};<br clear="all"><div><br></div><div>class B : public A {</div><div> virtual void f() {}</div><div> virtual void g() {}</div><div>};<br></div><div><br></div><div>The vtables for A and B would be represented in IR like this:</div><div><br></div><div>@_ZTV1A = constant [3 x i8*] [i8* ..., i8* ..., i8* @A::f], !type !0</div><div>@_ZTV1B = constant [4 x i8*] [i8* ..., i8* ..., i8* @B::f, i8* @B::g], !type !0, !type !1</div><div><br></div><div>!0 = !{i64 16, !"A"}</div><div>!1 = !{i64 16, !"B"}</div><div><br></div><div>The metadata !0 indicates that the attached global has an address point for the type A at byte offset 16, and the metadata !1 indicates that the attached global has an address point for the type B at byte offset 16. We attach !0 to _ZTV1A, which indicates that the vtable for A has a valid address point for A at offset 16, and attach both !0 and !1 to _ZTV1B, which indicates that the vtable for B has a valid address point for both A and B at offset 16.</div><div><br></div><div>I also plan to apply this renaming to existing passes and intrinsics that use the "bitset" name.</div><div><br></div><div>Thanks,</div><div>-- <br></div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Peter</div></div></div>
</div></div>