[llvm-dev] RFC: a more detailed design for ThinLTO + vcall CFI

Tue Oct 25 17:27:47 PDT 2016

Hi all,

As promised, here is a brain dump on how I see CFI for vcalls working under
ThinLTO. Most of this has been prototyped, so the design does appear to be
sound. For context on how CFI currently works under regular LTO, please
read:

http://llvm.org/docs/TypeMetadata.html
http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
http://clang.llvm.org/docs/LTOVisibility.html

==== Summary extensions ====

The combined summary index would be extended to include a mapping from type
identifiers to "resolutions". The resolution would control what type of
code we generate to create a CFI check for that type identifier. Here are
the resolutions that we would support:

Inline32, Inline64, SingleBit: these would cause us to generate code as
described in "Short Inline Bit Vectors" in the design document:
http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#short-inline-bit-vectors
AllOnes: this would cause us to generate code as described in "Eliminating
Bit Vector Checks for All-Ones Bit Vectors" in the design document:
http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#eliminating-bit-vector-checks-for-all-ones-bit-vectors
Unsat: no vtable is a member of that type identifier, so we can simply
replace type checks for that type identifier with "false"
ByteArray: we emit the general form of the type check, similar to the one
shown at the end of
http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-virtual-calls
just before "Optimizations"

Armed with that information, we have a general idea of what the code to
implement the type check would look like. In fact, given one of these
values, the code will be identical to any other check that uses that
resolution, modulo the constant values embedded in the code.

To expand on what I mean by "constant values", let's look at a typical CFI
check in the ByteArray case. Consider this module (based on
test/Transforms/LowerTypeTests/simple.ll):

@a = constant i32 1, !type !0, !type !2
@b = constant [63 x i32] zeroinitializer, !type !0, !type !1
@c = constant i32 3, !type !1, !type !2
@d = constant [2 x i32] [i32 4, i32 5], !type !3

!0 = !{i32 0, !"typeid1"}
!3 = !{i32 4, !"typeid1"}
!1 = !{i32 0, !"typeid2"}
!2 = !{i32 0, !"typeid3"}

define i1 @baz(i32* %p) {
  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
  ret i1 %x
}

Here is the IR after the LowerTypeTests pass has run. I've marked the
places where we use a constant value:

define i1 @baz(i32* %p) {
  %pi8 = bitcast i32* %p to i8*
  %1 = ptrtoint i8* %pi8 to i32
  %2 = sub i32 %1, ptrtoint ({ i32, [0 x i8], [63 x i32], [4 x i8], i32, [0
x i8], [2 x i32] }* @0 to i32)
  %3 = lshr i32 %2, 2 ; CONSTANT: rotate count
  %4 = shl i32 %2, 30 ; CONSTANT: 32-rotate count
  %5 = or i32 %3, %4
  %6 = icmp ult i32 %5, 66 ; CONSTANT: size of byte array
  br i1 %6, label %7, label %12

; <label>:7:                                      ; preds = %0
  %8 = getelementptr i8, i8* @bits_use, i32 %5
  %9 = load i8, i8* %8
  %10 = and i8 %9, 2 ; CONSTANT: bit mask
  %11 = icmp ne i8 %10, 0
  br label %12

; <label>:12:                                     ; preds = %0, %7
  %13 = phi i1 [ false, %0 ], [ %11, %7 ]
  ret i1 %13
}

Here is what the asm for the above looks like:

baz:
leaq .L__unnamed_1(%rip), %rax
subl %eax, %edi
roll $30, %edi ; CONSTANT: 32-rotate count
cmpl $65, %edi ; CONSTANT: size of byte array
ja .LBB2_1

movslq %edi, %rax
leaq .Lbits_use(%rip), %rcx
movb (%rax,%rcx), %al
andb $2, %al ; CONSTANT: bit mask
shrb %al
retq
.LBB2_1:
xorl %eax, %eax
retq

A naive summary encoding would map a type identifier to a tuple of
(resolution, rotate count, size of byte array, bit mask), and pull the
latter three out of the summary as constants. However, the disadvantage of
hard coding the constants in the IR like this is that any change to one of
the constants will invalidate any cache entry that depends on a constant
value. For example, if I add a class to a hierarchy, that would most likely
increase the size of the byte array, which may invalidate every cache entry
containing a check for a class in that hierarchy. To avoid this, we only
include resolutions in the summary and obtain the constants using absolute
symbol references. Here is what the asm code may look like:

baz:
leaq __typeid_typeid3_global_addr(%rip), %rax
subl %eax, %edi
rorl $__typeid_typeid3_rotate_count, %edi
cmpl $__typeid_typeid3_size, %edi
ja .LBB2_1

movslq %edi, %rax
leaq __typeid_typeid3_byte_array(%rip), %rcx
movb (%rax,%rcx), %al
andb $__typeid_typeid3_bitmask, %al
shrb %al
retq
.LBB2_1:
xorl %eax, %eax
retq

The appropriate representation for this at the IR level is the subject of
the RFC entitled 'Absolute or "fixed address" symbols as immediate
operands'. We can imagine, though, that it could look something like this:

@__typeid_typeid3_rotate_count = external globalconst i8
@__typeid_typeid3_size = external globalconst i32
@__typeid_typeid3_bit_mask = external globalconst i8
@__typeid_typeid3_byte_array = external global i8
@__typeid_typeid3_global_addr = external global i8

define i1 @baz(i32* %p) {
  %pi8 = bitcast i32* %p to i8*
  %1 = ptrtoint i8* %pi8 to i32
  %2 = sub i32 %1, ptrtoint (i8* @__typeid_typeid3_global_addr to i32)
  %3 = lshr i32 %2, i8 @__typeid_typeid3_rotate_count
  %4 = shl i32 %2, sub i8 (i8 32, i8 @__typeid_typeid3_rotate_count)
  %5 = or i32 %3, %4
  %6 = icmp ult i32 %5, i32 @__typeid_typeid3_size
  br i1 %6, label %7, label %12

; <label>:7:                                      ; preds = %0
  %8 = getelementptr i8, i8* @__typeid_typeid3_byte_array, i32 %5
  %9 = load i8, i8* %8
  %10 = and i8 %9, @__typeid_typeid3_bitmask
  %11 = icmp ne i8 %10, 0
  br label %12

; <label>:12:                                     ; preds = %0, %7
  %13 = phi i1 [ false, %0 ], [ %11, %7 ]
  ret i1 %13
}

==== Getting resolutions into the summary ====

Now that we've looked at how code is generated for the individual modules,
let's look at how the resolutions are computed and put into the summary.

The first step happens when we compile the translation unit with clang. I
propose to change the bitcode format for any module that defines virtual
tables with hidden LTO visibility. The bitcode would contain two modules:
one to be compiled with regular LTO and the other to be compiled with
ThinLTO. The regular LTO module would contain only the following:

   - the definitions of those virtual tables that have hidden LTO visibility
   - the !type metadata for those virtual tables
   - a list of type tests required by the corresponding ThinLTO module, in
   a GlobalMDNode named "llvm.export.type.tests"

The ThinLTO module would contain the rest of the original module.

For example, here is what the regular LTO module for our example above
would look like:

@a = constant i32 1, !type !0, !type !2
@b = constant [63 x i32] zeroinitializer, !type !0, !type !1
@c = constant i32 3, !type !1, !type !2
@d = constant [2 x i32] [i32 4, i32 5], !type !3

!0 = !{i32 0, !"typeid1"}
!3 = !{i32 4, !"typeid1"}
!1 = !{i32 0, !"typeid2"}
!2 = !{i32 0, !"typeid3"}

!8 = !{!"typeid3"}
!llvm.export.type.tests = !{!8}

and here is the ThinLTO module:

define i1 @baz(i32* %p) {
  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
  ret i1 %x
}

The regular LTO modules are merged and compiled in the usual way, so we
would end up merging the type definitions together with the list of
required data.

When the LowerTypeTests pass is given a module with the
"llvm.export.type.tests" global MDNode, it "exports" each of the type
identifiers mentioned in the MDNode by storing a resolution in the combined
summary, and by creating definitions of each of the symbols that shall be
required to satisfy the link time dependencies in the individual ThinLTO
modules. Here is an example of what the combined module would look like:

@0 = [...] ; combined global for a, b, c and d
@bits = [...] ; combined global byte array

@__typeid_typeid3_global_addr = hidden alias i8, bitcast ({...}* @0 to i8*)
@__typeid_typeid3_rotate_count = hidden globalconst i8 2
@__typeid_typeid3_size = hidden globalconst i32 65
@__typeid_typeid3_byte_array = hidden alias i8, i8* @bits
@__typeid_typeid3_bit_mask = hidden globalconst i8 2

The ThinLTO backend processes would run after regular LTO, so they would
have access to the resolutions in the summary. If the module summary is
present, LowerTypeTests will "import" the appropriate type identifier by
generating code containing external references to these symbols, as
described previously.

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161025/8404fc51/attachment.html>