[clang] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)
via cfe-commits
cfe-commits at lists.llvm.org
Wed Nov 20 23:21:44 PST 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-clang-codegen
Author: Scott Constable (scottconstable)
<details>
<summary>Changes</summary>
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits.
Like any hashing scheme, hash collisions are possible. For example, a commodity Linux kernel configured for Ubuntu 24.04 server has 141,617 total indirect call targets, with 10,903 unique function types. With a 32-bit kCFI hash, the expected number of collisions is 10,903-2^32+2^32*(1-1/(2^32))^10,903 = 0.01383765 (see https://courses.cs.duke.edu/cps102/spring09/Lectures/L-18.pdf for the formula). This number can balloon with the addition of drivers and kernel modules.
This patch reduces both the expected number of collisions and the potential impact of a collision by augmenting the hash with an arity value that indicates how many parameters the function has at the ABI level. Specifically, the patch further truncates the kCFI hash down to 29 bits, then concatenates a 3-bit arity indicator as follows:
| Arity Indicator | Description |
| --------------- | --------------- |
| 0 | 0 parameters |
| 1 | 1 parameter in RDI |
| 2 | 2 parameters in RDI and RSI |
| 3 | 3 parameters in RDI, RSI, and RDX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 |
| 7 | At least one parameter may be passed on the stack |
This scheme enhances security in two ways. First, it prevents a j-arity function pointer from being used to call a k-arity function, unless j=k. The current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. If this were to happen, then potentially malicious stale/dead data in RDX at the call site could suddenly become live as the third parameter at the call target.
Second, this scheme reduces the expected number of hash collisions within each arity, compared against the expected number of collisions (0.01383765) for the 32-bit hashing scheme that includes all arities. The table below shows the expected number of collisions for each arity, given the number of unique indirect callable function types within that arity in the same Ubuntu 24.04 server kernel discussed above.
| Arity | Unique Indirect Callable Function Types | Number of Expected Collisions |
| ----- | --------------------------- | --------------------------------- |
| 0 | 32 | 0.00000092 |
| 1 | 2492 | 0.00578125 |
| 2 | 3775 | 0.01326841 |
| 3 | 2547 | 0.00603931 |
| 4 | 1169 | 0.00127162 |
| 5 | 519 | 0.00025038 |
| 6 | 221 | 0.00004528 |
| 7 | 148 | 0.00002026 |
One additional benefit of this patch is that it can benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are available in the kCFI type ID, then this information is trivial to infer.
---
Full diff: https://github.com/llvm/llvm-project/pull/117121.diff
3 Files Affected:
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+28-3)
- (modified) clang/test/CodeGen/kcfi-normalize.c (+12-6)
- (modified) clang/test/CodeGen/kcfi.c (+19-3)
``````````diff
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index b854eeb62a80ce..7cc6f120ec39a9 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2183,7 +2183,8 @@ llvm::ConstantInt *CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
}
llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
- if (auto *FnType = T->getAs<FunctionProtoType>())
+ auto *FnType = T->getAs<FunctionProtoType>();
+ if (FnType)
T = getContext().getFunctionType(
FnType->getReturnType(), FnType->getParamTypes(),
FnType->getExtProtoInfo().withExceptionSpec(EST_None));
@@ -2196,8 +2197,32 @@ llvm::ConstantInt *CodeGenModule::CreateKCFITypeId(QualType T) {
if (getCodeGenOpts().SanitizeCfiICallNormalizeIntegers)
Out << ".normalized";
- return llvm::ConstantInt::get(Int32Ty,
- static_cast<uint32_t>(llvm::xxHash64(OutName)));
+ uint32_t OutHash = static_cast<uint32_t>(llvm::xxHash64(OutName));
+ const auto &Triple = getTarget().getTriple();
+ if (Triple.isX86() && Triple.isArch64Bit() && Triple.isOSLinux()) {
+ // Estimate the function's arity (i.e., the number of arguments) at the ABI
+ // level by counting the number of parameters that are likely to be passed
+ // as registers, such as pointers and 64-bit (or smaller) integers. The
+ // Linux x86-64 ABI allows up to 6 parameters to be passed in GPRs.
+ // Additional parameters or parameters larger than 64 bits may be passed on
+ // the stack, in which case the arity is denoted as 7.
+ bool MayHaveStackArgs = FnType->getNumParams() > 6;
+
+ for (unsigned int i = 0; !MayHaveStackArgs && i < FnType->getNumParams();
+ ++i) {
+ const Type *PT = FnType->getParamType(i).getTypePtr();
+ if (!(PT->isPointerType() || (PT->isIntegralOrEnumerationType() &&
+ getContext().getTypeSize(PT) <= 64)))
+ MayHaveStackArgs = true;
+ }
+
+ // The 3-bit arity is concatenated with the lower 29 bits of the KCFI hash
+ // to form an enhanced KCFI type ID. This can prevent, for example, a
+ // 3-arity function's ID from ever colliding with a 2-arity function's ID.
+ OutHash = (OutHash << 3) | (MayHaveStackArgs ? 7 : FnType->getNumParams());
+ }
+
+ return llvm::ConstantInt::get(Int32Ty, OutHash);
}
void CodeGenModule::SetLLVMFunctionAttributes(GlobalDecl GD,
diff --git a/clang/test/CodeGen/kcfi-normalize.c b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5f..8b7445fc85e490 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -10,25 +10,31 @@
void foo(void (*fn)(int), int arg) {
// CHECK-LABEL: define{{.*}}foo
// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE1:[0-9]+]]
- // CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 1162514891) ]
+ // KCFI ID = 0x2A548E59
+ // CHECK: call void %0(i32 noundef %1){{.*}}[ "kcfi"(i32 710184537) ]
fn(arg);
}
void bar(void (*fn)(int, int), int arg1, int arg2) {
// CHECK-LABEL: define{{.*}}bar
// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE2:[0-9]+]]
- // CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 448046469) ]
+ // KCFI ID = 0xD5A52C2A
+ // CHECK: call void %0(i32 noundef %1, i32 noundef %2){{.*}}[ "kcfi"(i32 -710595542) ]
fn(arg1, arg2);
}
void baz(void (*fn)(int, int, int), int arg1, int arg2, int arg3) {
// CHECK-LABEL: define{{.*}}baz
// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE3:[0-9]+]]
- // CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 -2049681433) ]
+ // KCFI ID = 0x2EA2BF3B
+ // CHECK: call void %0(i32 noundef %1, i32 noundef %2, i32 noundef %3){{.*}}[ "kcfi"(i32 782417723) ]
fn(arg1, arg2, arg3);
}
// CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
-// CHECK: ![[TYPE1]] = !{i32 -1143117868}
-// CHECK: ![[TYPE2]] = !{i32 -460921415}
-// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// KCFI ID = DEEB3EA2
+// CHECK: ![[TYPE1]] = !{i32 -555008350}
+// KCFI ID = 24372DCB
+// CHECK: ![[TYPE2]] = !{i32 607595979}
+// KCFI ID = 0x60D0180C
+// CHECK: ![[TYPE3]] = !{i32 1624250380}
diff --git a/clang/test/CodeGen/kcfi.c b/clang/test/CodeGen/kcfi.c
index 622843cedba50f..dc9e818a9f8cca 100644
--- a/clang/test/CodeGen/kcfi.c
+++ b/clang/test/CodeGen/kcfi.c
@@ -7,7 +7,6 @@
/// Must emit __kcfi_typeid symbols for address-taken function declarations
// CHECK: module asm ".weak __kcfi_typeid_[[F4:[a-zA-Z0-9_]+]]"
-// CHECK: module asm ".set __kcfi_typeid_[[F4]], [[#%d,HASH:]]"
/// Must not __kcfi_typeid symbols for non-address-taken declarations
// CHECK-NOT: module asm ".weak __kcfi_typeid_{{f6|_Z2f6v}}"
@@ -29,7 +28,7 @@ int __call(fn_t f) __attribute__((__no_sanitize__("kcfi"))) {
// CHECK: define dso_local{{.*}} i32 @{{call|_Z4callPFivE}}(ptr{{.*}} %f){{.*}}
int call(fn_t f) {
- // CHECK: call{{.*}} i32 %{{.}}(){{.*}} [ "kcfi"(i32 [[#HASH]]) ]
+ // CHECK: call{{.*}} i32 %{{.}}(){{.*}} [ "kcfi"(i32 [[#%d,HASH:]]) ]
return f();
}
@@ -48,6 +47,20 @@ static int f5(void) { return 2; }
// CHECK-DAG: declare !kcfi_type ![[#TYPE]]{{.*}} i32 @{{f6|_Z2f6v}}()
extern int f6(void);
+typedef struct {
+ int *p1;
+ int *p2[16];
+} s_t;
+
+// CHECK: define internal{{.*}} i32 @{{f7|_ZL2f7PFi3s_tEPS_}}(ptr{{.*}} %f, ptr{{.*}} %s){{.*}}
+static int f7(int (*f)(s_t), s_t *s) {
+ // CHECK: call{{.*}} i32 %{{.*}} [ "kcfi"(i32 [[#%d,HASH4:]]) ]
+ return f(*s) + 1;
+}
+
+// CHECK: define internal{{.*}} i32 @{{f8|_ZL2f83s_t}}(ptr{{.*}} %s){{.*}} !kcfi_type ![[#%d,TYPE4:]]
+static int f8(s_t s) { return 0; }
+
#ifndef __cplusplus
// C: define internal ptr @resolver1() #[[#]] !kcfi_type ![[#]] {
int ifunc1(int) __attribute__((ifunc("resolver1")));
@@ -59,12 +72,14 @@ long ifunc2(long) __attribute__((ifunc("resolver2")));
#endif
int test(void) {
+ s_t s;
return call(f1) +
__call((fn_t)f2) +
call(f3) +
call(f4) +
f5() +
- f6();
+ f6() +
+ f7(f8, &s);
}
#ifdef __cplusplus
@@ -85,3 +100,4 @@ void test_member_call(void) {
// CHECK-DAG: ![[#TYPE]] = !{i32 [[#HASH]]}
// CHECK-DAG: ![[#TYPE2]] = !{i32 [[#%d,HASH2:]]}
// MEMBER-DAG: ![[#TYPE3]] = !{i32 [[#HASH3]]}
+// CHECK-DAG: ![[#TYPE4]] = !{i32 [[#HASH4]]}
``````````
</details>
https://github.com/llvm/llvm-project/pull/117121
More information about the cfe-commits
mailing list