[clang] [llvm] [X86] Enhance kCFI type IDs with a 3-bit arity indicator. (PR #117121)

Tue Dec 3 14:36:12 PST 2024

scottconstable wrote:

Hi @maurer, I can honestly say that I appreciate your attention to detail. While your observation that "attackers often have direct control of intentionally passed arguments" is true in general, I do not see evidence of this among indirect calls in the Linux kernel. 77% of all indirect call parameters are pointers, and AFAIK user-mode code cannot exert arbitrary control over pointers within the kernel (especially given that x86 Linux kernels enable SMAP and SMEP when it is available on the CPU). That leaves the other 23% of indirect call parameters to consider, though I admit I don't have a concrete breakdown of what fraction of these *might* be attacker controllable.

But let's assume for the sake of discussion that all of the non-pointer indirect call parameters are attacker controllable. And because we don't know which dead registers at a call site *might* be attacker-controllable, let's similarly assume for the sake of discussion that all of these dead registers are attacker-controllable. I built the table below to estimate for the arity-enhanced scheme a kind of exploitability score, i.e., the expected number of attacker-controlled parameters times the frequency of function types of that arity. For example, suppose that a hash collision between two function types occurs within the arity-1 partition. The expected number of attacker-controlled parameters at the first function is 0.23, and likewise for the second. So the total number of attacker-controlled parameters is 0.46, times the relative frequency of arity-1 function types (roughly 26%) gives a weighted exploitability of 0.105138035.

| Arity | Expected # of attacker-controlled args | Weighted exploitability |
| --------------- | --------------- | ----------------------------------------|
| 0 | 0 | 0 |
| 1 | 0.46 | 0.105138035 |
| 2 | 0.92 | 0.318536183 |
| 3 | 1.38 | 0.322375493 |
| 4 | 1.84 | 0.197281482 |
| 5 | 2.3 | 0.109483628 |
| 6+ | 2.76 | 0.093409153 |

Total exploitability score for the arity enhancement: **1.146223975**

A similar analysis can be applied to the existing 32-bit hash scheme. For example, suppose a 1-arity function collides with a 2-arity function. Then the expected number of attacker-controlled parameters that would be live at the mis-matched target would be 0.23 if the 2-arity function calls the 1-arity function, or 1.23 if the 1-arity function calls the 2-arity function (because the second parameter at the target would be a dead register at the caller). This data is captured in the table below, where the columns and rows refer to the respective arities of the colliding function types.

|	| 0	| 1	| 2	| 3	| 4	| 5	| 6+ |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| **0**	| 0	| 	| 	| 	| 	| 	| 
| **1**	| 1	| 0.46	| 	| 	| 	| 	| 
| **2**	| 2	| 1.46	| 0.92	| 	|	|	|
| **3**	| 3	| 2.46	| 1.92	| 1.38	| 	|	| 
| **4**	| 4	| 3.46	| 2.92	| 2.38	| 1.84	| 	| 
| **5**	| 5	| 4.46	| 3.92	| 3.38  | 2.84	|  2.3	| 
| **6+**	| 6	|  5.46	| 4.92	| 4.38	| 3.84	| 3.3	| 2.76

Assuming that a collision occurs, this table shows the probability density of the arities (note that the probabilities do sum to 1):

|         | 0           | 1           | 2           | 3           | 4           | 5           | 6+          |
|---------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| **0**   | 8.61406E-06 |             |             |             |             |             |             |
| **1**   | 0.00134164  | 0.052240106 |             |             |             |             |             |
| **2**   | 0.00203238  | 0.15827159  | 0.119878662 |             |             |             |             |
| **3**   | 0.001371251 | 0.106786156 | 0.161764743 | 0.054571497 |             |             |             |
| **4**   | 0.000629365 | 0.049011785 | 0.074245381 | 0.050093506 | 0.011495742 |             |             |
| **5**   | 0.000279419 | 0.021759723 | 0.032962663 | 0.022239974 | 0.010207511 | 0.00226591  |             |
| **6+**  | 0.000198662 | 0.015470786 | 0.023435882 | 0.015812236 | 0.007257363 | 0.003222046 | 0.001145409 |

Then the weighted exploitability for each collision is:

|         | 0           | 1           | 2           | 3           | 4           | 5           | 6+         |
|---------|-------------|-------------|-------------|-------------|-------------|-------------|------------|
| **0**   | 0           | 0           | 0           | 0           | 0           | 0           | 0          |
| **1**   | 0.00134164  | 0.024030449 | 0           | 0           | 0           | 0           | 0          |
| **2**   | 0.00406476  | 0.231076521 | 0.110288369 | 0           | 0           | 0           | 0          |
| **3**   | 0.004113752 | 0.262693944 | 0.310588307 | 0.075308666 | 0           | 0           | 0          |
| **4**   | 0.002517459 | 0.169580776 | 0.216796512 | 0.119222544 | 0.021152165 | 0           | 0          |
| **5**   | 0.001397093 | 0.097048366 | 0.129213637 | 0.075171112 | 0.02898933  | 0.005211593 | 0          |
| **6+**  | 0.001191971 | 0.084470491 | 0.115304537 | 0.069257593 | 0.027868274 | 0.010632751 | 0.00316133 |

Total exploitability score for the existing 32-bit hash: **2.201693943**

So, if you accept my reasoning, then a collision in the 32-bit scheme is expected to be roughly twice as exploitable as a collision in the 29-bit scheme with 3 arity bits. On the other hand, the 29-bit scheme might produce twice as many collisions.

https://github.com/llvm/llvm-project/pull/117121