<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
h1
{mso-style-priority:9;
mso-style-link:"Heading 1 Char";
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:24.0pt;
font-family:"Times New Roman",serif;
font-weight:bold;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.Heading1Char
{mso-style-name:"Heading 1 Char";
mso-style-priority:9;
mso-style-link:"Heading 1";
font-family:"Calibri Light",sans-serif;
color:#2E74B5;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Can you run your new patch on the CPU2017 INT C/C++ benchmarks and report the regression percentage?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org]
<b>On Behalf Of </b>Peter Collingbourne via llvm-dev<br>
<b>Sent:</b> Tuesday, January 23, 2018 6:45 PM<br>
<b>To:</b> llvm-dev &lt;llvm-dev@lists.llvm.org&gt;<br>
<b>Cc:</b> Vlad Tsyrklevich &lt;vlad@tsyrklevich.net&gt;<br>
<b>Subject:</b> [llvm-dev] RFC: Using link-time optimization to eliminate retpolines<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">The proposed mitigation for CVE-2017-5715 (Spectre variant 2, “branch target injection”) is to send all indirect branches through an instruction sequence known as a retpoline. Because the purpose of a retpoline is to prevent attacker-controlled
speculation, we also lose the benefits of benign speculation, which can lead to a measurable loss of performance.<br>
<br>
We can regain some of those benefits if we know that the set of possible branch targets is fixed (this is sometimes known to be the case when using whole-program devirtualization or CFI -- see
<a href="https://clang.llvm.org/docs/LTOVisibility.html">https://clang.llvm.org/docs/LTOVisibility.html</a>). In that case, we can construct a so-called “branch funnel” that selects one of the possible targets by performing a binary search on an address associated
with the indirect branch (for virtual calls, this is the address of the vtable, and for indirect calls via a function pointer, this is the function pointer itself), eventually directly branching to the selected target. As a result, the processor is free to
speculatively execute the virtual call, but it can only speculatively branch to addresses of valid implementations of the virtual function, as opposed to arbitrary addresses.<br>
<br>
For example, suppose that we have the following class hierarchy, which is known to be closed:<br>
<br>
<span style="font-family:"Courier New"">struct Base { virtual void f() = 0; };<br>
struct A : Base { virtual void f(); };<br>
struct B : Base { virtual void f(); };<br>
struct C : Base { virtual void f(); };<br>
</span><br>
We can lay out the vtables for the derived classes in the order A, B, C, and produce an instruction sequence that directs execution to one of the targets A::f, B::f and C::f depending on the vtable address. In x86_64 assembly, a branch funnel would look like
this:<br>
<br>
<span style="font-family:"Courier New"">lea B::vtable+16(%rip), %r11<br>
cmp %r11, %r10<br>
jb A::f<br>
je B::f<br>
jmp C::f<br>
</span><br>
A caller performs a virtual call by loading the vtable address into register r10, setting up the other registers for the virtual call and directly calling the branch funnel as if it were a regular function. Because the branch funnel enforces control flow integrity
by itself, we can also avoid emitting CFI checks at call sites that use branch funnels when CFI is enabled.<br>
<br>
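As a rough illustration of the funnel's logic, the following C++ sketch mimics the three-way comparison above in portable code. It is not what the compiler emits: the VTable struct, the vtables array and the Target enum are hypothetical stand-ins, and the array exists only to pin down the address order A, B, C. Every exit from branch_funnel is a direct call, which is the property that keeps speculation confined to valid targets.<br>
<br>
```cpp
#include <cassert>

// Hypothetical stand-in for a vtable; real vtables hold function pointers.
struct VTable { int unused; };

// Placing the three tables in one array fixes their address order as
// A < B < C, which mirrors the vtable layout the funnel relies on.
static VTable vtables[3];
static VTable *const vtA = &vtables[0];
static VTable *const vtB = &vtables[1];
static VTable *const vtC = &vtables[2];

enum class Target { A_f, B_f, C_f };

// Each function stands in for one implementation of Base::f.
static Target A_f() { return Target::A_f; }
static Target B_f() { return Target::B_f; }
static Target C_f() { return Target::C_f; }

// Mirrors the lea/cmp/jb/je/jmp sequence: compare against the middle
// table, then leave through one of three direct calls.
static Target branch_funnel(const VTable *vt) {
  if (vt < vtB) return A_f();   // jb A::f
  if (vt == vtB) return B_f();  // je B::f
  return C_f();                 // jmp C::f
}

// Sanity check: each table address reaches its own implementation.
static void check_funnel() {
  assert(branch_funnel(vtA) == Target::A_f);
  assert(branch_funnel(vtB) == Target::B_f);
  assert(branch_funnel(vtC) == Target::C_f);
}
```
<br>
<br>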
To control the layout of vtables and function pointers, we can extend existing mechanisms for controlling layout that are used to implement CFI (see
<a href="https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html">https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html</a>) so that they are also used whenever a branch funnel needs to be created.<br>
<br>
The compiler will only use branch funnels when both the retpoline mitigation (-mretpoline) and whole-program devirtualization (-fwhole-program-vtables) features are enabled. The former is required on the assumption that, without retpolines, a regular indirect call will generally be less
expensive than a branch funnel; the latter provides the necessary guarantee that the type hierarchy is closed. Even when retpolines are enabled, there is still a cost associated with executing a branch funnel that needs to be balanced against the cost
of a regular CFI check and retpoline, so branch funnels are only used when there are at most 10 targets (this threshold has not been tuned yet). Because the implementation uses some of the same mechanisms that are used to implement CFI and whole-program devirtualization,
it requires LTO (it is compatible with both full LTO and ThinLTO).<br>
<br>
To measure the performance impact of branch funnels, I ran a selection of Chrome benchmark suites on Chrome binaries built with CFI, CFI + retpoline and CFI + retpoline + branch funnels, and measured the median impact over all benchmarks in each suite. The
numbers are presented below. I should preface these numbers by saying that these are largely microbenchmarks, so the impact of retpoline on its own is unlikely to be characteristic of real workloads. The numbers to focus on are the impact of retpoline
+ branch funnels relative to the impact of retpoline alone: a median 5.7% regression with branch funnels, compared with a median 8% regression for retpoline alone.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="780" style="width:6.5in;border-collapse:collapse">
<tbody>
<tr>
<td valign="top" style="border:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">Benchmark suite</span><o:p></o:p></p>
</td>
<td valign="top" style="border:solid black 1.0pt;border-left:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">CFI + retpoline impact</span><o:p></o:p></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">(relative to CFI)</span><o:p></o:p></p>
</td>
<td valign="top" style="border:solid black 1.0pt;border-left:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">CFI + retpoline + branch funnels impact</span><o:p></o:p></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">(relative to CFI)</span><o:p></o:p></p>
</td>
</tr>
<tr>
<td valign="top" style="border:solid black 1.0pt;border-top:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">blink_perf.bindings</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">0.9% improvement</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">9.8% improvement</span><o:p></o:p></p>
</td>
</tr>
<tr>
<td valign="top" style="border:solid black 1.0pt;border-top:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">blink_perf.dom</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">20.4% regression</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">17.5% regression</span><o:p></o:p></p>
</td>
</tr>
<tr>
<td valign="top" style="border:solid black 1.0pt;border-top:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">blink_perf.layout</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">17.4% regression</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">14.3% regression</span><o:p></o:p></p>
</td>
</tr>
<tr>
<td valign="top" style="border:solid black 1.0pt;border-top:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">blink_perf.parser</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">3.8% regression</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">5.7% regression</span><o:p></o:p></p>
</td>
</tr>
<tr>
<td valign="top" style="border:solid black 1.0pt;border-top:none;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">blink_perf.svg</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">8.0% regression</span><o:p></o:p></p>
</td>
<td valign="top" style="border-top:none;border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;padding:5.0pt 5.0pt 5.0pt 5.0pt">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Arial",sans-serif;color:black">5.4% regression</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
</div>
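<p class="MsoNormal">The medians quoted above can be checked against the table with a few lines of C++. This is only a sanity-check sketch: the median helper and the two arrays are hypothetical names, and the sign convention (regressions positive, improvements negative) is an assumption made for the calculation.<o:p></o:p></p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a small sample; the vector is taken by value so that a
// private copy is sorted and the caller's data is left untouched.
static double median(std::vector<double> v) {
  assert(!v.empty());
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Per-suite impact in percent, copied from the table in row order
// (bindings, dom, layout, parser, svg). Assumed sign convention:
// regressions are positive, improvements are negative.
static const std::vector<double> kRetpoline = {-0.9, 20.4, 17.4, 3.8, 8.0};
static const std::vector<double> kRetpolineBF = {-9.8, 17.5, 14.3, 5.7, 5.4};
```

<p class="MsoNormal">With these inputs, median(kRetpoline) is 8.0 and median(kRetpolineBF) is 5.7, matching the figures in the text.<o:p></o:p></p>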
<h1 style="mso-margin-top-alt:20.0pt;margin-right:0in;margin-bottom:6.0pt;margin-left:0in">
<span style="font-size:20.0pt;font-family:"Arial",sans-serif;color:black;font-weight:normal">Future work</span><o:p></o:p></h1>
<p class="MsoNormal">Implementation of branch funnels for architectures other than x86_64.<br>
<br>
Implementation of branch funnels for indirect calls via a function pointer (currently only implemented for virtual calls). This will probably require an implementation of whole-program “devirtualization” for indirect calls.<br>
<br>
Use profile data to order the comparisons in the branch funnel by frequency, to minimise the number of comparisons required for frequent virtual calls.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">-- <o:p></o:p></p>
<div>
<p class="MsoNormal">Peter<o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</body>
</html>