<div dir="ltr">The attached patch fixes a bug where the address of the stack guard is spilled to the stack. This is a potential security vulnerability: an attacker who can overwrite the spill slot controls the address from which the epilogue reloads the guard value.<div><br></div><div>Currently, instruction selection emits multiple load instructions in the prologue to load the stack guard value, and then loads the value again in the epilogue using a volatile load (the instruction defining %vreg7):</div>

<div><div><br></div><div><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">       </span>%vreg0<def> = LDRLIT_ga_pcrel <ga:@__stack_chk_guard>[TF=128]; GPR:%vreg0</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%vreg1<def> = LDRi12 %vreg0<kill>, 0, pred:14, pred:%noreg; mem:LD4[GOT] GPR:%vreg1,%vreg0</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%vreg2<def> = LDRi12 %vreg1, 0, pred:14, pred:%noreg; mem:LD4[@__stack_chk_guard] GPR:%vreg2,%vreg1</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>STRi12 %vreg2<kill>, <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile ST4[FixedStack0] GPR:%vreg2</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<br></p><p style="margin:0px;font-size:11px;font-family:Menlo">...</p><p style="margin:0px;font-size:11px;font-family:Menlo"><br></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>%vreg7<def> = LDRi12 %vreg1<kill>, 0, pred:14, pred:%noreg; mem:Volatile LD4[@__stack_chk_guard] GPR:%vreg7,%vreg1</p>


<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%vreg8<def> = LDRi12 <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile LD4[FixedStack0] GPR:%vreg8</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<span style="white-space:pre-wrap">     </span>%vreg9<def,dead> = SUBrr %vreg7<kill>, %vreg8<kill>, pred:14, pred:%noreg, opt:%CPSR<def>; GPR:%vreg9,%vreg7,%vreg8</p><p style="margin:0px;font-size:11px;font-family:Menlo">





</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>Bcc <BB#1>, pred:1, pred:%CPSR<kill></p></div><div><p style="margin:0px;font-size:11px;font-family:Menlo">


<br></p><p style="margin:0px;font-size:11px;font-family:Menlo"><br></p></div><div>The register allocator then spills the interval holding the address (%vreg1). The instruction that loads the address (the second instruction) cannot be rematerialized because its source operand, %vreg0, is no longer available at the point where %vreg1 is used:</div>

<div><br></div><div><div><span style="font-family:Menlo;font-size:11px;white-space:pre-wrap">     </span><span style="font-family:Menlo;font-size:11px">%R0<def> = LDRLIT_ga_pcrel <ga:@__stack_chk_guard>[TF=128]</span></div>


<div><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>%R0<def> = LDRi12 %R0<kill>, 0, pred:14, pred:%noreg; mem:LD4[GOT]<br></p><p style="margin:0px;font-size:11px;font-family:Menlo">


<span style="white-space:pre-wrap">     </span>STRi12 %R0, <fi#2>, 0, pred:14, pred:%noreg; mem:ST4[FixedStack2] // SPILL</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">       </span>%R0<def> = LDRi12 %R0<kill>, 0, pred:14, pred:%noreg; mem:LD4[@__stack_chk_guard]</p>


<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>STRi12 %R0<kill>, <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile ST4[FixedStack0]</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<br></p><p style="margin:0px;font-size:11px;font-family:Menlo">...</p><p style="margin:0px;font-size:11px;font-family:Menlo"><br></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>%R0<def> = LDRi12 <fi#2>, 0, pred:14, pred:%noreg; mem:LD4[FixedStack2] // RELOAD</p>


<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%R0<def> = LDRi12 %R0<kill>, 0, pred:14, pred:%noreg; mem:Volatile LD4[@__stack_chk_guard]</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<span style="white-space:pre-wrap">     </span>%R1<def> = LDRi12 <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile LD4[FixedStack0]</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>%R0<def,dead> = SUBrr %R0<kill>, %R1<kill>, pred:14, pred:%noreg, opt:%CPSR<def></p>


<p style="margin:0px;font-size:11px;font-family:Menlo"></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">     </span>Bcc <BB#1>, pred:1, pred:%CPSR<kill></p></div></div>
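<div><br></div><div>For readers less familiar with the mechanism, the prologue store and the epilogue compare in the dumps above correspond roughly to the hand-written C++ sketch below. It is only an illustration under stated assumptions: fake_stack_guard is a hypothetical stand-in for __stack_chk_guard, and guarded_call and Frame are made-up names, not something the compiler or the patch emits.</div>

```cpp
#include <cstdint>

// Hypothetical stand-in for the runtime's __stack_chk_guard (assumption:
// the real value is chosen by the runtime, not by this code).
static volatile std::uintptr_t fake_stack_guard = 0x5aa5c33cu;

// Made-up model of a protected frame: a local buffer followed by the
// slot that holds the copy of the guard value (<fi#0> in the dumps).
struct Frame {
  char buf[16];
  volatile std::uintptr_t canary_slot;
};

// Returns true if the canary copy still matches the guard on exit.
// `corrupt` simulates an overflow that overwrote the canary slot.
bool guarded_call(bool corrupt) {
  Frame f{};
  f.canary_slot = fake_stack_guard;       // prologue: load guard, store copy
  if (corrupt)
    f.canary_slot = f.canary_slot ^ 1u;   // simulated stack smash
  // epilogue: reload the guard and the copy, then compare (SUBrr + Bcc)
  return fake_stack_guard == f.canary_slot;
}
```

<div>In this model, guarded_call(false) returns true and guarded_call(true) returns false. The check is only as trustworthy as the address used for the second guard load, which is why that address should not be kept in a writable stack slot.</div>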


<div><br></div><div><br></div><div>The fix in the attached patch defines a target-independent node LOAD_STACK_GUARD and emits a single LOAD_STACK_GUARD node in the prologue instead of emitting multiple loads:</div><div><br>
</div><div><p style="margin:0px;font-size:11px;font-family:Menlo">
<span style="white-space:pre-wrap">     </span>%vreg0<def> = LOAD_STACK_GUARD; mem:LD4[@__stack_chk_guard](align=0) GPR:%vreg0</p>

<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>STRi12 %vreg0, <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile ST4[FixedStack0] GPR:%vreg0</p></div><div><br></div>
<div>...</div><div><br></div><div><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">        </span>%vreg3<def> = LDRi12 <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile LD4[FixedStack0] GPR:%vreg3</p>



<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%vreg4<def,dead> = SUBrr %vreg0<kill>, %vreg3<kill>, pred:14, pred:%noreg, opt:%CPSR<def>; GPR:%vreg4,%vreg0,%vreg3</p>



<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>Bcc <BB#1>, pred:1, pred:%CPSR<kill></p></div><div><br></div><div><p style="margin:0px">Since LOAD_STACK_GUARD is rematerializable, the register allocator rematerializes it instead of spilling it to the stack:</p>


<p style="margin:0px"><br></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>%R0<def> = LOAD_STACK_GUARD; mem:LD4[@__stack_chk_guard](align=0)</p><p style="margin:0px">



</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>STRi12 %R0<kill>, <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile ST4[FixedStack0]</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<br></p><p style="margin:0px;font-size:11px;font-family:Menlo">...</p><p style="margin:0px;font-size:11px;font-family:Menlo"><br></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">   </span>%R0<def> = LDRi12 <fi#0>, 0, pred:14, pred:%noreg; mem:Volatile LD4[FixedStack0]</p>


<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%R1<def> = LOAD_STACK_GUARD; mem:LD4[@__stack_chk_guard](align=0) // REMATTED</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<span style="white-space:pre-wrap">     </span>%R0<def,dead> = SUBrr %R1<kill>, %R0<kill>, pred:14, pred:%noreg, opt:%CPSR<def></p><p style="margin:0px;font-size:11px;font-family:Menlo">


</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>Bcc <BB#1>, pred:1, pred:%CPSR<kill></p><div><br></div><p style="margin:0px">The LOAD_STACK_GUARD instruction is then expanded after register allocation:</p>

<p style="margin:0px"><br></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>%R0<def> = LDRLIT_ga_pcrel <ga:@__stack_chk_guard>[TF=128]</p><p style="margin:0px;font-size:11px;font-family:Menlo">


<span style="white-space:pre-wrap">     </span>%R0<def> = LDRi12 %R0<kill>, 0, pred:14, pred:%noreg; mem:LD4[GOT]</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap"> </span>%R0<def> = LDRi12 %R0<kill>, 0, pred:14, pred:%noreg; mem:LD4[@__stack_chk_guard](align=0)</p>


<p style="margin:0px">


</p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">    </span>STRi12 %R0<kill>, %R7, -20, pred:14, pred:%noreg; mem:Volatile ST4[FixedStack0]</p><div><br></div><div>...</div>


<div><br></div><div><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">  </span>%R0<def> = LDRi12 %R7, -20, pred:14, pred:%noreg; mem:Volatile LD4[FixedStack0]</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%R1<def> = LDRLIT_ga_pcrel <ga:@__stack_chk_guard>[TF=128]</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%R1<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:LD4[GOT]</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%R1<def> = LDRi12 %R1<kill>, 0, pred:14, pred:%noreg; mem:LD4[@__stack_chk_guard](align=0)</p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>%R0<def,dead> = SUBrr %R1<kill>, %R0<kill>, pred:14, pred:%noreg, opt:%CPSR<def></p>
<p style="margin:0px;font-size:11px;font-family:Menlo"><span style="white-space:pre-wrap">      </span>Bcc <BB#2>, pred:1, pred:%CPSR</p></div><p style="margin:0px"><br></p><p style="margin:0px"><br></p><p style="margin:0px">
The patch also changes code generation for AArch64 and X86-64 to emit LOAD_STACK_GUARD. Although this is not strictly necessary (AArch64 and X86-64 both already emit rematerializable pseudo instructions to load the stack guard address), it removes the need to emit a (volatile) load in the epilogue when the interval holding the stack guard value doesn't have to be spilled.</p>
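<p style="margin:0px"><br></p><p style="margin:0px">The difference between spilling the address and rematerializing the load can be summarized with a toy model. This is a sketch, not LLVM code: Strategy, StackSlots, run_function, leaks_guard_address and both constants are hypothetical names and values chosen for illustration. The model only tracks which values end up in stack slots under register pressure.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy model, not LLVM code: all names and constants are hypothetical.
constexpr std::uint32_t kGuardAddr  = 0x20001000u;  // made-up &__stack_chk_guard
constexpr std::uint32_t kGuardValue = 0xdeadbeefu;  // made-up guard value

enum class Strategy { SpillAddress, Rematerialize };

// Values written to the function's stack slots, in order.
using StackSlots = std::vector<std::uint32_t>;

// Models one protected function under register pressure and returns what
// ended up on its stack.
StackSlots run_function(Strategy s) {
  StackSlots stack;
  std::uint32_t addr = kGuardAddr;          // LDRLIT_ga_pcrel + LDR from GOT
  if (s == Strategy::SpillAddress)
    stack.push_back(addr);                  // SPILL of the guard address
  stack.push_back(kGuardValue);             // canary copy in <fi#0>
  // Epilogue: SpillAddress would RELOAD addr from the stack here, whereas
  // Rematerialize recomputes it from scratch; neither adds a new slot.
  return stack;
}

// True if the guard address is visible in any stack slot.
bool leaks_guard_address(const StackSlots& stack) {
  return std::find(stack.begin(), stack.end(), kGuardAddr) != stack.end();
}
```

<p style="margin:0px">In this model only the SpillAddress strategy leaves the guard address on the stack; with Rematerialize the only slot written is the canary copy itself.</p>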
<p style="margin:0px"><br></p></div></div>
</div>