<div dir="ltr"><div dir="ltr"><div><div>This thread has IMO started going down an unfortunate path.</div><div><br></div><div><div>Normal compiler behavior and optimization passes (such as DSE, and everything else) should not care WHAT is inside the assembly string, but should just trust the asm-constraints to properly indicate the behavior of the contained assembly. If the asm constraint says it stores a value (which is what "=m" means), then the usual behavior of compiler should be to be to assume that it indeed does so. Doing otherwise starts to get into very-scary territory.</div><div><br></div><div>The initial problem here is that we do not properly tag memory-outputs of inline asm as definitely being a store to that memory. They should be so-tagged. When we fix that bug, then code like this (compiled with optimizations, but no special hardening flags):</div></div><div>int f() {</div><div>  int out = 5;</div><div>  asm("# Do nothing, LOL!" : "=m"(out));<br></div><div>  return out;</div><div>}</div><div>will be compiled down to simply load an uninitialized stack value and return it.</div><div><div>        movl    -4(%rsp), %eax</div><div>        ret</div></div><div>That is the _correct and desired_ behavior. (And, implied by this is that with the current implementation of -ftrivial-auto-var-init, its initialization also will be eliminated.)</div><div><br></div><div><div>That said -- if we want to implement inline-asm targeted mitigations in certain hardening modes, I'm not saying we cannot do that. It's just that we need to be clear that it _is_ special hardening behavior.</div><div><br></div><div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 26, 2019 at 2:13 PM JF Bastien <<a href="mailto:jfbastien@apple.com" target="_blank">jfbastien@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br><div><br><blockquote type="cite"><div>On Mar 26, 2019, at 10:15 AM, Dmitry Vyukov <<a href="mailto:dvyukov@google.com" target="_blank">dvyukov@google.com</a>> wrote:</div><br class="gmail-m_-7289355727761274034gmail-m_6905950078916565107Apple-interchange-newline"><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">On Tue, Mar 26, 2019 at 5:11 PM JF Bastien <</span><a href="mailto:jfbastien@apple.com" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank">jfbastien@apple.com</a><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">> wrote:</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">If an asm's constraints claim that the variable is an output, but then don't actually write to it, that's a bug (at least if the value is actually used afterwards). An output-only constraint on inline asm definitely does _not_ mean "pass through the previous value unchanged, if the asm failed to actually write to it". If you need that behavior, it's spelled "+m", not "=m".<br><br>We do seem to fail to take advantage of this for memory outputs (again, this is not just for ftrivial-auto-var-init -- we ought to eliminate manual initialization just the same), which I'd definitely consider an missing-optimization bug.<br></blockquote><br>You mean we assume C code is buggy and asm code is not buggy because<br>compiler fails to disprove that there is a bug?<br>Doing this optimization without -ftrivial-auto-var-init looks<br>reasonable, compilers do optimizations assuming absence of bugs<br>throughout. But -ftrivial-auto-var-init is specifically about assuming<br>these bugs are everywhere.<br><br><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">On Thu, Mar 21, 2019 at 10:16 AM Alexander Potapenko <<a href="mailto:glider@google.com" target="_blank">glider@google.com</a>> wrote:<br><br>On Thu, Mar 21, 2019 at 2:58 PM James Y Knight <<a href="mailto:jyknight@google.com" target="_blank">jyknight@google.com</a>> wrote:<br><br>Please be more specific about the problem, because your simplified example doesn't actually show an issue. If I write this function:<br>int foo() {<br>int retval;<br>asm("# ..." : "=r"(retval));<br>return retval;<br>}<br>it already does get treated as definitely writing retval, and optimizes away the initialization (whether you explicitly initialize retval, or use -ftrivial-auto-var-init).<br>Example: <a href="https://godbolt.org/z/YYBCXL" target="_blank">https://godbolt.org/z/YYBCXL</a><br></blockquote>This is probably because you're passing retval as a register output.<br>If you change "=r" to "=m" (<a href="https://godbolt.org/z/ulxSgx" target="_blank">https://godbolt.org/z/ulxSgx</a>), it won't be<br>optimized away.<br>(I admit I didn't know about the difference)<br><blockquote type="cite"><blockquote type="cite">On Thu, Mar 21, 2019 at 8:35 AM Alexander Potapenko via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br><br>Hi JF et al.,<br><br>In the Linux kernel we often encounter the following pattern:<br><br>type op(...) {<br>type retval;<br>inline asm(... retval ...);<br>return retval;<br>}<br><br>, which is used to implement low-level platform-dependent memory operations.<br><br>Some of these operations turn out to be very hot, so we probably don't<br>want to initialize |retval| given that it's always initialized in the<br>assembly.<br><br>However it's practically impossible to tell that a variable is being<br>written to by the inline assembly, or figure out the size of that<br>write.<br>Perhaps we could speculatively treat every scalar output of an inline<br>assembly routine as an initialized value (which is true for the Linux<br>kernel, but I'm not sure about other users of inline assembly, e.g.<br>video codecs).<br><br>WDYT?<br><br><br>--<br>Alexander Potapenko<br>Software Engineer<br><br>Google Germany GmbH<br>Erika-Mann-Straße, 33<br>80636 München<br><br>Geschäftsführer: Paul Manicle, Halimah DeLaine Prado<br>Registergericht und -nummer: Hamburg, HRB 86891<br>Sitz der Gesellschaft: Hamburg<br>_______________________________________________<br>cfe-dev mailing list<br><a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br></blockquote></blockquote><br><br><br>--<br>Alexander Potapenko<br>Software Engineer<br><br>Google Germany GmbH<br>Erika-Mann-Straße, 33<br>80636 München<br><br>Geschäftsführer: Paul Manicle, Halimah DeLaine Prado<br>Registergericht und -nummer: Hamburg, HRB 86891<br>Sitz der Gesellschaft: Hamburg<br></blockquote></blockquote></blockquote><br>Does kernel asm use "+m" or "=m"?<br><br>If asm _must_ write to that variable, then we could improve DSE in<br>normal case (ftrivial-auto-var-init is not enabled). If<br>ftrivial-auto-var-init is enabled, then strictly saying we should not<br>remove initialization because we did not prove that asm actually<br>writes. But we may remove initialization as well for practical<br>reasons.<br><br>Alex mentioned that in some cases we don't know actual address/size of<br>asm writes. But we should know it if a local var is passed to the asm,<br>which should be the case for kernel atomic asm blocks.<br><br>Interestingly, ftrivial-auto-var-init DSE must not be stronger then<br>non-ftrivial-auto-var-init DSE, unless we are talking about our own<br>emitted initialization stores, in such case ftrivial-auto-var-init DSE<br>may remove then more aggressively then what normal DSE would do, we<br>don't actually have to _prove_ that the init store is dead.<br></blockquote><br><br>IMO the auto var init mitigation shouldn’t change the DSE optimization at all. We shouldn’t treat the stores we add any different. We should just improve DSE and everything benefits (auto var init moreso).<br></blockquote><br>But you realize that this "just" improve involves fully understanding<br>static and dynamic behavior of arbitrary assembly for any architecture<br>without even using integrated asm? ;)<br></blockquote><br>If you want to solve every problem however unlikely, yes. If you narrow what you’re doing to a handful of cases that matter, no.<br></blockquote><br>How can we improve DSE to handle all main kernel patterns that matter?<br>Can we? It's still unclear to me. Extending this optimization to<br>generic DSE and all stores can make it much harder (unsolvable)<br>problem...<br></blockquote><br>Right now there's a handful of places in the kernel where we have to<br>use __attribute__((uninitialized)) just to avoid creating an extra<br>initializer: <a href="https://github.com/google/kmsan/commit/00387943691e6466659daac0312c8c5d8f9420b9" target="_blank">https://github.com/google/kmsan/commit/00387943691e6466659daac0312c8c5d8f9420b9</a><br>and <a href="https://github.com/google/kmsan/commit/2954f1c33a81c6f15c7331876f5b6e2fec0d631f" target="_blank">https://github.com/google/kmsan/commit/2954f1c33a81c6f15c7331876f5b6e2fec0d631f</a><br>All those assembly directives are using local scalar variables of size<br><= 8 bytes as "=qm" outputs, so we can narrow the problem down to "let<br>DSE remove redundant stores to local scalars that are used as asm()<br>"m" outputs"<br>False positives will sure be possible in theory, but hopefully rare in practice.<br></blockquote><br>Right, you only need to teach the optimizer about asm that matters. You don’t need “extending this optimization to generic DSE”. What I’m saying is: this is generic DSE, nothing special about variable auto-init, except we’re making sure it help variable auto-init a lot. i.e. there’s no `if (VariableAutoInitIsOn)` in LLVM, there’s just some DSE smarts that are likely to kick in a lot more when variable auto-init is on.<br></blockquote><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">We can't start breaking correct user code because "hopefully rare in</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">practice”.</span></div></blockquote><div><br></div><div>I’m not advocating for this.</div><div><br></div><br><blockquote type="cite"><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">But we can well episodically omit our hardening</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">initializing store if in most cases it is not necessary but we are not</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">really sure, e.g. not sure what exactly memory an asm block writes.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"></div></blockquote><div><br></div><div>I don’t agree. It’s a bad mitigation if it sometimes goes ¯\_(ツ)_/¯</div><div><br></div><br><blockquote type="cite"><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">There is huge difference complexity-wise between a 100% sound proof</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">and a best-effort hint.</span></div></blockquote><div><br></div><div>Correct, and I don’t think you need a 100% solution for DSE (i.e. you don’t need to understand the semantics of all assembly instructions for all ISAs). You just need to hit the cases that matter (some instructions on some ISAs), and have those cases remain 100% sound.</div><div><br></div><br><blockquote type="cite"><div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">This is very special about auto-initializing</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">stores.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">I mean, I agree, all others being equal we prefer handling it on</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">common grounds. But still don't see all others being equal here. From</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">what Alex says, it's not possible to figure out what exactly memory an</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline">asm block writes.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"></div></blockquote><div><br></div><div>Agreed, and I’m not saying that this needs to happen.</div><div><br></div><div>I’ll re-iterate: which asm statements result in extraneous initialization? What instructions are they?</div><div><br></div><br><blockquote type="cite"><div><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite">I would still love to know what's the main source of truth for the<br>semantics of asm() constraints.<br></blockquote><br>I don’t think you can trust programmer-provided constraints, unless you also add diagnostics to warn on incorrect constraints.<br><br><br><blockquote type="cite">For example, we've noticed that the BSF instruction, which can be used<br>as follows:<br><br>unsigned long ffs(unsigned long word) {<br>unsigned long ret;<br>asm("rep; bsf %1,%0" : "=r" (ret) : "rm" (word));<br>return ret;<br>}<br><br>isn't guaranteed to initialize its output in the case |word| is 0<br>(according to unnamed Intel architect, it just zeroes out the top 32<br>bits of the return value).<br>Therefore the elimination of dead stores to |ret| done by both Clang<br>and GCC is correct only if the callers are careful enough.<br><br>--<br>Alexander Potapenko<br>Software Engineer<br><br>Google Germany GmbH<br>Erika-Mann-Straße, 33<br>80636 München<br><br>Geschäftsführer: Paul Manicle, Halimah DeLaine Prado<br>Registergericht und -nummer: Hamburg, HRB 86891<br>Sitz der Gesellschaft: Hamburg</blockquote></blockquote></div></blockquote></div><br></div></blockquote></div>