[cfe-dev] (not) initializing assembly outputs with -ftrivial-auto-var-init

Tue Mar 26 15:29:50 PDT 2019

The entirety of the named object is replaced. If you want to modify an
object, instead of entirely replacing it, you use "+m".
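
A minimal sketch of the "+m" case, assuming GCC- or Clang-style extended asm. The function name keep_previous_value is made up for illustration, and the asm body is deliberately empty so that the snippet does not depend on any particular instruction set:

```c
/* Hypothetical helper, not taken from any real code base. */
int keep_previous_value(void) {
    int x = 5;
    /* "+m" marks x as both read and written by the asm, so the compiler
       has to keep the store of 5 ahead of the asm and reload x after it.
       The body is empty, so x still holds 5 afterwards. */
    __asm__ volatile("" : "+m"(x));
    return x;
}
```

With "=m" in place of "+m", the same empty asm would break its own contract: the constraint would claim that x is entirely replaced, so a compiler following these semantics could drop the store of 5.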

None of this is anything new or innovative -- GCC has had these semantics
-- and been optimizing based on them -- for ages.

E.g., here, all elements of the array are replaced, so the initialization
goes away, and the return needs to explicitly add all 4 values written by
the inline-asm.
int out[4] = {1,2,3,4};
asm("whatever" : "=m"(out));
return out[0] + out[1] + out[2] + out[3];
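
As a rough, self-contained variant of the fragment above: this sketch assumes an x86 target with AT&T syntax for the asm path and falls back to plain C stores elsewhere. The function name and the values 10, 20, 30, 40 are invented for illustration. The asm really does write all four elements, which is what "=m"(out) promises:

```c
/* Hypothetical example: the whole array is the memory output. */
int sum_after_full_overwrite(void) {
    int out[4] = {1, 2, 3, 4};   /* removable: every element is replaced below */
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl $10, 0(%1)\n\t"
            "movl $20, 4(%1)\n\t"
            "movl $30, 8(%1)\n\t"
            "movl $40, 12(%1)"
            : "=m"(out)          /* output: the entire array object */
            : "r"(out));         /* input: the array's address in a register */
#else
    /* Portable stand-in for targets without the x86 asm above. */
    out[0] = 10; out[1] = 20; out[2] = 30; out[3] = 40;
#endif
    return out[0] + out[1] + out[2] + out[3];   /* 10 + 20 + 30 + 40 */
}
```

Since none of the initial values 1, 2, 3, 4 can be observed after the asm, removing the initializer does not change the result.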

Here, only out[1] is touched by the inline asm. The other values are not
modified, so the stores that initialize the array can all disappear
(out[0] + out[2] + out[3] folds to the constant 8), and the generated
code can simply return 8 + out[1].
int out[4] = {1,2,3,4};
asm("whatever" : "=m"(out[1]));
return out[0] + out[1] + out[2] + out[3];
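
A sketch of the single-element case under the same assumptions (x86 with AT&T syntax for the asm path, plain C elsewhere; the function name and the value 20 are made up for illustration):

```c
/* Hypothetical example: only out[1] is named as a memory output. */
int sum_after_single_overwrite(void) {
    int out[4] = {1, 2, 3, 4};
#if defined(__x86_64__) || defined(__i386__)
    /* The asm replaces out[1] and nothing else. */
    __asm__("movl $20, %0" : "=m"(out[1]));
#else
    /* Portable stand-in for targets without the x86 asm above. */
    out[1] = 20;
#endif
    /* out[0], out[2] and out[3] keep their initial values: 1 + 3 + 4 = 8. */
    return out[0] + out[1] + out[2] + out[3];
}
```

The result is 8 plus whatever the asm stored into out[1], so 28 here. The initial value 2 of out[1] is never observed.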

On Tue, Mar 26, 2019 at 4:25 PM Dmitry Vyukov <dvyukov at google.com> wrote:

> On Tue, Mar 26, 2019 at 9:07 PM James Y Knight <jyknight at google.com>
> wrote:
> >
> > This thread has IMO started going down an unfortunate path.
> >
> > Normal compiler behavior and optimization passes (such as DSE, and
> everything else) should not care WHAT is inside the assembly string, but
> should just trust the asm-constraints to properly indicate the behavior of
> the contained assembly. If the asm constraint says it stores a value (which
> is what "=m" means), then the usual behavior of the compiler should be to
> assume that it indeed does so. Doing otherwise starts to get into
> very-scary territory.
> >
> > The initial problem here is that we do not properly tag memory-outputs
> of inline asm as definitely being a store to that memory. They should be
> so-tagged. When we fix that bug, then code like this (compiled with
> optimizations, but no special hardening flags):
> > int f() {
> >   int out = 5;
> >   asm("# Do nothing, LOL!" : "=m"(out));
> >   return out;
> > }
> > will be compiled down to simply load an uninitialized stack value and
> return it.
> >         movl    -4(%rsp), %eax
> >         ret
> > That is the _correct and desired_ behavior. (And, implied by this is
> that with the current implementation of -ftrivial-auto-var-init, its
> initialization also will be eliminated.)
>
> If an lvalue is passed to =m what exactly is written? Single value of
> the lvalue type as passed? Whole base object? What if it's a
> memset-like asm block that writes an array? What if it writes a single
> value but at some offset? What if it writes a single value, but size
> of the write does not match the static type? I think I've seen asm
> blocks of all types in kernel.
>
> > That said -- if we want to implement inline-asm targeted mitigations in
> certain hardening modes, I'm not saying we cannot do that. It's just that
> we need to be clear that it _is_ special hardening behavior.
> >
> >
> >
> > On Tue, Mar 26, 2019 at 2:13 PM JF Bastien <jfbastien at apple.com> wrote:
> >>
> >>
> >>
> >> On Mar 26, 2019, at 10:15 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> >>
> >> On Tue, Mar 26, 2019 at 5:11 PM JF Bastien <jfbastien at apple.com> wrote:
> >>
> >> If an asm's constraints claim that the variable is an output, but then
> don't actually write to it, that's a bug (at least if the value is actually
> used afterwards). An output-only constraint on inline asm definitely does
> _not_ mean "pass through the previous value unchanged, if the asm failed to
> actually write to it". If you need that behavior, it's spelled "+m", not
> "=m".
> >>
> >> We do seem to fail to take advantage of this for memory outputs (again,
> this is not just for ftrivial-auto-var-init -- we ought to eliminate manual
> initialization just the same), which I'd definitely consider a
> missing-optimization bug.
> >>
> >>
> >> You mean we assume C code is buggy and asm code is not buggy because
> >> compiler fails to disprove that there is a bug?
> >> Doing this optimization without -ftrivial-auto-var-init looks
> >> reasonable, compilers do optimizations assuming absence of bugs
> >> throughout. But -ftrivial-auto-var-init is specifically about assuming
> >> these bugs are everywhere.
> >>
> >> On Thu, Mar 21, 2019 at 10:16 AM Alexander Potapenko <glider at google.com>
> wrote:
> >>
> >> On Thu, Mar 21, 2019 at 2:58 PM James Y Knight <jyknight at google.com>
> wrote:
> >>
> >> Please be more specific about the problem, because your simplified
> example doesn't actually show an issue. If I write this function:
> >> int foo() {
> >> int retval;
> >> asm("# ..." : "=r"(retval));
> >> return retval;
> >> }
> >> it already does get treated as definitely writing retval, and optimizes
> away the initialization (whether you explicitly initialize retval, or use
> -ftrivial-auto-var-init).
> >> Example: https://godbolt.org/z/YYBCXL
> >>
> >> This is probably because you're passing retval as a register output.
> >> If you change "=r" to "=m" (https://godbolt.org/z/ulxSgx), it won't be
> >> optimized away.
> >> (I admit I didn't know about the difference)
> >>
> >> On Thu, Mar 21, 2019 at 8:35 AM Alexander Potapenko via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >>
> >> Hi JF et al.,
> >>
> >> In the Linux kernel we often encounter the following pattern:
> >>
> >> type op(...) {
> >> type retval;
> >> inline asm(... retval ...);
> >> return retval;
> >> }
> >>
> >> , which is used to implement low-level platform-dependent memory
> operations.
> >>
> >> Some of these operations turn out to be very hot, so we probably don't
> >> want to initialize |retval| given that it's always initialized in the
> >> assembly.
> >>
> >> However it's practically impossible to tell that a variable is being
> >> written to by the inline assembly, or figure out the size of that
> >> write.
> >> Perhaps we could speculatively treat every scalar output of an inline
> >> assembly routine as an initialized value (which is true for the Linux
> >> kernel, but I'm not sure about other users of inline assembly, e.g.
> >> video codecs).
> >>
> >> WDYT?
> >>
> >>
> >> --
> >> Alexander Potapenko
> >> Software Engineer
> >>
> >> Google Germany GmbH
> >> Erika-Mann-Straße, 33
> >> 80636 München
> >>
> >> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> >> Registergericht und -nummer: Hamburg, HRB 86891
> >> Sitz der Gesellschaft: Hamburg
> >> _______________________________________________
> >> cfe-dev mailing list
> >> cfe-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>
> >>
> >>
> >>
> >>
> >>
> >> Does kernel asm use "+m" or "=m"?
> >>
> >> If asm _must_ write to that variable, then we could improve DSE in
> >> normal case (ftrivial-auto-var-init is not enabled). If
> >> ftrivial-auto-var-init is enabled, then strictly speaking we should not
> >> remove initialization because we did not prove that asm actually
> >> writes. But we may remove initialization as well for practical
> >> reasons.
> >>
> >> Alex mentioned that in some cases we don't know actual address/size of
> >> asm writes. But we should know it if a local var is passed to the asm,
> >> which should be the case for kernel atomic asm blocks.
> >>
> >> Interestingly, ftrivial-auto-var-init DSE must not be stronger than
> >> non-ftrivial-auto-var-init DSE, unless we are talking about our own
> >> emitted initialization stores; in that case ftrivial-auto-var-init DSE
> >> may remove them more aggressively than normal DSE would, since we
> >> don't actually have to _prove_ that the init store is dead.
> >>
> >>
> >>
> >> IMO the auto var init mitigation shouldn’t change the DSE optimization
> at all. We shouldn’t treat the stores we add any different. We should just
> improve DSE and everything benefits (auto var init moreso).
> >>
> >>
> >> But you realize that this "just" improve involves fully understanding
> >> static and dynamic behavior of arbitrary assembly for any architecture
> >> without even using integrated asm? ;)
> >>
> >>
> >> If you want to solve every problem however unlikely, yes. If you narrow
> what you’re doing to a handful of cases that matter, no.
> >>
> >>
> >> How can we improve DSE to handle all main kernel patterns that matter?
> >> Can we? It's still unclear to me. Extending this optimization to
> >> generic DSE and all stores can make it much harder (unsolvable)
> >> problem...
> >>
> >>
> >> Right now there's a handful of places in the kernel where we have to
> >> use __attribute__((uninitialized)) just to avoid creating an extra
> >> initializer:
> https://github.com/google/kmsan/commit/00387943691e6466659daac0312c8c5d8f9420b9
> >> and
> https://github.com/google/kmsan/commit/2954f1c33a81c6f15c7331876f5b6e2fec0d631f
> >> All those assembly directives are using local scalar variables of size
> >> <= 8 bytes as "=qm" outputs, so we can narrow the problem down to "let
> >> DSE remove redundant stores to local scalars that are used as asm()
> >> "m" outputs"
> >> False positives will surely be possible in theory, but hopefully rare in
> practice.
> >>
> >>
> >> Right, you only need to teach the optimizer about asm that matters. You
> don’t need “extending this optimization to generic DSE”. What I’m saying
> is: this is generic DSE, nothing special about variable auto-init, except
> we’re making sure it helps variable auto-init a lot. i.e. there’s no `if
> (VariableAutoInitIsOn)` in LLVM, there’s just some DSE smarts that are
> likely to kick in a lot more when variable auto-init is on.
> >>
> >>
> >>
> >> We can't start breaking correct user code because "hopefully rare in
> >> practice”.
> >>
> >>
> >> I’m not advocating for this.
> >>
> >>
> >> But we may well occasionally omit our hardening
> >> initializing store if in most cases it is not necessary but we are not
> >> really sure, e.g. not sure exactly which memory an asm block writes.
> >>
> >>
> >> I don’t agree. It’s a bad mitigation if it sometimes goes ¯\_(ツ)_/¯
> >>
> >>
> >> There is a huge difference complexity-wise between a 100% sound proof
> >> and a best-effort hint.
> >>
> >>
> >> Correct, and I don’t think you need a 100% solution for DSE (i.e. you
> don’t need to understand the semantics of all assembly instructions for all
> ISAs). You just need to hit the cases that matter (some instructions on
> some ISAs), and have those cases remain 100% sound.
> >>
> >>
> >> This is very special about auto-initializing
> >> stores.
> >>
> >> I mean, I agree, all others being equal we prefer handling it on
> >> common grounds. But still don't see all others being equal here. From
> >> what Alex says, it's not possible to figure out what exactly memory an
> >> asm block writes.
> >>
> >>
> >> Agreed, and I’m not saying that this needs to happen.
> >>
> >> I’ll re-iterate: which asm statements result in extraneous
> initialization? What instructions are they?
> >>
> >>
> >> I would still love to know what's the main source of truth for the
> >> semantics of asm() constraints.
> >>
> >>
> >> I don’t think you can trust programmer-provided constraints, unless you
> also add diagnostics to warn on incorrect constraints.
> >>
> >>
> >> For example, we've noticed that the BSF instruction, which can be used
> >> as follows:
> >>
> >> unsigned long ffs(unsigned long word) {
> >> unsigned long ret;
> >> asm("rep; bsf %1,%0" : "=r" (ret) : "rm" (word));
> >> return ret;
> >> }
> >>
> >> isn't guaranteed to initialize its output in the case |word| is 0
> >> (according to an unnamed Intel architect, it just zeroes out the top 32
> >> bits of the return value).
> >> Therefore the elimination of dead stores to |ret| done by both Clang
> >> and GCC is correct only if the callers are careful enough.
> >>
> >>
> >>
>