[cfe-dev] (not) initializing assembly outputs with -ftrivial-auto-var-init

Thu Mar 28 10:40:47 PDT 2019

So far it looks weird.
I have benchmark which shows -1.5% performance regression with
-ftrivial-auto-var-init of trunk clang.

My DSE prototype, cross block only, removes additional +10% stores
comparing to existing DSE. Obviously cross-block is not enough, and we need
to do inter-procedural module analysis or even LTO.

However problem is following.
To my surprise, if I completely disable existing DSE (and my), I see no
difference on benchmarks at all. Same (no difference) with or without
-ftrivial-auto-var-init.

On Wed, Mar 27, 2019 at 9:34 AM JF Bastien <jfbastien at apple.com> wrote:

>
>
> On Mar 27, 2019, at 3:58 AM, Alexander Potapenko <glider at google.com>
> wrote:
>
> On Tue, Mar 26, 2019 at 11:30 PM James Y Knight <jyknight at google.com>
> wrote:
>
>
> The entirety of the named object is replaced. If you want to modify an
> object, instead of entirely replacing it, you use "+m".
>
> Thanks for taking your time to explain this!
> After re-reading the thread, I agree we need to make DSE work with the
> cases where the constraints are correct, which will cover most of the
> uses of inline assembly in the kernel.
> Not dead-eliminating certain stores with incorrect constraints
> shouldn't be a problem.
>
> Turns out it's quite easy to make DSE work for a particular case of a
> no-input-single-output asm() directive by patching
> hasAnalyzableMemoryWrite() and getLocForWrite() in
> DeadStoreElimination.cpp
> But for directives that read and write more than one memory location
> it might require some refactoring.
> Vitaly (CCed) is already working on cross-basic-block DSE right now,
> so I'd not touch the assembly handling before he lands his patches.
>
>
> Thanks! This all sounds great. Please CC me on the patches, I’d like to
> make sure it works well for our code as well (potentially in follow-ups I’d
> send).
>
>
> None of this is anything new or innovative -- GCC has had these semantics
> -- and been optimizing based on them -- for ages.
>
> E.g., here, all elements of the array are replaced, so the initialization
> goes away, and the return needs to explicitly add all 4 values written by
> the inline-asm.
> int out[4] = {1,2,3,4};
> asm("whatever" : "=m"(out));
> return out[0] + out[1] + out[2] + out[3];
>
> Here, only out[1] is touched by the inline asm. The other values are not
> modified, so all of the initialization can disappear, and the generated
> code can simply return 8 + out[1].
> int out[4] = {1,2,3,4};
> asm("whatever" : "=m"(out[1]));
> return out[0] + out[1] + out[2] + out[3];
>
>
> On Tue, Mar 26, 2019 at 4:25 PM Dmitry Vyukov <dvyukov at google.com> wrote:
>
>
> On Tue, Mar 26, 2019 at 9:07 PM James Y Knight <jyknight at google.com>
> wrote:
>
>
> This thread has IMO started going down an unfortunate path.
>
> Normal compiler behavior and optimization passes (such as DSE, and
> everything else) should not care WHAT is inside the assembly string, but
> should just trust the asm-constraints to properly indicate the behavior of
> the contained assembly. If the asm constraint says it stores a value (which
> is what "=m" means), then the usual behavior of compiler should be to be to
> assume that it indeed does so. Doing otherwise starts to get into
> very-scary territory.
>
> The initial problem here is that we do not properly tag memory-outputs of
> inline asm as definitely being a store to that memory. They should be
> so-tagged. When we fix that bug, then code like this (compiled with
> optimizations, but no special hardening flags):
> int f() {
>  int out = 5;
>  asm("# Do nothing, LOL!" : "=m"(out));
>  return out;
> }
> will be compiled down to simply load an uninitialized stack value and
> return it.
>        movl    -4(%rsp), %eax
>        ret
> That is the _correct and desired_ behavior. (And, implied by this is that
> with the current implementation of -ftrivial-auto-var-init, its
> initialization also will be eliminated.)
>
>
> If an lvalue is passed to =m what exactly is written? Single value of
> the lvalue type as passed? Whole base object? What if it's a
> memset-like asm block that writes an array? What if it writes a single
> value but at some offset? What if it writes as single value, but size
> of the write does not match the static type? I think I've seen asm
> blocks of all types in kernel.
>
> That said -- if we want to implement inline-asm targeted mitigations in
> certain hardening modes, I'm not saying we cannot do that. It's just that
> we need to be clear that it _is_ special hardening behavior.
>
>
>
> On Tue, Mar 26, 2019 at 2:13 PM JF Bastien <jfbastien at apple.com> wrote:
>
>
>
>
> On Mar 26, 2019, at 10:15 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>
> On Tue, Mar 26, 2019 at 5:11 PM JF Bastien <jfbastien at apple.com> wrote:
>
> If an asm's constraints claim that the variable is an output, but then
> don't actually write to it, that's a bug (at least if the value is actually
> used afterwards). An output-only constraint on inline asm definitely does
> _not_ mean "pass through the previous value unchanged, if the asm failed to
> actually write to it". If you need that behavior, it's spelled "+m", not
> "=m".
>
> We do seem to fail to take advantage of this for memory outputs (again,
> this is not just for ftrivial-auto-var-init -- we ought to eliminate manual
> initialization just the same), which I'd definitely consider an
> missing-optimization bug.
>
>
> You mean we assume C code is buggy and asm code is not buggy because
> compiler fails to disprove that there is a bug?
> Doing this optimization without -ftrivial-auto-var-init looks
> reasonable, compilers do optimizations assuming absence of bugs
> throughout. But -ftrivial-auto-var-init is specifically about assuming
> these bugs are everywhere.
>
> On Thu, Mar 21, 2019 at 10:16 AM Alexander Potapenko <glider at google.com>
> wrote:
>
> On Thu, Mar 21, 2019 at 2:58 PM James Y Knight <jyknight at google.com>
> wrote:
>
> Please be more specific about the problem, because your simplified example
> doesn't actually show an issue. If I write this function:
> int foo() {
> int retval;
> asm("# ..." : "=r"(retval));
> return retval;
> }
> it already does get treated as definitely writing retval, and optimizes
> away the initialization (whether you explicitly initialize retval, or use
> -ftrivial-auto-var-init).
> Example: https://godbolt.org/z/YYBCXL
>
> This is probably because you're passing retval as a register output.
> If you change "=r" to "=m" (https://godbolt.org/z/ulxSgx), it won't be
> optimized away.
> (I admit I didn't know about the difference)
>
> On Thu, Mar 21, 2019 at 8:35 AM Alexander Potapenko via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Hi JF et al.,
>
> In the Linux kernel we often encounter the following pattern:
>
> type op(...) {
> type retval;
> inline asm(... retval ...);
> return retval;
> }
>
> , which is used to implement low-level platform-dependent memory
> operations.
>
> Some of these operations turn out to be very hot, so we probably don't
> want to initialize |retval| given that it's always initialized in the
> assembly.
>
> However it's practically impossible to tell that a variable is being
> written to by the inline assembly, or figure out the size of that
> write.
> Perhaps we could speculatively treat every scalar output of an inline
> assembly routine as an initialized value (which is true for the Linux
> kernel, but I'm not sure about other users of inline assembly, e.g.
> video codecs).
>
> WDYT?
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>
>
> Does kernel asm use "+m" or "=m"?
>
> If asm _must_ write to that variable, then we could improve DSE in
> normal case (ftrivial-auto-var-init is not enabled). If
> ftrivial-auto-var-init is enabled, then strictly saying we should not
> remove initialization because we did not prove that asm actually
> writes. But we may remove initialization as well for practical
> reasons.
>
> Alex mentioned that in some cases we don't know actual address/size of
> asm writes. But we should know it if a local var is passed to the asm,
> which should be the case for kernel atomic asm blocks.
>
> Interestingly, ftrivial-auto-var-init DSE must not be stronger then
> non-ftrivial-auto-var-init DSE, unless we are talking about our own
> emitted initialization stores, in such case ftrivial-auto-var-init DSE
> may remove then more aggressively then what normal DSE would do, we
> don't actually have to _prove_ that the init store is dead.
>
>
>
> IMO the auto var init mitigation shouldn’t change the DSE optimization at
> all. We shouldn’t treat the stores we add any different. We should just
> improve DSE and everything benefits (auto var init moreso).
>
>
> But you realize that this "just" improve involves fully understanding
> static and dynamic behavior of arbitrary assembly for any architecture
> without even using integrated asm? ;)
>
>
> If you want to solve every problem however unlikely, yes. If you narrow
> what you’re doing to a handful of cases that matter, no.
>
>
> How can we improve DSE to handle all main kernel patterns that matter?
> Can we? It's still unclear to me. Extending this optimization to
> generic DSE and all stores can make it much harder (unsolvable)
> problem...
>
>
> Right now there's a handful of places in the kernel where we have to
> use __attribute__((uninitialized)) just to avoid creating an extra
> initializer:
> https://github.com/google/kmsan/commit/00387943691e6466659daac0312c8c5d8f9420b9
> and
> https://github.com/google/kmsan/commit/2954f1c33a81c6f15c7331876f5b6e2fec0d631f
> All those assembly directives are using local scalar variables of size
> <= 8 bytes as "=qm" outputs, so we can narrow the problem down to "let
> DSE remove redundant stores to local scalars that are used as asm()
> "m" outputs"
> False positives will sure be possible in theory, but hopefully rare in
> practice.
>
>
> Right, you only need to teach the optimizer about asm that matters. You
> don’t need “extending this optimization to generic DSE”. What I’m saying
> is: this is generic DSE, nothing special about variable auto-init, except
> we’re making sure it help variable auto-init a lot. i.e. there’s no `if
> (VariableAutoInitIsOn)` in LLVM, there’s just some DSE smarts that are
> likely to kick in a lot more when variable auto-init is on.
>
>
>
> We can't start breaking correct user code because "hopefully rare in
> practice”.
>
>
> I’m not advocating for this.
>
>
> But we can well episodically omit our hardening
> initializing store if in most cases it is not necessary but we are not
> really sure, e.g. not sure what exactly memory an asm block writes.
>
>
> I don’t agree. It’s a bad mitigation if it sometimes goes ¯\_(ツ)_/¯
>
>
> There is huge difference complexity-wise between a 100% sound proof
> and a best-effort hint.
>
>
> Correct, and I don’t think you need a 100% solution for DSE (i.e. you
> don’t need to understand the semantics of all assembly instructions for all
> ISAs). You just need to hit the cases that matter (some instructions on
> some ISAs), and have those cases remain 100% sound.
>
>
> This is very special about auto-initializing
> stores.
>
> I mean, I agree, all others being equal we prefer handling it on
> common grounds. But still don't see all others being equal here. From
> what Alex says, it's not possible to figure out what exactly memory an
> asm block writes.
>
>
> Agreed, and I’m not saying that this needs to happen.
>
> I’ll re-iterate: which asm statements result in extraneous initialization?
> What instructions are they?
>
>
> I would still love to know what's the main source of truth for the
> semantics of asm() constraints.
>
>
> I don’t think you can trust programmer-provided constraints, unless you
> also add diagnostics to warn on incorrect constraints.
>
>
> For example, we've noticed that the BSF instruction, which can be used
> as follows:
>
> unsigned long ffs(unsigned long word) {
> unsigned long ret;
> asm("rep; bsf %1,%0" : "=r" (ret) : "rm" (word));
> return ret;
> }
>
> isn't guaranteed to initialize its output in the case |word| is 0
> (according to unnamed Intel architect, it just zeroes out the top 32
> bits of the return value).
> Therefore the elimination of dead stores to |ret| done by both Clang
> and GCC is correct only if the callers are careful enough.
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>
>
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190328/0459bc75/attachment.html>