[cfe-dev] Does inline assembly or a memory barrier disable optimizations in a function?

Thu Aug 6 09:24:57 PDT 2015

(resending since I accidentally used the old list address)

I personally have this definition in headers I use for bare-metal programming:

// compiler write barrier, limited to specified object
template< typename T > __attribute__((always_inline))
static inline void write_barrier( T const &target ) {
        asm volatile ( "" :: "m"(target) );
}

Using such a barrier on the buffer after the memset should guarantee
that the memset will not get eliminated. As mentioned in the bug
thread, using "r"(&target) will not work, since that only makes the
asm block depend on the pointer value and not on the pointee (this is
documented GCC behaviour). It does, however, make the pointer
"escape", so following it with (or combining it with) a memory clobber
works. I generally prefer targeted barriers like the one above over a
general memory clobber, though.
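Both variants can be sketched as follows, assuming GCC or Clang with GNU inline-asm syntax. The barrier is the same as above (attribute moved after `static inline`); `derive_and_wipe` and `wipe_with_clobber` are hypothetical names for illustration, with a local buffer that dies at the end of its scope, which is exactly the case where a plain memset is a dead store.

```cpp
#include <cstddef>
#include <cstring>

// Same targeted barrier as above: the "m" input operand makes the compiler
// assume the empty asm reads all of `target`, so earlier stores to it
// (such as a memset) must actually be performed.
template <typename T>
static inline __attribute__((always_inline)) void write_barrier(T const &target) {
    asm volatile("" :: "m"(target));
}

// Hypothetical example: a local secret whose memset would otherwise be a
// dead store, since `key` goes out of scope right afterwards.
static int derive_and_wipe() {
    unsigned char key[32];
    for (std::size_t i = 0; i < sizeof key; ++i)
        key[i] = static_cast<unsigned char>(i);
    int checksum = 0;
    for (unsigned char b : key)
        checksum += b;
    std::memset(key, 0, sizeof key);
    write_barrier(key);  // targeted: only `key` is treated as read
    return checksum;
}

// Hypothetical alternative: "r"(p) alone would only tie the asm to the
// pointer value; it does make the pointer escape, and the "memory" clobber
// then forces the compiler to assume the asm may read the pointee.
static void wipe_with_clobber(void *p, std::size_t n) {
    std::memset(p, 0, n);
    asm volatile("" :: "r"(p) : "memory");
}
```

Both only stop the memset from being treated as a dead store; neither says anything about other copies of the data the compiler may have made.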

I think you could replace the "m" constraint with an "X" constraint to
avoid forcing the target into memory if it would otherwise have been
kept in a register. In that case, however, I can imagine another
failure mode: even if the data was previously stored in memory, the
memset (since it fully overwrites the target and no pointer to it has
escaped yet) effectively creates a new allocation for the target,
which may be in a register. The memset + barrier would then affect
only those registers and leave the data previously stored in memory
intact.

This actually exposes a more general, fatal flaw in these approaches:
the compiler is free to have stored the data transiently elsewhere,
and there is no way to find or erase such locations in plain C
augmented with asm barriers. In particular, it may leave potentially
sensitive values in registers, which can subsequently be written to
memory on a task switch (especially if the crypto code uses registers
not used by the calling application, e.g. Neon-optimized crypto
algorithms).

I don't think there's any architecture-independent way out of this
situation without something like an __attribute__((confidential)) that
instructs the compiler to diligently avoid leaving copies of the data
in locations that are invisible to the programmer's model.

In the meantime, the only solution I see that has even a remote chance
of being reliable is a tiny architecture-dependent wrapper written in
assembly that calls the crypto code and then clears all caller-save
registers (except those used for the return value) and the stack it
used. Figuring out how much of the stack to clear would still be a
challenge, though.
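Only assembly can clear the registers, but the stack part can be roughly approximated in C++. The sketch below is a swapped-in technique sometimes called stack burning, not the assembly wrapper proposed above: `burn_stack` and `call_and_burn` are hypothetical helpers, `stack_bytes` is a caller-supplied guess at how much stack the crypto code used, and GCC/Clang attributes and inline-asm syntax are assumed.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: zero roughly `bytes` of stack below the caller's
// frame by recursing through frames that each clear a local buffer.
// Frame sizes and layout are compiler-dependent, so coverage is approximate.
static __attribute__((noinline)) void burn_stack(std::size_t bytes) {
    unsigned char scratch[256];
    std::memset(scratch, 0, sizeof scratch);
    if (bytes > sizeof scratch)
        burn_stack(bytes - sizeof scratch);
    // Reading `scratch` here keeps the memset from being dropped and keeps
    // this frame live across the recursive call, so the call is not in
    // tail position and cannot reuse this frame.
    asm volatile("" :: "m"(scratch));
}

// Hypothetical wrapper: run the crypto routine, then scrub an estimated
// `stack_bytes` of the stack it may have used. Registers are NOT cleared.
static __attribute__((noinline)) int call_and_burn(int (*crypto)(), std::size_t stack_bytes) {
    int result = crypto();
    burn_stack(stack_bytes);
    return result;
}
```

If the compiler inlines `crypto` into `call_and_burn` (possible when the pointer is a known constant), the routine's locals end up in the wrapper's own frame, which `burn_stack` never touches. Gaps like this are why an assembly wrapper remains the more reliable option.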