[cfe-dev] [RFC] volatile mem* builtins

Wed May 6 16:23:39 PDT 2020

On 6 May 2020, at 18:40, JF Bastien wrote:
> Hi fans of volatility!
>
> I’d like to add volatile overloads to mem* builtins, and authored a 
> patch: https://reviews.llvm.org/D79279
>
> The mem* builtins are often used (or should be used) in places where 
> time-of-check time-of-use security issues are important (e.g. copying 
> from untrusted buffers), because they prevent multiple reads / multiple 
> writes from occurring at the untrusted memory location. The current 
> builtins don't accept volatile pointee parameters in C++, and merely 
> warn about such parameters in C, which leads to confusion. In these 
> settings, it's useful to overload the builtin and permit volatile 
> pointee parameters. The code generation then directly emits the 
> existing volatile variant of the mem* builtin function call, which 
> ensures that the affected memory location is only accessed once 
> (thereby preventing double-reads under an adversarial memory mapping).
>
> Side-note: yes, ToCToU avoidance is a valid use for volatile 
> <https://wg21.link/p1152r0#uses>.
>
> My patch currently only affects:
> __builtin_memcpy
> __builtin_memmove
> __builtin_memset
> There’s a bunch more “mem-like” functions such as bzero, but 
> those 3 are the ones I expect to be used the most at the moment. We 
> can add others later.
>
> John brought up the following: __builtin_memcpy is a library builtin, 
> which means its primary use pattern is #define tricks in the C 
> standard library headers that redirect calls to the memcpy library 
> function. So doing what you're suggesting to __builtin_memcpy is also 
> changing the semantics of memcpy, which is not something we should do 
> lightly. If we were talking about changing a non-library builtin 
> function, or introducing a new builtin, the considerations would be 
> very different.
>
> I can instead add __builtin_volatile_* functions which are overloaded 
> on at least one pointee parameter being volatile.

So, to be clear, you would like there to be some way to request a 
volatile `memcpy` (etc.).  You don’t need it to specifically be 
`__builtin_memcpy` (etc.) — i.e. you’re not relying on this 
automatically triggering when users call `memcpy` — you just need some 
way to spell it.

A few thoughts:

- A `memcpy`/`memmove` is conceptually a load from one address and a 
store to another.  It is potentially valuable to know that e.g. only the 
store is `volatile`.  We can’t express that in today’s LLVM 
intrinsics, but it’s certainly imaginable that we could express it in 
the future, the same way that we added the ability to record different 
alignments for both sides.  So I think it would be nice if whatever we 
do here allows us to pick up on the difference, e.g. by triggering based 
on the qualification of the source/dest pointers.

- There are other qualifiers that can meaningfully contribute to the 
operation here besides `volatile`, such as `restrict` and (more 
importantly) address spaces.  And again, for the copy operations these 
might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the 
pointers to be to arbitrarily-qualified types.  We can then propagate 
that information from the builtin into the LLVM intrinsic call as best 
as we’re allowed.  So I think you should make builtins called 
something like `__builtin_overloaded_memcpy` (name to be decided) and 
just have their semantics be type-directed.
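
As a rough illustration of what "type-directed" could mean, here is a 
minimal C++17 sketch. It is not the patch and not a Clang builtin: 
`overloaded_memcpy`, `CopyKind` and `example_usage` are hypothetical 
names made up for this example, an ordinary template stands in for the 
builtin, and a byte-wise loop stands in for whatever the compiler would 
really emit. The only point is that the qualifiers of each pointee are 
inspected independently, so a volatile store with a plain load can be 
told apart from the reverse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical: records which side(s) of the copy were treated as volatile.
enum class CopyKind { Plain, VolatileLoad, VolatileStore, VolatileBoth };

// Hypothetical stand-in for a type-directed builtin. D and S are the
// (possibly qualified) pointee types of the destination and source.
template <typename D, typename S>
CopyKind overloaded_memcpy(D *dst, S *src, std::size_t n) {
  static_assert(!std::is_const_v<D>, "destination must be writable");
  constexpr bool dst_volatile = std::is_volatile_v<D>;
  constexpr bool src_volatile = std::is_volatile_v<S>;

  if constexpr (!dst_volatile && !src_volatile) {
    // Neither side is volatile: defer to the ordinary library function.
    std::memcpy(dst, src, n);
    return CopyKind::Plain;
  } else {
    // Volatile-qualify only the side whose pointee was declared volatile,
    // so each byte on that side is read or written exactly once.
    using DstByte =
        std::conditional_t<dst_volatile, volatile unsigned char, unsigned char>;
    using SrcByte = std::conditional_t<src_volatile,
                                       const volatile unsigned char,
                                       const unsigned char>;
    auto *d = reinterpret_cast<DstByte *>(dst);
    auto *s = reinterpret_cast<SrcByte *>(src);
    for (std::size_t i = 0; i < n; ++i)
      d[i] = s[i];
    return dst_volatile
               ? (src_volatile ? CopyKind::VolatileBoth
                               : CopyKind::VolatileStore)
               : CopyKind::VolatileLoad;
  }
}

// Usage: only the destination is volatile, so only the stores are.
inline void example_usage() {
  int plain_src[2] = {7, 9};
  volatile int mmio_dst[2] = {0, 0};
  CopyKind k = overloaded_memcpy(mmio_dst, plain_src, sizeof plain_src);
  assert(k == CopyKind::VolatileStore);
  assert(mmio_dst[1] == 9);
}
```

The sketch only looks at `volatile`. A real builtin would presumably 
inspect address spaces and `restrict` in the same way, and would choose 
the access width itself rather than going byte by byte.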

I do think it would be treacherous to actually apply these semantics to 
`memcpy` via `__builtin_memcpy`, though.

John.