[cfe-dev] [RFC] volatile mem* builtins
John McCall via cfe-dev
cfe-dev at lists.llvm.org
Wed May 6 16:23:39 PDT 2020
On 6 May 2020, at 18:40, JF Bastien wrote:
> Hi fans of volatility!
>
> I’d like to add volatile overloads to mem* builtins, and authored a
> patch: https://reviews.llvm.org/D79279
>
> The mem* builtins are often used (or should be used) in places where
> time-of-check time-of-use security issues are important (e.g. copying
> from untrusted buffers), because it prevents multiple reads / multiple
> writes from occurring at the untrusted memory location. The current
> builtins don't accept volatile pointee parameters in C++, and merely
> warn about such parameters in C, which leads to confusion. In these
> settings, it's useful to overload the builtin and permit volatile
> pointee parameters. The code generation then directly emits the
> existing volatile variant of the mem* builtin function call, which
> ensures that the affected memory location is only accessed once
> (thereby preventing double-reads under an adversarial memory mapping).
>
> Side-note: yes, ToCToU avoidance is a valid use for volatile
> <https://wg21.link/p1152r0#uses>.
>
> My patch currently only affects:
> __builtin_memcpy
> __builtin_memmove
> __builtin_memset
> There’s a bunch more “mem-like” functions such as bzero, but
> those 3 are the ones I expect to be used the most at the moment. We
> can add others later.
>
> John brought up the following: __builtin_memcpy is a library builtin,
> which means its primary use pattern is #define tricks in the C
> standard library headers that redirect calls to the memcpy library
> function. So doing what you're suggesting to __builtin_memcpy is also
> changing the semantics of memcpy, which is not something we should do
> lightly. If we were talking about changing a non-library builtin
> function, or introducing a new builtin, the considerations would be
> very different.
>
> I can instead add __builtin_volatile_* functions which are overloaded
> on at least one pointee parameter being volatile.
So, to be clear, you would like there to be some way to request a
volatile `memcpy` (etc.). You don’t need it to specifically be
`__builtin_memcpy` (etc.) — i.e. you’re not relying on this
automatically triggering when users call `memcpy` — you just need some
way to spell it.
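
For concreteness, here is a minimal sketch of the use case as I understand it, written as ordinary C++ rather than with any builtin. `copy_from_volatile`, `Header`, and `read_length` are hypothetical names made up for illustration; nothing here is from the patch. The loop performs one volatile load per source byte, and the check then runs on the private snapshot instead of on the shared buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a Clang builtin: copies n bytes out of a
// volatile source, performing exactly one volatile load per source byte.
inline void copy_from_volatile(void *dst, const volatile void *src,
                               std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  auto *s = static_cast<const volatile unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
}

// Hypothetical message header living in memory that an untrusted party
// may modify at any time.
struct Header {
  std::uint32_t length;
};

// Snapshot the header once, then check and use only the private copy, so
// the value that was validated is the value that gets used.
inline bool read_length(const volatile Header *shared, std::uint32_t max,
                        std::uint32_t *out) {
  Header local;
  copy_from_volatile(&local, shared, sizeof local);
  if (local.length > max)
    return false;  // *out is left untouched on failure
  *out = local.length;
  return true;
}
```

A byte-wise loop like this is presumably slower than what a volatile `memcpy` intrinsic could emit, which is part of why a builtin spelling is attractive.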
A few thoughts:
- A `memcpy`/`memmove` is conceptually a load from one address and a
store to another. It is potentially valuable to know that e.g. only the
store is `volatile`. We can’t express that in today’s LLVM
intrinsics, but it’s certainly imaginable that we could express it in
the future, the same way that we added the ability to record different
alignments for both sides. So I think it would be nice if whatever we
do here allows us to pick up on the difference, e.g. by triggering based
on the qualification of the source/dest pointers.
- There are other qualifiers that can meaningfully contribute to the
operation here besides `volatile`, such as `restrict` and (more
importantly) address spaces. And again, for the copy operations these
might differ between the two pointer types.
In both cases, I’d say that the logical design is to allow the
pointers to be to arbitrarily-qualified types. We can then propagate
that information from the builtin into the LLVM intrinsic call as best
as we’re allowed. So I think you should make builtins called
something like `__builtin_overloaded_memcpy` (name to be decided) and
just have their semantics be type-directed.
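
As a rough sketch of what "type-directed" could mean, the following models the idea with a plain C++ overload set. `overloaded_copy`, `CopyKind`, and `detail::copy_bytes` are hypothetical names for illustration only, not a proposed spelling or an existing Clang API. Each overload keys off the `volatile` qualification of its own pointer, so a volatile store and a volatile load are distinguished independently. Address spaces and `restrict` are not modeled here.

```cpp
#include <cstddef>

// Records which side of the copy was treated as volatile.
struct CopyKind {
  bool volatile_store;
  bool volatile_load;
};

namespace detail {
// Byte-wise copy; accesses through D* or S* are volatile accesses exactly
// when the deduced pointee type is volatile-qualified.
template <class D, class S>
void copy_bytes(D *d, S *s, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
}
}  // namespace detail

// Hypothetical overload set standing in for a type-directed builtin.
inline CopyKind overloaded_copy(void *dst, const void *src, std::size_t n) {
  detail::copy_bytes(static_cast<unsigned char *>(dst),
                     static_cast<const unsigned char *>(src), n);
  return {false, false};
}

inline CopyKind overloaded_copy(volatile void *dst, const void *src,
                                std::size_t n) {
  detail::copy_bytes(static_cast<volatile unsigned char *>(dst),
                     static_cast<const unsigned char *>(src), n);
  return {true, false};  // only the store side is volatile
}

inline CopyKind overloaded_copy(void *dst, const volatile void *src,
                                std::size_t n) {
  detail::copy_bytes(static_cast<unsigned char *>(dst),
                     static_cast<const volatile unsigned char *>(src), n);
  return {false, true};  // only the load side is volatile
}

inline CopyKind overloaded_copy(volatile void *dst, const volatile void *src,
                                std::size_t n) {
  detail::copy_bytes(static_cast<volatile unsigned char *>(dst),
                     static_cast<const volatile unsigned char *>(src), n);
  return {true, true};
}
```

A call such as `overloaded_copy(dst, vsrc, n)`, with `vsrc` of type `const volatile char *` and `dst` of type `char *`, selects the third overload, because overload resolution prefers the least-qualified parameter that still accepts each argument.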
I do think it would be treacherous to actually apply these semantics to
`memcpy` via `__builtin_memcpy`, though.
John.