[cfe-dev] Automatic variable initialization and copy_from_user()

Dmitry Vyukov via cfe-dev cfe-dev at lists.llvm.org
Tue Feb 11 04:38:25 PST 2020

On Tue, Feb 11, 2020 at 1:31 PM Florian Hahn <florian_hahn at apple.com> wrote:
> Hi,
> > On Feb 11, 2020, at 11:13, Alexander Potapenko <glider at google.com> wrote:
> > It would help to mark copy_from_user() as a function that initializes
> > one of these arguments, but only if it does not fail. We also need to
> > tell the compiler about the size of that argument, so that the
> > function prototype will look along the lines of:
> >
> > unsigned long copy_from_user (void *  to, const void __user *  from,
> > unsigned long  n)
> >  __attribute__((param_sizeof(1, 3)) __attribute__((param_init_if_ret0(1)));
> >
> > , where param_sizeof(1, 3) means that parameter #3 contains the size
> > of parameter #1, and param_init_if_ret0(1) means that parameter #1 is
> > initialized iff the function returns 0.
> >
> > This looks quite clumsy, but maybe can be made more elegant and
> > general enough to handle other cases.
> >
> > Does the idea make sense to you? Maybe you've seen other cases in
> > which a similar optimization can be applied?
> > To people working on DSE right now: do you think it's possible to
> > perform such an optimization assuming that we know about the behavior
> > of copy_from_user()?
> Yes, I think MemorySSA backed DSE should be able to handle this (it’s still WIP, but the first patch just landed). The key question is how to convey the fact that a function fully initializes `input` along some paths. The MemorySSA-backed DSE patch series will eliminate the first memset in the example below, if there’s no read access on the error path.
> From the DSE perspective, it would be easiest if Clang could inject something like an intrinsic that marks `input` as initialized on the success path (e.g. inserts a call to the not-yet existing @llvm.initialized(%ptr, size)). IMO it seems appropriate to handle this at the Clang level, as it seems quite language/attribute-specific. As a stepping stone, it might be feasible to provide a builtin to the user to specify that an object is initialized along a certain path.
> declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) noun-wind
> declare i1 @cond() readnone
> declare void @use(i8*)
> define void @test19(i32* noalias %P) {
> entry:
>   %obj = alloca [10 x i8]
>   %ptr = bitcast [10 x i8]* %obj to i8*
>   call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 10, i1 false)
>   %succ = call i1 @cond()
>   br i1 %succ, label %copy, label %err
> copy:
>   call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 10, i1 false)
>   call void @use(i8* %ptr)
>   br label %exit
> err:
>   ;call void @use(i8* %ptr)
>   ret void
> exit:
>   ret void
> }

One alternative for completeness. There is a version of copy_from_user
that always initializes the whole range (if part is not copied from
userspace, it's zeroed). Handling such a version is much easier,
right. However, some versions don't do this (for performance?). Maybe
we could make _all_ copy_from_user's initialize whole range always iff
memory initialization is enabled. Then we could add a simpler
attribute for clang that will convey that copy_from_user initializes
the memory.

More information about the cfe-dev mailing list