[cfe-dev] Automatic variable initialization and copy_from_user()

Florian Hahn via cfe-dev cfe-dev at lists.llvm.org
Tue Feb 11 04:47:02 PST 2020

> On Feb 11, 2020, at 12:38, Dmitry Vyukov <dvyukov at google.com> wrote:
> On Tue, Feb 11, 2020 at 1:31 PM Florian Hahn <florian_hahn at apple.com> wrote:
>> Hi,
>>> On Feb 11, 2020, at 11:13, Alexander Potapenko <glider at google.com> wrote:
>>> It would help to mark copy_from_user() as a function that initializes
>>> one of these arguments, but only if it does not fail. We also need to
>>> tell the compiler about the size of that argument, so that the
>>> function prototype will look along the lines of:
>>> unsigned long copy_from_user (void *  to, const void __user *  from,
>>> unsigned long  n)
>>> __attribute__((param_sizeof(1, 3)) __attribute__((param_init_if_ret0(1)));
>>> , where param_sizeof(1, 3) means that parameter #3 contains the size
>>> of parameter #1, and param_init_if_ret0(1) means that parameter #1 is
>>> initialized iff the function returns 0.
>>> This looks quite clumsy, but maybe can be made more elegant and
>>> general enough to handle other cases.
>>> Does the idea make sense to you? Maybe you've seen other cases in
>>> which a similar optimization can be applied?
>>> To people working on DSE right now: do you think it's possible to
>>> perform such an optimization assuming that we know about the behavior
>>> of copy_from_user()?
>> Yes, I think MemorySSA backed DSE should be able to handle this (it’s still WIP, but the first patch just landed). The key question is how to convey the fact that a function fully initializes `input` along some paths. The MemorySSA-backed DSE patch series will eliminate the first memset in the example below, if there’s no read access on the error path.
>> From the DSE perspective, it would be easiest if Clang could inject something like an intrinsic that marks `input` as initialized on the success path (e.g. inserts a call to the not-yet existing @llvm.initialized(%ptr, size)). IMO it seems appropriate to handle this at the Clang level, as it seems quite language/attribute-specific. As a stepping stone, it might be feasible to provide a builtin to the user to specify that an object is initialized along a certain path.
>> declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) noun-wind
>> declare i1 @cond() readnone
>> declare void @use(i8*)
>> define void @test19(i32* noalias %P) {
>> entry:
>>  %obj = alloca [10 x i8]
>>  %ptr = bitcast [10 x i8]* %obj to i8*
>>  call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 10, i1 false)
>>  %succ = call i1 @cond()
>>  br i1 %succ, label %copy, label %err
>> copy:
>>  call void @llvm.memset.p0i8.i64(i8* %ptr, i8 0, i64 10, i1 false)
>>  call void @use(i8* %ptr)
>>  br label %exit
>> err:
>>  ;call void @use(i8* %ptr)
>>  ret void
>> exit:
>>  ret void
>> }
> One alternative for completeness. There is a version of copy_from_user
> that always initializes the whole range (if part is not copied from
> userspace, it's zeroed). Handling such a version is much easier,
> right. However, some versions don't do this (for performance?). Maybe
> we could make _all_ copy_from_user's initialize whole range always iff
> memory initialization is enabled. Then we could add a simpler
> attribute for clang that will convey that copy_from_user initializes
> the memory.

That would also work from the DSE side of things. I’m not sure if/how functions that are marked to initialize a pointer are handled, but adding that to DSE should be easy. One more thing to keep in mind: if we just rely on function attributes, that might lead to missed eliminations if copy_from_user gets inlined and we fail to determine that it initialize a parameter from its contents (e.g. due to it using inline-assembly)


More information about the cfe-dev mailing list