[LLVMdev] Handling of unsafe functions
Martinez, Javier E
javier.e.martinez at intel.com
Thu Sep 20 00:55:02 PDT 2012
Thanks for the valuable feedback. I agree with you that the containers available in LLVM are preferable to char buffers, but I want to point out that the proposal doesn't add any new uses of char buffers; it merely works with existing ones. Changing existing uses of char buffers to other objects is beyond the scope of this proposal. It makes more sense to do that when the code that uses string manipulation functions is being changed anyway, as it could entail larger design changes.
I'm unsure of the performance impact of using the secure functions and how to balance it against the benefit of improving code quality. If the proposal gets support, I can gather performance data to determine whether there is a performance hit and whether it's acceptable. Hoping that authors "know what they are doing" is not enough. If that were the case, there wouldn't be bugs to fix and code to review.
I don't have the output of the static analyzer at hand but will provide it in a follow-up email.
From: Sean Silva [mailto:silvas at purdue.edu]
Sent: Tuesday, September 18, 2012 6:25 PM
To: Martinez, Javier E
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions
I generally disagree with the approach.
Generally, char* strings aren't recommended for use in LLVM, and this kind of string manipulation shouldn't be done with the primitive C library functions. The Programmer's Manual gives the preferred types to use for strings, and all of them keep track of length. There are also safe routines for creating and formatting strings, such as raw_ostream, which is used pervasively in LLVM.
The example routine in your patch probably should just use raw_string_ostream or raw_svector_ostream, instead of relying on C-style string routines. That way, the correctness is enforced by the compiler, instead of manually laboring over these things (like checking the return code, which your patch doesn't do...).
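To make the return-code point concrete, here is a minimal sketch in standard C++; it is not taken from the patch. The helper names formatLabel and formatLabelSafe are hypothetical, and std::string stands in for LLVM's own string types and streams so that the example builds without LLVM headers.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not an LLVM API): formats "<name>.<index>" into a
// caller-supplied buffer and reports whether the whole string fit.
bool formatLabel(char *buf, std::size_t bufSize, const char *name, int index) {
  int needed = std::snprintf(buf, bufSize, "%s.%d", name, index);
  // snprintf returns the length the full output would have had (excluding
  // the terminator), or a negative value on an encoding error. Ignoring it
  // means silently accepting a truncated string.
  return needed >= 0 && static_cast<std::size_t>(needed) < bufSize;
}

// Length-tracking alternative: the string grows as needed, so there is no
// buffer size to get wrong and no return code to forget.
std::string formatLabelSafe(const std::string &name, int index) {
  return name + "." + std::to_string(index);
}
```

With the C-style version, every caller has to remember the check; with the length-tracking version, there is nothing to check.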
In other words, there are completely safe alternatives for these functions for almost all cases.
One particular use case, which usually pertains to memcpy, is when performance is a significant concern: the author "knows what they are doing" and isn't willing to sacrifice performance by calling into some "secure" version when they have other assurances that the target buffer has sufficient space. The performance difference can be significant, since memcpy is usually turned into a compiler builtin that the compiler recognizes and optimizes specially, whereas with the suggested approach there will be a regular call into an "llvm::*_secure" wrapper, which then calls into the OS-provided general-purpose "secure" version.
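For illustration only, a bounds-checked wrapper of the kind being discussed might look roughly like the sketch below. The name checkedMemcpy and its signature are assumptions made for this example; they are neither the interface from the patch nor an existing LLVM or C library function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: copies count bytes only if they fit in the
// destination, and reports failure instead of overflowing it.
bool checkedMemcpy(void *dst, std::size_t dstSize,
                   const void *src, std::size_t count) {
  if (dst == nullptr || src == nullptr || count > dstSize)
    return false;  // refuse the copy; dst is left untouched
  std::memcpy(dst, src, count);
  return true;
}
```

Every call site now has to supply the destination size and check the result. If the wrapper lives in another translation unit and forwards to an OS routine, the compiler may also lose the chance to treat the copy as a builtin, which is the performance concern raised above.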
I think that it would be useful if you used the output of your static analyzer to provide a list of the places where C-style string manipulation is being done, so that these places can be migrated to using modern, safe LLVM interfaces for these operations.
On Tue, Sep 18, 2012 at 8:00 PM, Martinez, Javier E <javier.e.martinez at intel.com> wrote:
> We have identified functions in LLVM sources using a static code
> analyzer which are marked as a "security vulnerability". There
> has been work already done to address some of them for Linux (e.g.
> snprintf). We are attempting to solve this issue in a comprehensive
> fashion across all platforms. Most of the functions identified are for manipulating strings.
> memcpy is the most commonly used of all these insecure functions. The
> following table lists all these functions and their recommended secure
> alternatives:
> Function   Windows     Unix/Mac OS
> memcpy     memcpy_s    -
> sprintf    sprintf_s   snprintf
> sscanf     sscanf_s    -
> _alloca    _malloca    -
> strcat     strcat_s    strlcat
> strcpy     strcpy_s    strlcpy
> strtok     strtok_s    -
> The proposal is to add secure versions of these functions. These
> functions will be implemented in LLVM Support module and be used by
> all other LLVM modules. The interface of these methods will be
> platform independent while their implementation will be platform
> specific (like the Mutex class in Support module). In cases where the
> platform does not support the functionality natively, we are writing an implementation of these functions.
> For example, in the case of memcpy the secure function will look like
> Some secure functions require additional data that needs to be passed
> (like buffer sizes). That information has to be added in all places of invocation.
> In some cases, this requires an extra size_t argument to be passed through.
> Hence, this change would not just be a one to one function
> refactoring. The attached patch helps illustrate how an instance of memcpy would be modified.
> Is this proposal of interest to the LLVM community? Can you also
> comment if the approach specified is good to address this issue?
>  http://msdn.microsoft.com/en-us/library/ms235384(v=vs.80).aspx