[LLVMdev] Handling of unsafe functions

Thu Sep 27 00:40:40 PDT 2012

Sean,

Thanks for the suggestion on the patch submission. I started with the first patch to address a couple of calls to C string functions in Errno.cpp. If that goes well I'll prepare a patch for each function occurrence.

> I went through a couple other random uses of memcpy() in the LLVM tree and none of them seemed like they would benefit.
I understand that not all uses of memcpy will clearly benefit from a replacement to the proposed memcpy_secure. For people using LLVM under Windows using Visual Studio there's an additional benefit; LLVM can be compiled without deprecating the functions marked as unsecure. When this is a requirement the alternative is to make the changes locally which becomes a problem when upgrading LLVM versions. I still think that abstracting the implementation of memcpy facilitates the work for users in Windows without having a significant impacting users in other platforms.

Thanks,
Javier

-----Original Message-----
From: Sean Silva [mailto:silvas at purdue.edu] 
Sent: Monday, September 24, 2012 3:37 PM
To: Martinez, Javier E
Cc: Richard Smith; Chris Lattner; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Handling of unsafe functions

> I'll prepare a patch that replaces the string manipulation functions 
> an appropriate  string object.

Please break the patch up into focused chunks, one per logical change.
We try to keep all LLVM development as incremental as possible [1]. I recommend fixing a single logical occurrence (such as fixing
APFloat::convertToHexString()) and then mailing the patch to llvm-commits. It's likely that the feedback you get from the review of the first patch will help inform future patches and simplify later patch reviews.

[1] http://llvm.org/docs/DeveloperPolicy.html#incremental-development

> I know both functions can be used the wrong way but at least the "secure"
> version makes one think about the value passed for destination size. 
> Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct 
> but with such a high number of uses there are likely a few that are 
> not. I'm willing to plow through all the calls to check the parameters 
> while making the change to the memcpy_secure version from the proposal.

My inclination is that it will be redundant. For example, consider this usage, grabbed randomly from the source code:

      char *Buf = static_cast<char *>(Allocate(Directory.size()));
      memcpy(Buf, Directory.data(), Directory.size());

it's not that clear to me that just adding an extra argument of
Directory.size() buys anything. If anything, it seems like it introduces _another_ place to make an error.

I went through a couple other random uses of memcpy() in the LLVM tree and none of them seemed like they would benefit.

> [...] I'm willing
> to plow through all the calls to check the parameters while making the 
> change to the memcpy_secure version from the proposal.

By all means, please take a look at them. However, it might be overall more beneficial to go through and report "sketchy" uses, where the copy is not obviously correct, then discuss on the list how to ensure the code is safe. Once all of the uses are audited, you can then just write a little script using something like git's "pickaxe"
functionality (see `-S` option to git-diff(1), and other commands that take "diff" options) to send you an email every time a new use of memcpy gets introduced into the codebase. For example, you could have a daily cron job that runs `git log -p -S'memcpy('
origin/master@{yesterday}..origin/master` and if there is any output then it mails that output to you.

-- Sean Silva

On Thu, Sep 20, 2012 at 11:52 PM, Martinez, Javier E <javier.e.martinez at intel.com> wrote:
> From the responses it's pretty clear that the preference is to avoid 
> using C string functions altogether. I've attached at list of calls in Clang/LLVM.
> The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace 
> the call based on the location of the source buffer. If there are no 
> objections I'll prepare a patch that replaces the string manipulation 
> functions an appropriate  string object.
>
>
>
> The proposal comments have largely centered on the string functions. 
> Do people feel the same way about memcpy_s? What about those of you 
> building LLVM on Windows with Visual Studio?
>
>
>
> I know both functions can be used the wrong way but at least the "secure"
> version makes one think about the value passed for destination size. 
> Very likely most of the 1691 uses of memcpy in Clang/LLVM are correct 
> but with such a high number of uses there are likely a few that are 
> not. I'm willing to plow through all the calls to check the parameters 
> while making the change to the memcpy_secure version from the proposal.
>
>
>
> Thanks,
>
> Javier
>
>
>
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] 
> On Behalf Of Richard Smith
> Sent: Thursday, September 20, 2012 3:09 PM
> To: Chris Lattner
>
>
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Handling of unsafe functions
>
>
>
> On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at apple.com> wrote:
>
>
> On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>
>> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E 
>> <javier.e.martinez at intel.com> wrote:
>>> We have identified functions in LLVM sources using a static code 
>>> analyzer which are marked as a "security vulnerability"[1][2]. There 
>>> has been work already done to address some of them for Linux (e.g. 
>>> snprintf). We are attempting to solve this issue in a comprehensive 
>>> fashion across all platforms. Most of the functions identified are for manipulating strings.
>>> Memcpy is the most commonly used of all these unsecure methods. The 
>>> following table lists all these functions are their recommended 
>>> secure alternatives.
>>
>> I am strongly opposed to using *_s functions.  The issue is that they 
>> are no more "secure" than original functions.  One can still pass the 
>> destination buffer length incorrectly, especially if it is not known 
>> at compile time and should be computed.
>>
>> I agree with Sean that we should move the code using C strings to 
>> LLVM safe data types.
>
> I agree.
>
>
>>
>> And one more thing: it is interesting that the "unsafe"
>> APFloat::convertToHexString (from your patch) is not used anywhere.
>
> Zap it!  Oh wait, is it used by Clang or something else?
>
>
>
> Clang doesn't use it, but LLDB does, with an arbitrary-seeming 
> 256-character buffer. Perhaps we should change it to take an 
> llvm::SmallVectorImpl<char>, rather than relying on a character buffer 
> "which must be of sufficient size".
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>