[LLVMdev] Handling of unsafe functions

Mon Sep 24 15:37:29 PDT 2012

> I’ll prepare a patch that replaces the string manipulation functions an
> appropriate  string object.

Please break the patch up into focused chunks, one per logical change.
We try to keep all LLVM development as incremental as possible [1]. I
recommend fixing a single logical occurrence (such as fixing
APFloat::convertToHexString()) and then mailing the patch to
llvm-commits. It's likely that the feedback you get from the review of
the first patch will help inform future patches and simplify later
patch reviews.

[1] http://llvm.org/docs/DeveloperPolicy.html#incremental-development

> I know both functions can be used the wrong way but at least the “secure”
> version makes one think about the value passed for destination size. Very
> likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with
> such a high number of uses there are likely a few that are not. I’m willing
> to plow through all the calls to check the parameters while making the
> change to the memcpy_secure version from the proposal.

My inclination is that it will be redundant. For example, consider
this usage, grabbed randomly from the source code:

      char *Buf = static_cast<char *>(Allocate(Directory.size()));
      memcpy(Buf, Directory.data(), Directory.size());

it's not that clear to me that just adding an extra argument of
Directory.size() buys anything. If anything, it seems like it
introduces _another_ place to make an error.

I went through a couple other random uses of memcpy() in the LLVM tree
and none of them seemed like they would benefit.

> [...] I’m willing
> to plow through all the calls to check the parameters while making the
> change to the memcpy_secure version from the proposal.

By all means, please take a look at them. However, it might be overall
more beneficial to go through and report "sketchy" uses, where the
copy is not obviously correct, then discuss on the list how to ensure
the code is safe. Once all of the uses are audited, you can then just
write a little script using something like git's "pickaxe"
functionality (see `-S` option to git-diff(1), and other commands that
take "diff" options) to send you an email every time a new use of
memcpy gets introduced into the codebase. For example, you could have
a daily cron job that runs `git log -p -S'memcpy('
origin/master@{yesterday}..origin/master` and if there is any output
then it mails that output to you.

-- Sean Silva

On Thu, Sep 20, 2012 at 11:52 PM, Martinez, Javier E
<javier.e.martinez at intel.com> wrote:
> From the responses it’s pretty clear that the preference is to avoid using C
> string functions altogether. I’ve attached at list of calls in Clang/LLVM.
> The EASY/MEDIUM/DIFFICULT tag is an estimate of the effort to replace the
> call based on the location of the source buffer. If there are no objections
> I’ll prepare a patch that replaces the string manipulation functions an
> appropriate  string object.
>
>
>
> The proposal comments have largely centered on the string functions. Do
> people feel the same way about memcpy_s? What about those of you building
> LLVM on Windows with Visual Studio?
>
>
>
> I know both functions can be used the wrong way but at least the “secure”
> version makes one think about the value passed for destination size. Very
> likely most of the 1691 uses of memcpy in Clang/LLVM are correct but with
> such a high number of uses there are likely a few that are not. I’m willing
> to plow through all the calls to check the parameters while making the
> change to the memcpy_secure version from the proposal.
>
>
>
> Thanks,
>
> Javier
>
>
>
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> Behalf Of Richard Smith
> Sent: Thursday, September 20, 2012 3:09 PM
> To: Chris Lattner
>
>
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Handling of unsafe functions
>
>
>
> On Thu, Sep 20, 2012 at 10:13 AM, Chris Lattner <clattner at apple.com> wrote:
>
>
> On Sep 20, 2012, at 3:01 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>
>> On Wed, Sep 19, 2012 at 3:00 AM, Martinez, Javier E
>> <javier.e.martinez at intel.com> wrote:
>>> We have identified functions in LLVM sources using a static code analyzer
>>> which are marked as a “security vulnerability”[1][2]. There has been work
>>> already done to address some of them for Linux (e.g. snprintf). We are
>>> attempting to solve this issue in a comprehensive fashion across all
>>> platforms. Most of the functions identified are for manipulating strings.
>>> Memcpy is the most commonly used of all these unsecure methods. The
>>> following table lists all these functions are their recommended secure
>>> alternatives.
>>
>> I am strongly opposed to using *_s functions.  The issue is that they
>> are no more "secure" than original functions.  One can still pass the
>> destination buffer length incorrectly, especially if it is not known
>> at compile time and should be computed.
>>
>> I agree with Sean that we should move the code using C strings to LLVM
>> safe data types.
>
> I agree.
>
>
>>
>> And one more thing: it is interesting that the "unsafe"
>> APFloat::convertToHexString (from your patch) is not used anywhere.
>
> Zap it!  Oh wait, is it used by Clang or something else?
>
>
>
> Clang doesn't use it, but LLDB does, with an arbitrary-seeming 256-character
> buffer. Perhaps we should change it to take an llvm::SmallVectorImpl<char>,
> rather than relying on a character buffer "which must be of sufficient
> size".
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>