[PATCH] Allow for the use of StringRef command line options instead of cl::opt<std::string>

Reid Kleckner rnk at google.com
Tue Oct 1 12:45:19 PDT 2013


You still haven't fixed ParseEnvironmentOptions.  It is tested in
llvm/unittests/CommandLineTest.cpp, but it uses cl::opt<std::string> still.
 If you switch it to StringRef, the UAF will appear.

This API is actually used:
https://github.com/inkdev/Embedded-Mono-Sample/blob/master/Source/mono/mini/mini-llvm-cpp.cpp

---

The clang patch is basically moving the gross global string storage from
LLVM to clang.  That seems fine, since LLVM needs to embed into far more
applications than clang, and must support multi-threading.

Someone might object to these global strings and exit time destructors in
clang, but these globals were already there because we were already calling
ParseCommandLineOptions, thereby setting global std::strings in LLVM.  I
think the clang patch is fine.

---

+  .. warning::
+
+    Note: Recent changes to the CommandLine parser have made it such that
arguments
+    are stored not as std::strings bug as StringRefs. These changes were
made to

"bug" should be "but".  I don't think the std::string vs. StringRef
distinction matters to users.  I would front load the fact that strings
passed to ParseCommandLineOptions must live for the entire process
lifetime, like strings in argv from main.  After the initial warning, I
would state that the library does not make copies of strings to avoid
dynamic initializers and finalizers.

+    address certain issues with overhead in startup costs of LLVM
components that
+    use CommandLine option static initializers. This means that char
\*\*argv passed
+    to cl::ParseCommandLineOptions() should take string data that persists
throughout
+    a given programs execution (main's char \*\*argv meets this criteria
of course).
+    That is, anything not statically allocated must not be freed in some
local scope.
+


On Tue, Oct 1, 2013 at 2:46 PM, Puyan Lotfi <plotfi at apple.com> wrote:

> Ah great, thank you Michael. I’ve modified the function to follow the
> coding standards
> (clangWorkaroundParseCommandLineOptionsStringMapStaticLocal.diff).
>
> I believe the rest of my patches were already approved.
> Does anyone (Reid?) have any feedback to modifications I made to the
> CommandLine library documentation?
>
> I think this is all close to being ready for check in.
>
> Thanks
>
> -Puyan
>
>
>
>
>
>
>
>
> On Oct 1, 2013, at 11:31 AM, Michael Ilseman <milseman at apple.com> wrote:
>
> Coding Standards: http://llvm.org/docs/CodingStandards.html
>
> "The problem with anonymous namespaces is that they naturally want to
> encourage indentation of their body, and they reduce locality of reference:
> if you see a random function definition in a C++ file, it is easy to see if
> it is marked static, but seeing if it is in an anonymous namespace requires
> scanning a big chunk of the file."
>
> Thus the function should be static.
>
> "Static constructors and destructors (e.g. global variables whose types
> have a constructor or destructor) should not be added to the code base, and
> should be removed wherever possible. Besides well known problems<http://yosefk.com/c++fqa/ctors.html#fqa-10.12>
>  where the order of initialization is undefined between globals in
> different source files, the entire concept of static constructors is at
> odds with the common use case of LLVM as a library linked into a larger
> application."
>
> We don't want static constructors unless we must.
>
> "The use of #include <iostream> in library files is hereby *forbidden*,
> because many common implementations transparently inject a static
> constructor <http://llvm.org/docs/CodingStandards.html#static-constructor>
>  into every translation unit that includes it."
> "LLVM includes a lightweight, simple, and efficient stream implementation
> in llvm/Support/raw_ostream.h, which provides all of the common features
> of std::ostream. All new code should use raw_ostream instead of ostream."
>
> raw_ostream seems to be an example of avoiding the static constructor
> problem (and others). In raw_ostream.cpp:
>   /// errs() - This returns a reference to a raw_ostream for standard
> error.
>   /// Use it like: errs() << "foo" << "bar";
>   raw_ostream &llvm::errs() {
>     // Set standard error to be unbuffered by default.
>     static raw_fd_ostream S(STDERR_FILENO, false, true);
>     return S;
>   }
>
> Thus I believe using a static local is the preferred pattern to use over
> static constructor, when you must have that behavior. Only users pay the
> cost of initialization.
>
> Of course, if your data structure is not truly global/static (i.e. it
> should be tied to a context), then using either a static global or static
> local is wrong, and it needs to be put in a corresponding context. I don't
> know your intent here, but it does seems to be truly global.
>
>
> On Oct 1, 2013, at 11:23 AM, Puyan Lotfi <plotfi at apple.com> wrote:
>
> Oh man. I actually wanted to do that too, but I thought it would be
> frowned upon stylistically.
> You think that’d be ok?
>
> -Puyan
>
> On Oct 1, 2013, at 11:19 AM, Michael Ilseman <milseman at apple.com> wrote:
>
> Ah, my mistake. The LLVM Coding Standards uses my exact mistake as
> motivation for using of "static" instead of namespace on functions.
>
> For the variable, doesn't this make it a static constructor (which is also
> recommended against)? Would it make sense to have it be a static local
> variable, so that the constructor only need run when the data structure is
> actually first used?
>
> On Oct 1, 2013, at 11:14 AM, Puyan Lotfi <plotfi at apple.com> wrote:
>
> I did, but it’s already inside the unnamed namespace so I didn’t; thought
> it does the same thing.
> Do you think I should make PersistentClArgString and
> getPersistentClArgString() static for the sake of clarity?
>
> -Puyan
>
> On Oct 1, 2013, at 10:54 AM, Michael Ilseman <milseman at apple.com> wrote:
>
> Did you mean to make those static, or can they be referred to externally?
>
> On Sep 30, 2013, at 8:10 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>
> Reid,
>
> After going over the code (specifically in libclang), I just can’t
> guarantee that all entry points into CompilerInvocation::CreateFromArgs()
> are passing arguments that aren’t stack or dynamically allocated then freed
> (For example, whatever calling clang::createInvocationFromCommandLine could
> pass in args pointing to strings from its stack). I think a practical
> solution is to have a persisting unique string pool in this instance. I’ve
> decided to change my code to use StringMap instead, so that should ease
> your concern over the complexity. If there is an easy to reuse global
> string pool within clang, I’d certainly like to know (I’ve heard that there
> are such pools in several LLVM projects for exactly this purpose, but I am
> not familiar with them). The global in this case is local to the file; I
> think for the sake of practicality it might be ok to allow this? Either
> that, or assume that CompilerInvocation instances are live for the lifetime
> of whatever main() is calling libclang because I’m not sure if StringRef on
> CodeGenOptions is any better of an assumption.
>
> I am open to suggestions.
>
> Thanks
>
> Puyan
>
> <clangWorkaroundParseCommandLineOptionsStringMap.diff>
>
> On Sep 30, 2013, at 2:47 PM, Reid Kleckner <rnk at google.com> wrote:
>
> Can you try changing the std::strings in CodeGenOptions to StringRefs?
>  The string persistance patch is n^2 and uses globals.
>
>
> On Mon, Sep 30, 2013 at 2:24 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>
>> Hi
>>
>> As requested, I have ran the address sanitizer tests. I ran them many
>> times, and could not find any reproducible failures.
>> I did notice some intermittent failures that came and went depending on
>> what revision I had gotten from SVN.
>>
>> Due to concern for use-after-free I have included a patch to clang that
>> allows strings passed to ParseCommandLineOptions() from the
>> CreateTargetMachine() function to persist (this was the only location in
>> LLVM or clang where I found ParseCommandLineOptions being called with
>> automatic memory); what it does is strdups the args for -debug-pass,
>> -limit-float-precision as well as backend args and stores the result to a
>> global vector of StringRefs. This vector is only populated for unique
>> strings to avoid leaking repeatedly.
>>
>> The other three patches include changes to the CommandLine library to
>> enable StringRef (which I already sent out), changes to places using
>> cl::opt<std::string> to use StringRef (also already sent out), and also the
>> requested changes to the CommandLine documentation.
>>
>> I think this should cover all the basis. I’ll check in if this looks ok
>> to you all.
>>
>> Thanks
>>
>> -Puyan
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sep 23, 2013, at 10:43 AM, Reid Kleckner <rnk at google.com> wrote:
>>
>> I worry that this patch creates a significant risk for use-after-free
>> bugs.  Can you:
>>
>> 1. Explain how string use-after-free is supposed to be avoided?
>> 2. Update the docs if ParseCommandLine options is no longer going to save
>> copies of strings.
>> 3. Make sure the internal tests pass with an AddressSanitizer build,
>> because I see things like ParseEnvironmentOptions which look broken with
>> this change.
>>
>>
>> On Mon, Sep 23, 2013 at 10:00 AM, Chris Lattner <clattner at apple.com>wrote:
>>
>>>
>>> On Sep 19, 2013, at 3:01 PM, Puyan Lotfi <plotfi at apple.com> wrote:
>>>
>>> > Hi All
>>> >
>>> > I’ve improved this StringRef patch and I’d like some more feedback and
>>> hopefully a submission.
>>> > My changes to the original patch thus far have been:
>>> >
>>> > - I added changes to CreateTargerMachine() in clang to have the
>>> SmallVector passed in from the caller.
>>> > - I found more places where cl::opt<str::string> was used, and I
>>> changed them to use cl::opt<StringRef>.
>>> >
>>> > I’ve attached a patch that should apply cleanly with an llvm checkout
>>> that includes clang checked out into the tools directory.
>>>
>>> This looks great to me.  When you commit it, please split it into two
>>> pieces though:
>>>
>>> 1) The part that adds cl compatibility with StringRef.
>>> 2) The part that switches lots of cl::opt<> to use StringRef instead of
>>> std::string.
>>>
>>> Thanks Puyan,
>>>
>>> -Chris
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>
>>
>>
>>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131001/382ae2c2/attachment.html>


More information about the llvm-commits mailing list