[PATCH] Teach Clang how to use response files when calling other tools

Tue Aug 26 17:09:28 PDT 2014

On Tue, Aug 26, 2014 at 3:32 PM, Rafael Auler <rafaelauler at gmail.com> wrote:

> Hi Sean,
>
> I think you misunderstood me, I don't want to avoid refactoring the code,
> but I was only expressing high level concerns that I had when writing this
> code that are indeed not backed by real profiling, and would like to know
> if you share the same concerns or not. Since you do not think that these
> are problems, I will work on the refactoring and keep you updated.
>

Cool. I think it would make sense to do the change to
llvm::sys::argumentsFitWithinSystemLimits
anyways, since currently that code duplicates a lot of the code in this
patch (all performance concerns aside).

-- Sean Silva

>
> Thanks for putting in time to see this,
> Rafael
>
>
> On Tue, Aug 26, 2014 at 7:11 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>>
>> On Mon, Aug 25, 2014 at 7:56 PM, Rafael Auler <rafaelauler at gmail.com>
>> wrote:
>>
>>> Hi Sean,
>>>
>>> Thanks a lot for your careful review. I have answered your questions
>>> below, and I have few concerns. Please let me know if I am missing
>>> something:
>>>
>>> This code is on the critical path for commands with very large lengths
>>> and this is a non-negligible time. For UNIX, for example, this would be
>>> triggered for commands spanning more than 2MB of arguments and Clang spends
>>> a good amount of time to parse even trivial arguments, because it is a huge
>>> number of arguments.
>>>
>>> Another important point is that, currently, the Command class stores
>>> arguments as a list of char *.
>>>
>>> This is the reason for this design with lots of naked char *. It avoids
>>> creating StringRefs, that would measure the length of char* in their
>>> constructor and cause more overhead. It also avoids using high-level
>>> classes such as streams and bump allocators because, since this is likely
>>> to be called only when the number of arguments is very large, would cause
>>> them to repeatedly perform heap allocations until they finally settle to
>>> support the final size of the string. Either that or we would need to work
>>> with very large slabs, wasting memory. In this design, the length is
>>> calculated once and a single allocation is made, similar to what is done in
>>> Execute() in Support/Windows/Program.inc.
>>>
>>> Therefore, by calculating the size of the final command line beforehand,
>>> even though it looks like we are losing by iterating over all args twice
>>> (one for length calculation and other for parsing), we win by eliminating
>>> inefficiencies in the allocation mechanism in runtime or in memory
>>> consumption.
>>>
>>
>> Sorry, but this sounds like a self-contradictory mishmash of performance
>> rules of thumb cherry-picked to sound like an excuse for not refactoring
>> the code.
>>
>> It's obvious you haven't profiled this because if you had profiled (and
>> these were actually performance issues), the first thing you would do is
>> modify llvm::sys::argumentsFitWithinSystemLimits to return the required
>> buffer size, since it is already doing this exact length calculation you
>> are doing (or could be easily modified to do it). You would also have
>> considered using a raw_ostream since that bounds the memory consumption by
>> the size of the buffer that it periodically flushes to the OS.
>>
>> If you seriously want to justify not making that change solely on the
>> basis of performance, please run some concrete benchmarks and profiles and
>> share the data.
>> If you just don't have time or interest in doing the refactoring, that's
>> also fine, but please say so.
>>
>> -- Sean Silva
>>
>>
>>
>>>
>>> Regards,
>>> Rafael
>>>
>>> ================
>>> Comment at: lib/Driver/Job.cpp:122
>>> @@ +121,3 @@
>>> +/// will escape the regular backslashes (used in Windows paths) and
>>> quotes.
>>> +static std::unique_ptr<char[]> FlattenArgs(const char **args) {
>>> +  // First, determine the length of the command line.
>>> ----------------
>>> silvas wrote:
>>> > This would be a lot nicer with ArrayRef<const char *>; that way, you
>>> won't lose the length.
>>> That's a good point. However, I am afraid that I would still need to
>>> recalculate the length by traversing each argument, calculating its length
>>> and assessing whether it needs quoting or not.
>>
>>
>>> ================
>>> Comment at: lib/Driver/Job.cpp:169
>>> @@ +168,3 @@
>>> +  llvm::SmallVector<const char *, 128> NewArgv;
>>> +  unsigned len = 0;
>>> +  for (auto Input : Inputs) {
>>> ----------------
>>> silvas wrote:
>>> > Check your naming here and in other places.
>>> Thanks for pointing out, I fixed the naming. Will submit the patch soon.
>>>
>>> ================
>>> Comment at: lib/Driver/Tools.h:546-550
>>> @@ +545,7 @@
>>> +    llvm::sys::EncodingMethod getResponseFileEncoding() const override {
>>> +#if defined(LLVM_ON_WIN32)
>>> +      return llvm::sys::EM_CurrentCodePage;
>>> +#else
>>> +      return llvm::sys::EM_UTF8;
>>> +#endif
>>> +    }
>>> ----------------
>>> silvas wrote:
>>> > Ew... let's not have a bunch of random #ifdef's all over the code
>>> outside libSupport.
>>> I agree that this is inadequate. My previous patch version delegated
>>> this reasoning to libSupport. However, it was more difficult to understand
>>> and I changed it for clarity. Now, it is obvious that the response file
>>> encoding for this tool depends on whether it is running on Windows or not.
>>> In my previous libSupport version, I had enum members that meant "this is
>>> UTF on UNIX but UTF16 on Windows", and so on. I think this is not a good
>>> design because if you need to support a new tool, you may potentially need
>>> to add a new enum member in libSupport to encode your tuple
>>> <MyUNIXEncoding, MyWindowsEncoding>.
>>>
>>> Since the encoding depends on the tool and on the operating system, I
>>> would appreciate if you have any suggestion on how to code this. Meanwhile,
>>> I will think in something.
>>>
>>> http://reviews.llvm.org/D4897
>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140826/4196c7c1/attachment.html>