[PATCH] D13989: This removes the eating of the error in Archive::Child::getSize() when the charactersin the size field in the archive header for the member is not a number.

Mon Nov 2 12:41:01 PST 2015

Hi Rafael,

Again thanks for all your help on this and willingness to take a second look and the option of idea of the iterator having an ErrorOr<Child>. 

I have updated your diffs with changes to the issues you noted and they are here:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: updated_t.diff
Type: application/octet-stream
Size: 31514 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151102/51991faa/attachment.obj>
-------------- next part --------------

I think the way forward at this point is to abandon the existing patch and start again with new patch based on the diffs above.

And do have a few comments in line below on some of the specifics.

Thanks again,
Kev

> On Nov 2, 2015, at 5:38 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> 
> I have changed code that was using archive_iterator in cases where
> end() (or error) was not possible to use Child.
> 
> This reduced the impact of this change. An rebased patch is attached.
> 
> In addition to the previous issue, I noticed that we use 'auto' in
> case the style guide recommends against:
> 
>  if (auto Err = ChildOrErr.getError())
> 
> should be just std::error_code.
> 

Done.

> 
> -    outs() << ' ' << format("%6llu", C.getSize());
> +    ErrorOr<uint64_t> Size = C.getSize();
> +    if (Size.getError())
> +      outs() << ' ' << "bad size";
> 
> It would be much better for llvm-ar (and any tool IMHO) to fail when
> finding a corrupted file. In the case of llvm-ar, just call fail.

I would say it depends on the tool and what the tool is doing.

In the case above the llvm-ar tool is printing out the archive header.  Since I often run tools on malformed files during development I like tools to show be what is there rather than just calling “fail” and bailing out in my opinion.  In this case what I really want is to see the characters of the archive header’s field for Size as during development I likely messed things up and need to see what is there so I can fix things.

But to do this I would need to use the getHeader() method then print the Size[10] characters in this case.  And this really should be done for all the fields.  Which would mean making all the methods like getUID(), getGID(), etc, all return ErrorOr<> like getSize().  But that seams like changes that should be left for another patch.

> 
> 
> Cheers,
> Rafael
> 
> 
> On 31 October 2015 at 19:08, Rafael Espíndola
> <rafael.espindola at gmail.com> wrote:
>> The main problem with the patch was making Archive.cpp more
>> complicated. I have fixed that by not storing child_iterators in the
>> Archive.
>> 
>> Now I think the main remaining issues are cases like
>> 
>> -    if (ChildIt != A->child_end()) {
>> +    if (*ChildIt && ChildIt != A->child_end()) {
>> 
>> This silently ignores the error. To keep this patch small, I would
>> suggest that for now this be changed to
>> 
>> If (std::error_code EC = ChildIt->getError())
>>  report_fatal_error(EC.message());
>> if (ChildIt != A->child_end()) {
>> …

OK, I have made these changes.

>> 
>> A followup patch can propagate the error.

I agree with you on that but I think the two places that needed this change need to deal better with errors.  As they are in the Jit and should not call report_fatal_error() with malformed input.  But I think it best for Lang or others to clean up the error handling and work to eliminate the use of report_fatal_error() for malformed input.

>> 
>> I have attached a patch rebased on trunk that stores an ErrorOr<Child>
>> in the iterator.

Got it and my diffs above were based on that.

>> 
>> Cheers,
>> Rafael
>> 
>> 
>> 
>> 
>> On 31 October 2015 at 10:14, Rafael Espíndola
>> <rafael.espindola at gmail.com> wrote:
>>> OK, I am starting to lean a bit more on the idea of the iterator
>>> having an ErrorOr<Child>. I will take a second look at that option.
>>> 
>>> Sorry for the detour.
>>> 
>>> Cheers,
>>> Rafael
>>> 
>>> 
>>> On 29 October 2015 at 22:53, Lang Hames via llvm-commits
>>> <llvm-commits at lists.llvm.org> wrote:
>>>> That's a good point: next() has to return an ErrorOr<iterator>, and you have
>>>> to distinguish between failure and end(), so if you want to check each call
>>>> to next() you need to do something like:
>>>> 
>>>> auto ErrOrItr = File->error_or_child_begin();
>>>> if (!ErrOrItr)
>>>>  return ErrOrItr.getError();
>>>> auto Itr = *ErrOrItr;
>>>> while (Itr != file->child_end()) {
>>>>  auto &Thing = *Itr;
>>>> 
>>>>  ErrOrItr = increment(Itr);
>>>>  if (!ErrOrItr)
>>>>    return ErrOrItr.getError();
>>>> }
>>>> 
>>>> If you write an iterator wrapper that includes a failure state you can get
>>>> that down to:
>>>> 
>>>> auto ErrOrItr = File->child_begin();
>>>> for (; ErrOrItr && ErrOrItr != File->child_end(); ErrOrItr.next()) {
>>>>  Child = *ErrOrItr;
>>>>  // ...
>>>> }
>>>> if (!ErrOrItr)
>>>>  return ErrOrItr.getError();
>>>> 
>>>> But now you have to be careful to check the error outside the loop.
>>>> 
>>>> Looking at Kevin's updated diff, the change to the iterators made those
>>>> loops a lot uglier. The introduction of the Increment function fixes some of
>>>> that, but at the cost of risking the problem that you were trying to avoid:
>>>> It's suddenly very easy to "increment" the iterator without actually
>>>> checking the error.
>>>> 
>>>> - Lang.
>>>> 
>>>> 
>>>> On Thu, Oct 29, 2015 at 8:30 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>>> 
>>>>> Hi Lang and others,
>>>>> 
>>>>> This iterator basically requires another indirection to get to the
>>>>> underlying Child.  So you get a compile error when trying to use the
>>>>> iterator as it is an <ErrorOr>.  So the example really is to change
>>>>> something like this (as the diff in ELF/InputFiles.cpp):
>>>>> 
>>>>> for (const Archive::Child &Child : File->children()) {
>>>>>  …
>>>>> 
>>>>> to this:
>>>>> 
>>>>> for (auto &ChildOrErr : File->children()) {
>>>>>    error(ChildOrErr, "Could not get the child of the archive " +
>>>>>                          File->getFileName());
>>>>>    const Archive::Child Child(*ChildOrErr);
>>>>>  ...
>>>>> 
>>>>> Which seems a bit cleaner than testing the result of child_begin() before
>>>>> the for loop and checking getNext() even if it can be wrapped in some case
>>>>> in and Increment() routine.
>>>>> 
>>>>> My thoughts,
>>>>> Kev
>>>>> 
>>>>> On Oct 29, 2015, at 7:50 PM, Lang Hames via llvm-commits
>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>> 
>>>>> Yep. The standard C++ iterator concept doesn't apply here, since it
>>>>> doesn't deal with failure, and checking on deref rather than increment means
>>>>> you can accidentally do multiple increments without checking. My intent was
>>>>> to have increment crash if you run it on an error'd out iterator, but that's
>>>>> not as good as a compile-time check. I think it's a matter of personal taste
>>>>> whether the cost (IMHO fairly minimal, since in practice you almost always
>>>>> deref after increment) is worth paying for the gain (also arguably minimal)
>>>>> of a neater iteration idiom.
>>>>> 
>>>>> I don't have particularly strong feelings either way.
>>>>> 
>>>>> - Lang.
>>>>> 
>>>>> On Thu, Oct 29, 2015 at 5:51 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Oct 29, 2015 at 5:27 PM, Lang Hames via llvm-commits
>>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>>> 
>>>>>>> Hi Rafael,
>>>>>>> 
>>>>>>> How could you ignore the error in Kevin's solution?
>>>>>>> 
>>>>>>> I think your solution and his are essentially the same, except that you
>>>>>>> check the error on increment and he checks it on dereference. In practice
>>>>>>> you always have to do both operations (you can't do multiple increments
>>>>>>> without checking the error anyway, and you can just do that by deref'ing),
>>>>>>> and checking via deref allows you to used range-based for, which makes the
>>>>>>> code neater.
>>>>>> 
>>>>>> 
>>>>>> I'm not taking a position either way - but that does present a difficult
>>>>>> problem for the iterator contract, to say that you can't increment it twice
>>>>>> without dereferencing... that's not really in the iterator playbook/contract
>>>>>> (could implement a little extra support for that by allowing an error-state
>>>>>> iterator to be incremented to become the end iterator). I know you've
>>>>>> mentioned it before (the "can't increment twice without checking") but it
>>>>>> became a bit clearer to me in this recent email exchange to the point that I
>>>>>> would be a little less comfortable with it.
>>>>>> 
>>>>>> I can't find the example you guys had:
>>>>>> 
>>>>>> for (auto &Thing : things) {
>>>>>>  if (!Thing)
>>>>>>    return Thing.getError();
>>>>>>  ...
>>>>>> }
>>>>>> 
>>>>>> & yeah, nothing that comes to mind is as tidy/simple as that, my nearest
>>>>>> might be:
>>>>>> 
>>>>>> auto Iter = things.begin();
>>>>>> while (auto &Thing = Iter.Next()) {
>>>>>>  ....
>>>>>> }
>>>>>> if (!Iter)
>>>>>>  return Iter.getError();
>>>>>> 
>>>>>> (where "Iter" isn't a traditional iterator in any sense - it has
>>>>>> ErrorOr<T> Next() and getError (& possibly boolean testability))
>>>>>> 
>>>>>> The obvious sort of problem I imagine arising from the iterator/range API
>>>>>> is something like:
>>>>>> 
>>>>>>  int count = std::distance(things.begin(), things.end());
>>>>>> 
>>>>>> Easy to write, and would produce an infinite loop (or crash?) if there
>>>>>> was an error during iteration.
>>>>>> 
>>>>>> - Dave
>>>>>> 
>>>>>> - Dave
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Lang.
>>>>>>> 
>>>>>>> On Thu, Oct 29, 2015 at 5:11 PM, Rafael Espíndola
>>>>>>> <rafael.espindola at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> On 28 October 2015 at 15:21, Kevin Enderby <enderby at apple.com> wrote:
>>>>>>>>> enderby added a comment.
>>>>>>>>> 
>>>>>>>>> I agree with Lang and like the original patch where were were
>>>>>>>>> handling failure was done inside the iterator.  It does allow use of
>>>>>>>>> ranged-based for loops and appears to me much cleaner in the code especially
>>>>>>>>> in the lld changes needed in http://reviews.llvm.org/D13990 .
>>>>>>>> 
>>>>>>>> I think that the most important consideration is making the error hard
>>>>>>>> to ignore. That means using ErrorOr.
>>>>>>>> 
>>>>>>>> The sad reality is that the archive format can cause us to find an
>>>>>>>> error when going from one member to the next. That, with the above
>>>>>>>> requirement means that it doesn't fit on range loop.
>>>>>>>> 
>>>>>>>>> Rafael, I did "finish it up" based on your t.diff patch and that is
>>>>>>>>> below and includes the re-formatting with clang-format-diff, removal of the
>>>>>>>>> malformed-archives directory putting the .a files in Inputs and removal of
>>>>>>>>> the "nice but independent" changes.
>>>>>>>>> 
>>>>>>>>> F1026846: llvm_updated_t.diff <http://reviews.llvm.org/F1026846>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> 
>>>>>>>> You don't need to change ErrorOr.h.
>>>>>>>> 
>>>>>>>> The change to detect the end of the archive seems more complicated
>>>>>>>> than necessary (see the attached patch for a suggestion).
>>>>>>>> 
>>>>>>>> Cases like the Increment in BinaryHolder expose a naked EC, which is
>>>>>>>> what using ErrorOr is trying to avoid. Just inlining it solves the
>>>>>>>> problem (see patch).
>>>>>>>> 
>>>>>>>> Please convert the remaining user of Increment to return void when
>>>>>>>> possible.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Rafael
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>> 
> <t.diff>