[llvm-dev] DWARF debug line error handling changes

Wed Jan 29 04:34:35 PST 2020

On Tue, 28 Jan 2020 at 19:37, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Tue, Jan 28, 2020 at 1:59 AM James Henderson <
> jh7370.2008 at my.bristol.ac.uk> wrote:
>
>>
>> On Mon, 27 Jan 2020 at 23:22, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>> Thanks for all the work & context, James!
>>>
>>
>> Thanks for the reviews!
>>
>>
>>>
>>> On Mon, Jan 27, 2020 at 5:51 AM James Henderson <
>>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Since December, I've made several changes to the DWARF debug line
>>>> parser to improve its error handling. Some examples include removing
>>>> redundant error checks[1] and making address size mismatches non-fatal to
>>>> parsing[2], with several more about to land or being planned.
>>>>
>>>> David Blaikie recommended I post something to give a bit more context
>>>> to these changes. I am a member of Sony's Linker and Binutils team, and as
>>>> part of the linking process, we make updates to the debug line section in
>>>> order to handle GC-section processing and similar modifications. In order
>>>> for these operations to be safe, we need the input line program to be in a
>>>> good state. As a result, in addition to the existing LLVM error checks, we
>>>> have added several additional downstream checks. These were added
>>>> downstream at the time for speed, but we've always had the intention of
>>>> bringing these to the wider community. We now have the time to do this.
>>>>
>>>> There are broadly-speaking two kinds of changes we are making:
>>>>
>>>> 1) Adding additional new checks to make sure the table is valid. Where
>>>> possible, these are made as late as reasonable. That way, we don't emit
>>>> errors or warnings until it is necessary, and potentially therefore not at
>>>> all in some cases.
>>>>
>>>> 2) Making existing error checks less fatal, i.e. changing the code to
>>>> allow the parser to continue even though something doesn't look right in
>>>> the code. This allows consumers to gather more information, whether for
>>>> displaying or otherwise. In several cases, the change is to assume that the
>>>> length recorded in a field is correct, even if it doesn't match what either
>>>> the standard says should be the length.
>>>>
>>>
>>> If the standard conflicts with the length when we read the length
>>> (without reading further) - yeah, I think we should fail/warn/something
>>> there. (as I think we do - if you have a length that's not long enough for
>>> the header we're expecting to read, for instance, etc) & if that means this
>>> contribution can't be parsed (eg: header's truncated) we treat it that way
>>> (this contribution is broken) & can still go on to the next contribution
>>> and read that.
>>>
>>
>> I think the code now uses the error callbacks for all potential length
>> conflicts (or will do after I land some more warnings - not checked the
>> current state). By default, this results in warnings, but clients can do
>> extra stuff like ignoring the table and then continuing from the next. With
>> my latest changes, the parser continues reading the header until the actual
>> end for the file format, even if that goes beyond the claimed header end,
>> reporting a warning if it does so,
>>
>
> Oh... which change caused that to happen? Because I don't think that
> should happen. I'd really much prefer that once the length is read and it's
> checked to be in-bounds of the underlying section, that only those bytes
> within that specified length are read. I was/am actually looking at tidying
> up the repeated complications of parsing the DWARF32/DWARF64 headers - and
> one idea I had was to have the DWARFDataExtractor return a new
> DWARFDataExtractor that was specifically bounded to the parsed length for
> this reason.
>

Actually, I had a slightly faulty memory. To a large extent, the behaviour
has been for a while to parse until the end of the header as per the
standard format. There is then a check once parsing is finished to make
sure the reached offset matches the expected end. Recently, I removed[1] a
few extra checks in the V5 file/directory table parsing that checked that
we hadn't run off the end, as the later check makes them unnecessary.
However, these were the only other checks to check against the prologue
length.

I think your suggestion to use a data extractor that only knows about the
bytes we expect is a good idea. Combined with the changes that made it
return 0 when running off the end, this will fit nicely into the rest of
the code without worrying about reading bytes in other tables. The reading
would continue as far as needed, the error check would still happen, but no
spurious bytes would be read.

[1]
https://github.com/llvm/llvm-project/commit/418cd8216b41f4c08e0e1b22feda381d9b2345da
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200129/161abda2/attachment.html>