[cfe-dev] [RFC] Clang SourceLocation overflow

Matt Asplund via cfe-dev cfe-dev at lists.llvm.org
Thu Oct 10 06:08:50 PDT 2019


Sure, I can share my changes. In general my changes are very localized to
the code paths I was hitting oveflow issues and tried to keep as much as
possible using 32bit encodings. Is there an ETA for when you will start
investigating, I am out of town for the next week but if needed can get
access to my desktop.

-Matt

On Thu, Oct 10, 2019, 2:13 AM Christof Douma <Christof.Douma at arm.com> wrote:

> Hi Richard.
>
>
>
> Thanks for the clarification, I certainly had not realized that location
> tracking was needed for correctness of clang. Decoupling this sounds like a
> painful processes, and the only thing we get is errors without any location
> information. Sounds like not a great trade-off. We’ll go experiment a bit
> with the size impact of using 64-bits and will attempt to take the route of
> a cmake configuration option.
>
>
>
> If there are people that have already done some experiments on the size
> impact of 64-bits location pointers, we’re welcome your insights. @Matt
> Asplund <mwasplund at gmail.com> I believe you said that you’ve done some
> prototyping, is there something you can share?
>
>
>
> Thanks,
>
> Christof
>
>
>
> *From:* Richard Smith <richard at metafoo.co.uk>
> *Sent:* 08 October 2019 19:53
> *To:* Christof Douma <Christof.Douma at arm.com>
> *Cc:* Mikhail Maltsev <Mikhail.Maltsev at arm.com>; Clang Dev <
> cfe-dev at lists.llvm.org>; nd <nd at arm.com>
> *Subject:* Re: [cfe-dev] [RFC] Clang SourceLocation overflow
>
>
>
> On Tue, 8 Oct 2019, 10:42 Christof Douma via cfe-dev, <
> cfe-dev at lists.llvm.org> wrote:
>
> Hi Richard, Paul and other.
>
>
>
> Thanks for the input so far. I wanted to point out that it’s not our
> code-base. Rather, we’re seeing more use of the LLVM technology in the
> automotive market and as usual we’re faced with existing code bases that
> are tried and tested with other toolchains (gcc or others) and when LLVM
> comes along things don’t always work directly.
>
>
>
> We’ve suggested better ways of structuring their code and your suggestions
> are certainly good input. However, legacy code is especially sticky in any
> market that has to handle ‘safety’ concerns, like automotive, aerospace and
> medical markets. Code changes are pretty expensive in those fields. So
> while I hope that over time we see more sensible coding structures, I don’t
> expect that to happen any time soon. In the mean time, we’re searching for
> a solution for this coding pattern that doesn’t play well with clang.
>
>
>
> Hope that gave some more background of where this question comes from.
>
>
>
> Do all options that were suggested by Mikhail really require fundamental
> restructuring of major parts of clang? This surprised me, I had expected
> that the option 2 to be possible without a complete overhaul. (2 is “Track
> until an overflow occurs after that make the lexer output the <invalid
> location> special value for all subsequent tokens.”)
>
> Clang uses source locations as part of the semantic representation of the
> AST in some cases (simple example: some forms of initialization might use
> parens or not, with different semantics, and we distinguish between them
> based on whether we have paren locations; there are also some checks that
> look at which of two source locations came first when determining which
> warnings or errors to produce, and so on). Maybe we could go through and
> fix all of those, but that's still a very large task and it'd be hard to
> check we got them all.
>
> Not nice user experience but maybe doable? I was hoping there was
> something slightly better that still works without a major restructuring
> (maybe something that at least gives a rough location or something that
> only gives the location of the error and not the include stack under an
> option or using some kind of heuristic to detect that things go haywire).
>
>
>
> As an alternative, I was curious if it would be possible and acceptable to
> make the switch between 32-bit and 64-bit location tracking a
> build-time/cmake decision? I’ve not done any estimation on the memory size
> growth, so maybe this is a dead end.
>
> If someone is prepared to do the work to add and maintain this build mode,
> I think this might be a feasible option.
>
> Thanks,
>
> Christof
>
>
>
> *From:* cfe-dev <cfe-dev-bounces at lists.llvm.org> *On Behalf Of *Richard
> Smith via cfe-dev
> *Sent:* 07 October 2019 20:36
> *To:* Mikhail Maltsev <Mikhail.Maltsev at arm.com>
> *Cc:* nd <nd at arm.com>; cfe-dev at lists.llvm.org
> *Subject:* Re: [cfe-dev] [RFC] Clang SourceLocation overflow
>
>
>
> On Wed, 2 Oct 2019 at 09:26, Mikhail Maltsev via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Hi all,
>
> We are experiencing a problem with Clang SourceLocation overflow.
> Currently source locations are 32-bit values, one bit is a flag, which
> gives
> a source location space of 2^31 characters.
>
> When the Clang lexer processes an #include directive it reserves the total
> size
> of the file being included in the source location space. An overflow can
> occur
> if a large file (which does not have include guards by design) is included
> many
> times into a single TU.
>
> The pattern of including a file multiple times is for example required by
> the AUTOSAR standard [1], which is widely used in the automotive industry.
> Specifically the pattern is described in the Specification of Memory
> Mapping [2]:
>
> Section 8.2.1, MEMMAP003:
> "The start and stop symbols for section control are configured with section
> identifiers defined in MemMap.h [...] For instance:
>
> #define EEP_START_SEC_VAR_16BIT
> #include "MemMap.h"
> static uint16 EepTimer;
> static uint16 EepRemainingBytes;
> #define EEP_STOP_SEC_VAR_16BIT
> #include "MemMap.h""
>
> Section 8.2.2, MEMMAP005:
> "The file MemMap.h shall provide a mechanism to select different code,
> variable
> or constant sections by checking the definition of the module specific
> memory
> allocation key words for starting a section [...]"
>
> In practice MemMap.h can reach several MBs and can be included several
> thousand
> times causing an overflow in the source location space.
>
> The problem does not occur with GCC because it tracks line numbers rather
> than
> file offsets. Column numbers are tracked separately and are optional.
> I.e., in
> GCC a source location can be either a (line+column) tuple packed into 32
> bits or
> (when the line number exceeds a certain threshold) a 32-bit line number.
>
> We are looking for an acceptable way of resolving the problem and propose
> the
> following approaches for discussion:
> 1. Use 64 bits for source location tracking.
> 2. Track until an overflow occurs after that make the lexer output
>    the <invalid location> special value for all subsequent tokens.
> 3. Implement an approach similar to the one used by GCC and start tracking
> line
>    numbers instead of file offsets after a certain threshold. Resort to (2)
>    when even line numbers overflow.
> 4. (?) Detect the multiple inclusion pattern and track it differently (for
> now
>    we don't have specific ideas on how to implement this)
>
> Is any of these approaches viable? What caveats should we expect? (we
> already
> know about static_asserts guarding the sizes of certain class fields which
> start
> failing in the first approach).
>
> Other suggestions are welcome.
>
>
>
> I don't think any of the above approaches are reasonable; they would all
> require fundamental restructuring of major parts of Clang, an efficiency or
> memory size hit for all other users of Clang, or some combination of those.
>
>
>
> Your code pattern seems unreasonable; including a multi-megabyte file
> thousands of times is not a good idea. Can you split out parts of MemMap.h
> into a separate header that is only included once, and keep only the parts
> that actually change on repeated inclusion in MemMap.h itself?
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20191010/4993fa51/attachment.html>


More information about the cfe-dev mailing list