[cfe-dev] [RFC] Clang SourceLocation overflow

Matt Asplund via cfe-dev cfe-dev at lists.llvm.org
Thu Oct 3 09:22:56 PDT 2019


I don't want to distract from your question, but I wanted to add that I
have been seeing source location overflow issues for many months when using
Clang's implementation of C++20 modules. I have a personal branch where I
have made a partial conversion to 64-bit source locations for testing
purposes.

-Matt

On Wed, Oct 2, 2019, 9:26 AM Mikhail Maltsev via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi all,
>
> We are experiencing a problem with Clang SourceLocation overflow.
> Currently source locations are 32-bit values; one bit is a flag, which
> gives a source location space of 2^31 characters.
>
> When the Clang lexer processes an #include directive, it reserves the total
> size of the file being included in the source location space. An overflow
> can occur if a large file (which does not have include guards by design) is
> included many times into a single TU.
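
To make the mechanism concrete, here is a rough sketch of a bounded location
space in which every inclusion reserves the full size of the included file.
This is a toy model only: ToyLocationSpace, reserve() and used() are
hypothetical names, and it is not how Clang's SourceManager is actually
written. The only facts taken from the description above are the 2^31 limit
and the "reserve the whole file on each #include" behaviour.

```cpp
#include <cstdint>
#include <optional>

// Toy model of a bounded source location space (hypothetical, not Clang code).
class ToyLocationSpace {
public:
  // 31 usable bits: one bit of the 32-bit value is taken by a flag.
  static constexpr std::uint64_t kLimit = std::uint64_t{1} << 31;

  // Reserve `fileSize` consecutive offsets for one inclusion of a file.
  // Returns the first offset of the reserved range, or std::nullopt if the
  // range would run past the end of the space (the overflow case).
  std::optional<std::uint32_t> reserve(std::uint64_t fileSize) {
    if (fileSize > kLimit - next_)
      return std::nullopt;
    const auto start = static_cast<std::uint32_t>(next_);
    next_ += fileSize;
    return start;
  }

  // Number of offsets handed out so far.
  std::uint64_t used() const { return next_; }

private:
  // Next free offset; kept in 64 bits so the overflow check cannot wrap.
  std::uint64_t next_ = 0;
};
```

Note that a file without include guards consumes a fresh range on every
inclusion, even though its contents are identical each time.
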
>
> The pattern of including a file multiple times is for example required by
> the AUTOSAR standard [1], which is widely used in the automotive industry.
> Specifically the pattern is described in the Specification of Memory
> Mapping [2]:
>
> Section 8.2.1, MEMMAP003:
> "The start and stop symbols for section control are configured with section
> identifiers defined in MemMap.h [...] For instance:
>
> #define EEP_START_SEC_VAR_16BIT
> #include "MemMap.h"
> static uint16 EepTimer;
> static uint16 EepRemainingBytes;
> #define EEP_STOP_SEC_VAR_16BIT
> #include "MemMap.h""
>
> Section 8.2.2, MEMMAP005:
> "The file MemMap.h shall provide a mechanism to select different code,
> variable or constant sections by checking the definition of the module
> specific memory allocation key words for starting a section [...]"
>
> In practice MemMap.h can reach several MBs and can be included several
> thousand times, causing an overflow in the source location space.
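
As a back-of-the-envelope check of those numbers, the sketch below computes
how many inclusions of a file fit into 2^31 offsets. The 2 MiB file size is
a hypothetical figure picked for illustration, and maxInclusions() is a
made-up helper; the calculation also ignores everything else in the TU, so
the real limit would be lower.

```cpp
#include <cstdint>

// Size of the location space described above: 2^31 offsets.
constexpr std::uint64_t kLocationSpace = std::uint64_t{1} << 31;

// How many inclusions of a file of `fileSize` bytes fit into the space,
// ignoring all other source text. Precondition: fileSize != 0.
constexpr std::uint64_t maxInclusions(std::uint64_t fileSize) {
  return kLocationSpace / fileSize;
}

// Hypothetical example: a 2 MiB MemMap.h.
constexpr std::uint64_t kExampleSize = 2 * 1024 * 1024;
static_assert(maxInclusions(kExampleSize) == 1024, "2^31 / 2^21 == 2^10");
```

So with a file of that size, roughly a thousand inclusions are enough to
exhaust the space, which is well within "several thousand".
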
>
> The problem does not occur with GCC because it tracks line numbers rather
> than file offsets. Column numbers are tracked separately and are optional.
> I.e., in GCC a source location can be either a (line+column) tuple packed
> into 32 bits or (when the line number exceeds a certain threshold) a 32-bit
> line number.
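
For readers unfamiliar with that scheme, here is a minimal sketch of the
general idea: pack (line, column) into 32 bits while the line number is
small, and fall back to a line-only value past a threshold. This is not
GCC's actual line-map code. The 8-bit column width, the use of the top bit
as a mode flag (which leaves 31 rather than 32 bits for the line-only case),
and the names encode(), decode() and ToyPosition are all assumptions made
for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kColumnBits = 8;  // assumed column width
constexpr std::uint32_t kMaxColumn = (1u << kColumnBits) - 1;
// Largest line that still leaves the top bit clear in packed form.
constexpr std::uint32_t kMaxPackedLine = (1u << (31 - kColumnBits)) - 1;
constexpr std::uint32_t kLineOnlyFlag = 1u << 31;  // assumed mode flag

struct ToyPosition {
  std::uint32_t line;
  std::uint32_t column;  // meaningful only when hasColumn is true
  bool hasColumn;
};

// Pack line and column together when both fit; otherwise drop the column
// and store only the line number (truncated to 31 bits in this toy).
constexpr std::uint32_t encode(std::uint32_t line, std::uint32_t column) {
  if (line <= kMaxPackedLine && column <= kMaxColumn)
    return (line << kColumnBits) | column;
  return kLineOnlyFlag | (line & ~kLineOnlyFlag);
}

// Inverse of encode(); reports whether a column was preserved.
constexpr ToyPosition decode(std::uint32_t loc) {
  if (loc & kLineOnlyFlag)
    return {loc & ~kLineOnlyFlag, 0, false};
  return {loc >> kColumnBits, loc & kMaxColumn, true};
}
```

The point of the sketch is that the cost of a location grows with the number
of lines seen, not with the number of bytes, and that precision degrades
(columns are lost) instead of the encoding overflowing.
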
>
> We are looking for an acceptable way of resolving the problem and propose
> the following approaches for discussion:
> 1. Use 64 bits for source location tracking.
> 2. Track until an overflow occurs; after that, make the lexer output
>    the <invalid location> special value for all subsequent tokens.
> 3. Implement an approach similar to the one used by GCC and start tracking
>    line numbers instead of file offsets after a certain threshold. Resort
>    to (2) when even line numbers overflow.
> 4. (?) Detect the multiple inclusion pattern and track it differently (for
>    now we don't have specific ideas on how to implement this).
>
> Is any of these approaches viable? What caveats should we expect? (We
> already know about the static_asserts guarding the sizes of certain class
> fields, which start failing with the first approach.)
>
> Other suggestions are welcome.
>
> [1]. https://www.autosar.org
> [2]. https://www.autosar.org/fileadmin/user_upload/standards/classic/3-0/AUTOSAR_SWS_MemoryMapping.pdf
>
> --
> Regards,
>   Mikhail Maltsev