[cfe-dev] [RFC] Clang SourceLocation overflow

via cfe-dev cfe-dev at lists.llvm.org
Tue Oct 8 07:09:21 PDT 2019


Richard Smith wrote:
Can you split out parts of MemMap.h into a separate header that is only included once, and keep only the parts that actually change on repeated inclusion in MemMap.h itself?

In my experience, these sorts of things are doing two kinds of dispatch: by function, and by compiler.  Reading the AUTOSAR spec, it looks like there are many different compilers supported, each of which has its own set of pragmas and so forth to use for each bit of functionality.  You could have the top-level MemMap.h conditionally #include compiler-specific header files.  Or, you could break up the file functionally, for example EEP_{START,STOP}_SEC_VAR_16BIT could be implemented in one header file, other features in other header files, and have MemMap.h conditionally #include the appropriate function-specific header.  The spec says that MemMap.h “shall provide a mechanism” to do this and that; it doesn’t appear to require that the entire mechanism for all compilers be embodied in a single header file.  Either way, breaking up the file would improve compilation time (fewer lines to process) as well as addressing the source-location size problem.
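
A rough sketch of the functional split described above, assuming hypothetical file names (MemMap_Eep.h, MemMap_Can.h) and a hypothetical CAN_* keyword pair that do not come from the spec or from this thread. MemMap.h shrinks to a thin dispatcher, so each repeated inclusion only pulls in the small header that handles the keyword currently defined:

```cpp
/* MemMap.h (sketch): thin dispatcher, re-included at every section boundary.
 * No include guard on purpose, as in the original pattern.
 * The MemMap_*.h names and the CAN_* keywords are hypothetical. */
#if defined(EEP_START_SEC_VAR_16BIT) || defined(EEP_STOP_SEC_VAR_16BIT)
#  include "MemMap_Eep.h"   /* handles only the EEP section keywords */
#elif defined(CAN_START_SEC_CODE) || defined(CAN_STOP_SEC_CODE)
#  include "MemMap_Can.h"   /* handles only the CAN section keywords */
#endif
```

Each inclusion then costs the size of the dispatcher plus one small per-module header, rather than the size of the whole multi-megabyte file. The same shape works for the by-compiler split, with the conditions testing compiler-identification macros instead.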
--paulr

From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Richard Smith via cfe-dev
Sent: Monday, October 07, 2019 3:36 PM
To: Mikhail Maltsev <Mikhail.Maltsev at arm.com>
Cc: nd <nd at arm.com>; cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [RFC] Clang SourceLocation overflow

On Wed, 2 Oct 2019 at 09:26, Mikhail Maltsev via cfe-dev <cfe-dev at lists.llvm.org> wrote:
Hi all,

We are experiencing a problem with Clang SourceLocation overflow.
Currently, source locations are 32-bit values. One bit is a flag, which leaves
a source location space of 2^31 characters.

When the Clang lexer processes an #include directive it reserves the total size
of the file being included in the source location space. An overflow can occur
if a large file (which does not have include guards by design) is included many
times into a single TU.
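
To make the arithmetic concrete, here is a minimal model of the reservation scheme described above. It is a simplified illustration, not Clang's actual SourceManager code: the class and function names are made up, and real bookkeeping details are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified model: 31 bits of offset space (one of the 32 bits is a flag).
constexpr std::uint64_t kOffsetSpace = std::uint64_t{1} << 31;

// Hypothetical allocator: every inclusion reserves the full file size,
// even if the same file has been included before.
class OffsetSpace {
public:
  // Returns the start offset of the reserved range, or nullopt on overflow.
  std::optional<std::uint32_t> reserve(std::uint64_t file_size) {
    if (file_size > kOffsetSpace - next_) return std::nullopt;
    auto start = static_cast<std::uint32_t>(next_);
    next_ += file_size;
    return start;
  }

private:
  std::uint64_t next_ = 0;  // next free offset, always <= kOffsetSpace
};

// Counts how many inclusions of a file of the given size fit in the space.
std::uint64_t inclusions_until_overflow(std::uint64_t file_size) {
  assert(file_size > 0 && "a zero-sized file would never exhaust the space");
  OffsetSpace space;
  std::uint64_t count = 0;
  while (space.reserve(file_size)) ++count;
  return count;
}
```

Under this model, inclusions_until_overflow(2u << 20) returns 1024: a 2 MiB header exhausts the space after 1024 inclusions, regardless of how little of the file is active on each pass.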

The pattern of including a file multiple times is, for example, required by
the AUTOSAR standard [1], which is widely used in the automotive industry.
Specifically, the pattern is described in the Specification of Memory Mapping [2]:

Section 8.2.1, MEMMAP003:
"The start and stop symbols for section control are configured with section
identifiers defined in MemMap.h [...] For instance:

#define EEP_START_SEC_VAR_16BIT
#include "MemMap.h"
static uint16 EepTimer;
static uint16 EepRemainingBytes;
#define EEP_STOP_SEC_VAR_16BIT
#include "MemMap.h""

Section 8.2.2, MEMMAP005:
"The file MemMap.h shall provide a mechanism to select different code, variable
or constant sections by checking the definition of the module specific memory
allocation key words for starting a section [...]"

In practice MemMap.h can reach several MBs and can be included several thousand
times causing an overflow in the source location space.

The problem does not occur with GCC because it tracks line numbers rather than
file offsets. Column numbers are tracked separately and are optional. I.e., in
GCC a source location can be either a (line+column) tuple packed into 32 bits or
(when the line number exceeds a certain threshold) a 32-bit line number.
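
As a rough illustration of the idea (not GCC's actual line-map implementation), a line/column location with a line-only fallback could be packed as below. The bit widths, the flag bit, and the function names are assumptions chosen for the example; in particular, this sketch spends one bit on a mode flag, so the fallback holds a 31-bit line number.

```cpp
#include <cstdint>

// Assumed layout for illustration only; these are not GCC's values.
constexpr unsigned kColumnBits = 7;
constexpr std::uint32_t kMaxColumn = (1u << kColumnBits) - 1;
constexpr std::uint32_t kMaxPackedLine = (1u << (31 - kColumnBits)) - 1;
constexpr std::uint32_t kLineOnlyFlag = 1u << 31;

// Packs (line, column) while the line fits; otherwise stores the line alone.
// Column 0 means "column unknown". Requires line < 2^31.
std::uint32_t encode(std::uint32_t line, std::uint32_t column) {
  if (line <= kMaxPackedLine) {
    std::uint32_t col = column <= kMaxColumn ? column : 0;
    return (line << kColumnBits) | col;
  }
  return kLineOnlyFlag | line;
}

std::uint32_t decode_line(std::uint32_t loc) {
  return (loc & kLineOnlyFlag) ? (loc & ~kLineOnlyFlag) : (loc >> kColumnBits);
}

std::uint32_t decode_column(std::uint32_t loc) {
  return (loc & kLineOnlyFlag) ? 0 : (loc & kMaxColumn);
}
```

The point of the scheme is that the cost of a re-included file is measured in lines rather than bytes, and column precision is given up first when space runs short.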

We are looking for an acceptable way of resolving the problem and propose the
following approaches for discussion:
1. Use 64 bits for source location tracking.
2. Track until an overflow occurs; after that, make the lexer output
   the <invalid location> special value for all subsequent tokens.
3. Implement an approach similar to the one used by GCC and start tracking line
   numbers instead of file offsets after a certain threshold. Resort to (2)
   when even line numbers overflow.
4. (?) Detect the multiple inclusion pattern and track it differently (for now
   we don't have specific ideas on how to implement this).
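
For discussion, approach (2) could look roughly like the following. This is a hypothetical model, not a patch against Clang: the Loc and SaturatingLocSpace names are invented, and the only convention borrowed is that a raw value of 0 stands for the invalid location.

```cpp
#include <cstdint>

// Hypothetical location type: raw value 0 stands for <invalid location>.
struct Loc {
  std::uint32_t raw = 0;
  bool valid() const { return raw != 0; }
};

// Hands out location ranges until the 2^31 space is used up, then keeps
// returning the invalid location instead of wrapping around.
class SaturatingLocSpace {
public:
  Loc reserve(std::uint64_t size) {
    if (exhausted_ || size > kLimit - next_) {
      exhausted_ = true;  // sticky: all later requests are invalid too
      return Loc{};
    }
    Loc start{static_cast<std::uint32_t>(next_)};
    next_ += size;
    return start;
  }

  bool exhausted() const { return exhausted_; }

private:
  static constexpr std::uint64_t kLimit = std::uint64_t{1} << 31;
  std::uint64_t next_ = 1;  // offset 0 is reserved for the invalid location
  bool exhausted_ = false;
};
```

The sticky flag matters: once one request has failed, a later small request must not succeed, otherwise tokens after the overflow point would get a mix of valid and invalid locations.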

Is any of these approaches viable? What caveats should we expect? (We already
know about static_asserts guarding the sizes of certain class fields, which start
failing with the first approach.)

Other suggestions are welcome.

I don't think any of the above approaches are reasonable; they would all require fundamental restructuring of major parts of Clang, an efficiency or memory size hit for all other users of Clang, or some combination of those.

Your code pattern seems unreasonable; including a multi-megabyte file thousands of times is not a good idea. Can you split out parts of MemMap.h into a separate header that is only included once, and keep only the parts that actually change on repeated inclusion in MemMap.h itself?