[clang] [Serialization] Fix source location data loss during decoding. (PR #145529)

Haojian Wu via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 25 00:19:18 PDT 2025


hokein wrote:

> The design is, the higher 32 bits are used for module file index and the lower bits are used for offsets. Could you give a concrete example of why the current implementation is problematic?

I don’t have a concrete failure case, but I noticed this while working on 64-bit source locations.

Currently, we encode offsets in two ways depending on the module file index:

1. If the module file index is `0`, we use delta encoding (see [this code path](https://github.com/llvm/llvm-project/blob/01b288fe6a1e627954329198ed5641f2bf55ee8d/clang/include/clang/Serialization/SourceLocationEncoding.h#L165-L168)).
2. Otherwise, we use raw encoding (see [this code path](https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Serialization/SourceLocationEncoding.h#L173-L183)).
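To make the two paths concrete, here is a simplified standalone sketch, assuming 32-bit source locations whose top bit flags macro locations. The names (`rotateRaw`, `zigZag`, `encodeDelta`, `encodeWithModule`) are hypothetical stand-ins rather than the real clang API, and the sketch ignores the invalid-location and first-location special cases handled by the linked code.

```cpp
#include <cstdint>

// Hypothetical stand-ins for clang's helpers; 32-bit source locations assumed.
using UInt = std::uint32_t;     // raw SourceLocation encoding
using Encoded = std::uint64_t;  // serialized value

// Rotate left by one so the macro flag (bit 31) lands in bit 0.
constexpr UInt rotateRaw(UInt raw) { return (raw << 1) | (raw >> 31); }

// Zig-zag: map a wrapped (signed-looking) delta so small magnitudes stay small.
constexpr UInt zigZag(UInt v) {
  UInt sign = (v & (UInt{1} << 31)) ? ~UInt{0} : UInt{0};
  return sign ^ (v << 1);
}

// Path 1 (module file index 0): store the difference from the previous
// rotated location; the +1 keeps 0 free for "no location".
constexpr Encoded encodeDelta(UInt prevRotated, UInt curRotated) {
  return 1 + Encoded{zigZag(curRotated - prevRotated)};
}

// Path 2 (nonzero module file index): rotated offset in the low 32 bits,
// module file index starting at bit 32.
constexpr Encoded encodeWithModule(UInt raw, UInt baseOffset, unsigned index) {
  return (Encoded{index} << 32) | rotateRaw(raw - baseOffset);
}

// A small forward step produces a small encoding...
static_assert(encodeDelta(100, 104) == 9, "delta +4 -> zigZag 8 -> 9");
// ...and the raw path keeps the module file index out of the low 32 bits.
static_assert((encodeWithModule(0x1234, 0x1000, 3) >> 32) == 3,
              "index stored in the high bits");
```

Zig-zag encoding keeps small forward and backward steps small, which is what makes the delta path worthwhile for runs of nearby locations.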

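Case 2 is fine, as the encoded value always fits into a 32-bit integer. However, case 1 can produce a 33-bit value (the delta encoding adds 1 to a zig-zag-encoded 32-bit value, so it can reach `1 << 32`), which doesn't fit in the lower 32 bits. On decoding, that extra bit is read as a nonzero module file index and the offset is lost.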
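The overflow can be reproduced with simplified helpers (hypothetical names, 32-bit locations assumed; not the real clang API). `zigZag` maps the wrapped difference `0x80000000` to `0xFFFFFFFF`, so adding 1 yields `1 << 32`. A decoder that treats everything from bit 32 upward as the module file index, which is what `decodeModuleIndex` below assumes, then sees index 1 and keeps only zeros in the low 32 bits.

```cpp
#include <cstdint>

using UInt = std::uint32_t;
using Encoded = std::uint64_t;

// Zig-zag: map a wrapped (signed-looking) delta to an unsigned value.
constexpr UInt zigZag(UInt v) {
  UInt sign = (v & (UInt{1} << 31)) ? ~UInt{0} : UInt{0};
  return sign ^ (v << 1);
}

// Delta path: the +1 keeps 0 free for "no location".
constexpr Encoded encodeDelta(UInt prevRotated, UInt curRotated) {
  return 1 + Encoded{zigZag(curRotated - prevRotated)};
}

// Hypothetical decoder that assumes the module file index starts at bit 32.
constexpr unsigned decodeModuleIndex(Encoded e) { return unsigned(e >> 32); }
constexpr UInt decodeLow32(Encoded e) { return UInt(e & 0xFFFFFFFFu); }

// Two rotated locations exactly 2^31 apart (mod 2^32).
constexpr Encoded Bad = encodeDelta(0x00000002u, 0x80000002u);

static_assert(Bad == (Encoded{1} << 32), "33-bit encoding");
static_assert(decodeModuleIndex(Bad) == 1, "misread as module file index 1");
static_assert(decodeLow32(Bad) == 0, "the delta itself is lost");
```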

It appears we only use 16 bits for the module file index, so the fix here reserves one additional bit for the offset to avoid this issue.
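The idea can be illustrated with a layout that moves the module file index up to bit 33, leaving 33 low bits for the encoded offset. This is a sketch under that assumption only; the PR's exact change is in the linked diff, and `pack`, `unpackModuleIndex`, and `unpackOffset` are hypothetical names.

```cpp
#include <cstdint>

using Encoded = std::uint64_t;

// Assumed layout: bits [0, 33) hold the encoded offset,
// bits [33, 49) hold the 16-bit module file index.
constexpr unsigned OffsetBits = 33;
constexpr Encoded OffsetMask = (Encoded{1} << OffsetBits) - 1;

constexpr Encoded pack(unsigned moduleIndex, Encoded offset) {
  return (Encoded{moduleIndex} << OffsetBits) | (offset & OffsetMask);
}
constexpr unsigned unpackModuleIndex(Encoded e) {
  return unsigned(e >> OffsetBits);
}
constexpr Encoded unpackOffset(Encoded e) { return e & OffsetMask; }

// The problematic 33-bit delta now stays inside the offset field.
constexpr Encoded Delta = Encoded{1} << 32;
static_assert(unpackModuleIndex(pack(0, Delta)) == 0, "still a local location");
static_assert(unpackOffset(pack(0, Delta)) == Delta, "no data loss");

// The largest 16-bit index still fits: 33 + 16 = 49 bits <= 64.
static_assert(unpackModuleIndex(pack(0xFFFF, 0x1234)) == 0xFFFF, "index intact");
```

Because the index is limited to 16 bits, shifting it up by one bit still leaves 15 bits of headroom in the 64-bit word.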


https://github.com/llvm/llvm-project/pull/145529

