[clang] [Modules] No transitive source location change (PR #86912)

Jan Svoboda via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 29 09:32:54 PDT 2024


jansvoboda11 wrote:

> If Clang is configured with 64 bit `SourceLocation`, we can't use 96 bits for serialization. We can at most use 64 bits for a slot. In that case, we can only assume the offset of source location **in its own module** (not the global offset!) is not large than 2^32. I feel this assumption may be valid in a lot of places.
> 
> Or otherwise we can use less bits for module file index (32 bits seems to be too much honestly), then we can use higher 16 bits to store the module file index, and leave the lower 48 bits to store the source location. In this case, the assumption becomes to "the offset of the source location may not large than 2^48". But it is slightly hard to believe we can reach such extreme cases.

Let's see if @statham-arm (who introduced the `SourceLocation::[U]IntTy` typedefs) wants to weight in here.

> I **tried** to encode this as two separate 32bit values. But it will break too many codes. Since a lot of places assume that we can encode the source location as an uint64_t.
> 
> What I mean is, with VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can save more space for small integers in **on-disk** .pcm files (the memory representation should be the same). For example, for a 64 bits unsigned int `1`, VBR6 can use only 6 bits to store that `000001` to represent the 64 bits value `1` in the on-disk representations. So that even if I don't use more slots to store the module file index, the size of the .pcm files will increase after all.

Right. My thinking was that single 64bit value with the module file index in the upper 32 bits would basically disable VBR6 encoding for the lower 32 bits. If we split this thing into two separate 32 bit values, we are more likely to VBR6 encode both of them. But this might actually increase size for (what I assume is the most common case) local source locations. Still, I think having a rough idea of how alternative implementations compare would be great.

Do you have any data on how much recompilation this can save for real world projects?

https://github.com/llvm/llvm-project/pull/86912


More information about the cfe-commits mailing list