[clang] [Modules] No transitive source location change (PR #86912)

Chuanqi Xu via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 28 20:02:57 PDT 2024


ChuanqiXu9 wrote:

> > Yes, I explained this in the "Some low level details" section. The size of source locations won't be affected. A source location is an unsigned integer (in practice, 32 bits on most platforms), and the serializer uses uint64_t as its unit, so 32 bits of each slot go unused. The plan is to store the module file index in the upper 32 bits, which shouldn't be a safety problem. Maybe the original wording was not clear enough; I've updated it.
> 
> Thank you, using 64 bits in the serialization format makes sense! This also means that whenever Clang is configured with 64 bit `SourceLocation`, we should be using 96 bits for serialization: 32 bits for the module file index and 64 bits for the offset itself, correct?

If Clang is configured with a 64-bit `SourceLocation`, we can't use 96 bits for serialization, because a slot is at most 64 bits. In that case, we can only assume that the offset of a source location **within its own module** (not the global offset!) fits in 32 bits, i.e. is less than 2^32. I hope that assumption holds in practice.
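To make the slot layout concrete, here is a minimal sketch assuming the layout described above: module file index in the upper 32 bits of a uint64_t, offset within that module in the lower 32 bits. The helper names (`packLocation`, `packLocation64`, `moduleFileIndexOf`, `localOffsetOf`) are hypothetical and are not Clang's actual API. The 64-bit variant shows the assumption a 64-bit `SourceLocation` build would have to make.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helpers, not Clang's real API: pack a
// {local-module-index, offset-within-module} pair into one uint64_t slot.
// Upper 32 bits: module file index. Lower 32 bits: offset in that module.
inline uint64_t packLocation(uint32_t ModuleFileIndex, uint32_t LocalOffset) {
  return (static_cast<uint64_t>(ModuleFileIndex) << 32) | LocalOffset;
}

inline uint32_t moduleFileIndexOf(uint64_t Raw) {
  return static_cast<uint32_t>(Raw >> 32);
}

inline uint32_t localOffsetOf(uint64_t Raw) {
  return static_cast<uint32_t>(Raw & 0xFFFFFFFFu);
}

// With a 64-bit offset type the slot is still 64 bits wide, so packing only
// works if the offset within its own module fits in 32 bits.
// Returns std::nullopt when that assumption is violated.
inline std::optional<uint64_t> packLocation64(uint32_t ModuleFileIndex,
                                              uint64_t LocalOffset) {
  if (LocalOffset > 0xFFFFFFFFull)
    return std::nullopt;
  return packLocation(ModuleFileIndex, static_cast<uint32_t>(LocalOffset));
}
```

For example, `packLocation(3, 1234)` round-trips to module index 3 and offset 1234, while `packLocation64(3, uint64_t{1} << 32)` is rejected because that local offset no longer fits in the lower half.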

> 
> > The only trade-off I see with this change is that it may increase the size of **on-disk** .pcm files, because we use the VBR6 format to shrink small numbers. On the one hand, we would still pay for extra space if we used a `{local-module-index, offset-within-module} pair` (thanks for the good name suggestion). On the other hand, the experiments show that the overhead is acceptable.
> 
> Sorry, I don't quite understand. Are you saying you did or did not try to encode this as two separate 32bit values?

I **tried** to encode this as two separate 32-bit values, but it broke too much code, since many places assume that a source location can be encoded as a single uint64_t.
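Why splitting the value is invasive can be illustrated with a toy model. This is a hypothetical sketch, not Clang's ASTWriter/ASTReader: it only assumes that a record is a flat list of uint64_t fields that readers consume by position, so a location occupying exactly one slot is baked into every matching writer/reader pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical names): a record is a flat list of
// uint64_t fields, and the reader consumes them by position.
using Record = std::vector<uint64_t>;

struct ToyDecl {
  uint64_t Loc;   // encoded source location, exactly one slot
  uint64_t Flags; // some other field that follows it
};

void writeDecl(Record &R, const ToyDecl &D) {
  R.push_back(D.Loc); // the location is written as a single slot
  R.push_back(D.Flags);
}

ToyDecl readDecl(const Record &R, std::size_t &Idx) {
  ToyDecl D;
  D.Loc = R[Idx++];   // reader assumes the location is one slot...
  D.Flags = R[Idx++]; // ...so the next field sits at the next index
  return D;
}
```

If the location were split into two 32-bit slots, every writer and every reader like these would have to change in lockstep; keeping it as one uint64_t leaves the record layout untouched.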

What I mean is that the VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer) saves space for small integers in **on-disk** .pcm files (the in-memory representation is the same). For example, a 64-bit unsigned integer `1` takes only 6 bits (`000001`) in the on-disk representation. Storing the module file index in the upper 32 bits makes the encoded values large, so VBR6 can no longer compress them as well. So even though I don't use extra slots to store the module file index, the .pcm files still get bigger.
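The size effect can be estimated with a small helper. This sketch assumes only the VBR rule from the LLVM bitcode documentation linked above: each 6-bit chunk carries 5 payload bits plus 1 continuation bit. The function name `vbr6Bits` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of on-disk bits VBR6 needs for Value: each 6-bit chunk holds
// 5 payload bits plus 1 continuation bit, and at least one chunk is emitted.
unsigned vbr6Bits(uint64_t Value) {
  unsigned Chunks = 1;
  while (Value >= 32) { // more than 5 payload bits remain
    Value >>= 5;
    ++Chunks;
  }
  return Chunks * 6;
}
```

For instance, an offset of 100 on its own needs 12 bits, but the same offset with a hypothetical module file index of 1 in the upper 32 bits, `(uint64_t{1} << 32) | 100`, needs 42 bits. That growth comes from the larger values alone, with no extra slots.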

https://github.com/llvm/llvm-project/pull/86912

