[PATCH] D124422: [Serialization] Improve encoding of small SourceRanges.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 26 02:18:47 PDT 2022


sammccall added a comment.

In D124422#3474003 <https://reviews.llvm.org/D124422#3474003>, @ilya-biryukov wrote:

> Overall LGTM, but I didn't look closely if there are cases that break now.
> Do we have any tests to check this?

Serialization tests do exist, though I think coverage isn't amazing.
I'd have a reasonable level of confidence through:

- the fact that I made the changes systematically rather than finding them through test failures
- I'll go through and audit all the changes myself again (the failures I did have after finishing this were mostly typos)
- using it locally for a while (though this will only cover usual C++ nodes)

---

There is a limited & self-defeating nature of this patch.
To have a clear idiom, we only encode SourceRange when begin/end pair are directly stored.
Not all SourceLocations are paired, e.g. ForStmt has [KW, LParen, RParen] and we benefit from delta-encoding only once.
In the most common types of nodes, there tends to be more complexity around exactly which SourceLocations need to be stored for size reasons. (Probably size of memory representation, but this is 1:1 with serialized). So I suspect the opportunities missed are disproportionately valuable as well.

Before landing this, I'd like to try an approach of compressing all SourceLocations within a record as a sequence.
Something with a state machine like:

- at the start of each record, Current = 0
- for each Loc:
  - if Loc is far from current, encode as 33-bit integer Loc<<1 | 1
  - if Loc is close to current, encode as abs(Loc) << 2 | sign(Loc) << 1 | 0
  - if Loc is nonzero, Current = Loc

This would guarantee efficient representations of nearby and null sourcelocations, while regressing random locations by 1 bit only.

This can be done explicitly with an API like: `SrcLocSeq Locs(Stream); Locs.Add(Loc1); Locs.Add(Loc2);`, or implicitly by changing the behavior of AddSourceLocation. The latter seems easier to try, and would require more machinery but fewer local changes than this patch.

Let me try to put something together to see what the gains are...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124422/new/

https://reviews.llvm.org/D124422



More information about the cfe-commits mailing list