[clang] 741a4da - [clang] Clarify SourceLocation and (Char)SourceRange docs (#177400)

via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 13 04:50:26 PST 2026


Author: Tobias Ribizel
Date: 2026-02-13T07:50:21-05:00
New Revision: 741a4daa84c602d4bdd838ad469aea95b9f1d46c

URL: https://github.com/llvm/llvm-project/commit/741a4daa84c602d4bdd838ad469aea95b9f1d46c
DIFF: https://github.com/llvm/llvm-project/commit/741a4daa84c602d4bdd838ad469aea95b9f1d46c.diff

LOG: [clang] Clarify SourceLocation and (Char)SourceRange docs (#177400)

The current documentation left some questions unanswered for me, which
I'm trying to clarify here.
1. It was unclear how SourceLocation differed when referring to the
character level vs. the token level. It turns out there is no such
difference, and SourceLocation always refers to characters. This should
be made explicit in the docs.
2. It was unclear in which cases (Char)SourceRange is inclusive
(containing the endpoint) or exclusive (ending before the endpoint).
From my reading of the docs and investigating the behavior of different
AST nodes' `getSourceLoc()` result and `Lexer::getSourceText()`,
SourceRange is always inclusive, and CharSourceRange is inclusive only as
a TokenRange and exclusive as a CharRange. This is also consistent
with the documentation of the clang::transformer::after()
function in RangeSelector.h, where the question of inclusive/exclusive
ranges first came up for me.

Added: 
    

Modified: 
    clang/include/clang/Basic/SourceLocation.h

Removed: 
    


################################################################################
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index bd0038d5ae1ae..b73b43d953662 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -86,6 +86,10 @@ using FileIDAndOffset = std::pair<FileID, unsigned>;
 /// In addition, one bit of SourceLocation is used for quick access to the
 /// information whether the location is in a file or a macro expansion.
 ///
+/// SourceLocation operates on a byte level, i.e. offsets describe
+/// byte distances, but in most cases, they are used on a token level,
+/// where a SourceLocation points to the first byte of a lexer token.
+///
 /// It is important that this type remains small. It is currently 32 bits wide.
 class SourceLocation {
   friend class ASTReader;
@@ -212,6 +216,11 @@ inline bool operator>=(const SourceLocation &LHS, const SourceLocation &RHS) {
 }
 
 /// A trivial tuple used to represent a source range.
+///
+/// When referring to tokens, a SourceRange is an inclusive range [begin, end]
+/// that contains its endpoints, its begin SourceLocation points to the first
+/// byte of the first token and its end SourceLocation points to the first byte
+/// of the last token.
 class SourceRange {
   SourceLocation B;
   SourceLocation E;
@@ -248,13 +257,22 @@ class SourceRange {
   void dump(const SourceManager &SM) const;
 };
 
-/// Represents a character-granular source range.
+/// Represents a byte-granular source range.
 ///
-/// The underlying SourceRange can either specify the starting/ending character
+/// The underlying SourceRange can either specify the starting/ending byte
 /// of the range, or it can specify the start of the range and the start of the
 /// last token of the range (a "token range").  In the token range case, the
 /// size of the last token must be measured to determine the actual end of the
 /// range.
+///
+/// CharSourceRange is interpreted differently depending on whether it is a
+/// TokenRange or a CharRange.
+/// For a TokenRange, the range contains the endpoint, i.e. the token containing
+/// the end SourceLocation.
+/// For a CharRange, the range doesn't contain the endpoint, i.e. it ends at the
+/// byte before the end SourceLocation. This allows representing a point
+/// CharRange [begin, begin) that points at the empty range right in front of
+/// the begin SourceLocation.
 class CharSourceRange {
   SourceRange Range;
   bool IsTokenRange = false;
@@ -280,8 +298,8 @@ class CharSourceRange {
   }
 
   /// Return true if the end of this range specifies the start of
-  /// the last token.  Return false if the end of this range specifies the last
-  /// character in the range.
+  /// the last token.  Return false if the end of this range specifies the first
+  /// byte after the range.
   bool isTokenRange() const { return IsTokenRange; }
   bool isCharRange() const { return !IsTokenRange; }
 


        

