[clang] [HLSL][RootSignature] Implement Lexing of DescriptorTables (PR #122981)

Sun Feb 9 13:10:13 PST 2025

================
@@ -0,0 +1,171 @@
+#include "clang/Parse/ParseHLSLRootSignature.h"
+
+namespace clang {
+namespace hlsl {
+
+// Lexer Definitions
+
+static bool IsNumberChar(char C) {
+  // TODO(#120472): extend for float support exponents
+  return isdigit(C); // integer support
+}
+
+bool RootSignatureLexer::LexNumber(RootSignatureToken &Result) {
+  // NumericLiteralParser does not handle the sign so we will manually apply it
+  bool Negative = Buffer.front() == '-';
+  bool Signed = Negative || Buffer.front() == '+';
+  if (Signed)
+    AdvanceBuffer();
+
+  // Retrieve the possible number
+  StringRef NumSpelling = Buffer.take_while(IsNumberChar);
+
+  // Catch this now as the Literal Parser will accept it as valid
+  if (NumSpelling.empty()) {
+    PP.getDiagnostics().Report(Result.TokLoc,
+                               diag::err_hlsl_invalid_number_literal);
+    return true;
+  }
+
+  // Parse the numeric value and do semantic checks on its specification
+  clang::NumericLiteralParser Literal(NumSpelling, SourceLoc,
+                                      PP.getSourceManager(), PP.getLangOpts(),
+                                      PP.getTargetInfo(), PP.getDiagnostics());
+  if (Literal.hadError)
+    return true; // Error has already been reported so just return
+
+  // Note: if IsNumberChar allows for hexidecimal we will need to turn this
+  // into a diagnostics for potential fixed-point literals
+  assert(Literal.isIntegerLiteral() && "IsNumberChar will only support digits");
+
+  // Retrieve the number value to store into the token
+  Result.Kind = TokenKind::int_literal;
+
+  // NOTE: for compabibility with DXC, we will treat any integer with '+' as an
+  // unsigned integer
+  llvm::APSInt X = llvm::APSInt(32, !Negative);
+  if (Literal.GetIntegerValue(X)) {
+    // Report that the value has overflowed
+    PP.getDiagnostics().Report(Result.TokLoc,
+                               diag::err_hlsl_number_literal_overflow)
+        << (unsigned)Signed << NumSpelling;
+    return true;
+  }
+
+  X = Negative ? -X : X;
+  Result.NumLiteral = APValue(X);
+
+  AdvanceBuffer(NumSpelling.size());
+  return false;
+}
+
+bool RootSignatureLexer::Lex(SmallVector<RootSignatureToken> &Tokens) {
----------------
llvm-beanz wrote:

I'm not sure this function should exist.

Most modern compilers don't lex entire source files into allocated lists of tokens then parse, they do the two at the same time never allocating more than a couple tokens at a time. You can see this in Clang where the `ConsumeToken` API advances the token to the next token, and is called throughout the parser.

I've seen other compilers where the tokenizer is implemented as effectively an iterator where the iterator state tracks the current token.

For context, the reason compilers tend to do this is because lexing isn't always "fast". In particular translation of numeric literals into actual integer values can be quite slow, so if you lex all the tokens, you do all the transformations even if an error would be detected during parsing. If you lex one token at a time, you stop lexing once the error is encountered providing faster user feedback.

https://github.com/llvm/llvm-project/pull/122981