[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)

Fri Jan 31 09:11:14 PST 2025

jimingham wrote:


> On Jan 31, 2025, at 2:21 AM, Pavel Labath ***@***.***> wrote:
> 
> 
> @labath commented on this pull request.
> 
> Apart from the (mainly stylistic) inline comments, the biggest problem I see is that the definition of an identifier is still too narrow. The restriction on the dollar sign is completely unnecessary as C will let you put that anywhere <https://godbolt.org/z/o7qbfeWve>. And it doesn't allow any non-ascii characters.
> 
> I really think this should be based on an deny- rather than an allow-list. Any character we don't claim for ourselves should be fair game for an identifier. If someone manages to enter the backspace character (\x7f) into the expression, then so be it.
> 
> The question of "identifiers" starting with digits is interesting. Personally, I think it'd be fine to reject those (and require the currenly-vapourware quoting syntax), because I suspect you want to accept number suffixes, and I think it'd be confusing to explain why 123x is a valid identifier but 123u is not, but I suspect some might have a different opinion.
> 
Are there any languages we know of that allow digits as the initial identifier character?  Seems like that just makes parsing the language hard for no very clear benefit, so I'd be surprised if there are many that allow that.  If there aren't any that we're likely to have to support, I wouldn't object to not treating tokens beginning with a number as an identifier.

> We could continue discussing that here, or we could accept everything here, and postpone this discussion for the patch which starts parsing numbers. Up to you..
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936864552>:
> 
> > +TEST(DILLexerTests, SimpleTest) {
> +  StringRef input_expr("simple_var");
> +  uint32_t tok_len = 10;
> +  llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> +      lldb_private::dil::DILLexer::Create(input_expr);
> +  ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> +  lldb_private::dil::DILLexer lexer(*maybe_lexer);
> +  lldb_private::dil::Token token =
> +      lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::unknown);
> +
> +  lexer.Advance();
> +  token = lexer.GetCurrentToken();
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::identifier);
> +  EXPECT_EQ(token.GetSpelling(), "simple_var");
> +  EXPECT_EQ(token.GetSpelling().size(), tok_len);
> Now the size check is unnecessary, as this is just testing the StringRef implementation
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936871876>:
> 
> > +  StringRef input_expr("namespace");
> +  llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> +      lldb_private::dil::DILLexer::Create(input_expr);
> +  ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> +  lldb_private::dil::DILLexer lexer(*maybe_lexer);
> +  lldb_private::dil::Token token =
> +      lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> +  lexer.Advance();
> +  token = lexer.GetCurrentToken();
> ⬇️ Suggested change
> -  StringRef input_expr("namespace");
> -  llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> -      lldb_private::dil::DILLexer::Create(input_expr);
> -  ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> -  lldb_private::dil::DILLexer lexer(*maybe_lexer);
> -  lldb_private::dil::Token token =
> -      lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> -  lexer.Advance();
> -  token = lexer.GetCurrentToken();
> +  lldb_private::dil::Token token =
> +      lldb_private::dil::Token(lldb_private::dil::Token::identifier, "ident", 0);
> (The reason is that this looks like a test for the token kind testing functions. You don't actually need the token to come out of the lexer for that. And we already have tests that check that producing a sequence of characters produces an "identifier" token.)
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936872678>:
> 
> > +  token = lexer.GetCurrentToken();
> +
> +  // Current token is '('; check the next 4 tokens, to make
> +  // sure they are the identifier 'anonymous', the identifier 'namespace'
> +  // ')' and '::', in that order.
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::l_paren);
> +  EXPECT_EQ(lexer.LookAhead(1).GetKind(), lldb_private::dil::Token::identifier);
> +  EXPECT_EQ(lexer.LookAhead(1).GetSpelling(), "anonymous");
> +  EXPECT_EQ(lexer.LookAhead(2).GetKind(), lldb_private::dil::Token::identifier);
> +  EXPECT_EQ(lexer.LookAhead(2).GetSpelling(), "namespace");
> +  EXPECT_EQ(lexer.LookAhead(3).GetKind(), lldb_private::dil::Token::r_paren);
> +  EXPECT_EQ(lexer.LookAhead(4).GetKind(), lldb_private::dil::Token::coloncolon);
> +
> +  // Our current index should still be 0, as we only looked ahead; we are still
> +  // officially on the '('.
> +  EXPECT_EQ(lexer.GetCurrentTokenIdx(), (uint32_t)0);
> ⬇️ Suggested change
> -  EXPECT_EQ(lexer.GetCurrentTokenIdx(), (uint32_t)0);
> +  EXPECT_EQ(lexer.GetCurrentTokenIdx(), 0u);
> (and elsewhere)
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936878882>:
> 
> > +
> +  // Accept the 'lookahead', so our current token is '::', which has the index
> +  // 4 in our vector of tokens (which starts at zero).
> +  lexer.Advance(4);
> +  token = lexer.GetCurrentToken();
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::coloncolon);
> +  EXPECT_EQ(lexer.GetCurrentTokenIdx(), (uint32_t)4);
> +
> +  lexer.Advance();
> +  token = lexer.GetCurrentToken();
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::identifier);
> +  EXPECT_EQ(token.GetSpelling(), "some_var");
> +  EXPECT_EQ(lexer.GetCurrentTokenIdx(), (uint32_t)5);
> +  // Verify we've advanced our position counter (lexing location) in the
> +  // input 23 characters (the length of '(anonymous namespace)::'.
> +  EXPECT_EQ(token.GetLocation(), expect_loc);
> you might just say strlen("(anonymous namespace)::") and skip the constant and the comment (this is a test, so it doesn't have to be super-efficient)
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936887852>:
> 
> > +  StringRef input_expr("This string has (several ) ::identifiers");
> +  llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> +      lldb_private::dil::DILLexer::Create(input_expr);
> +  ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> +  lldb_private::dil::DILLexer lexer(*maybe_lexer);
> +  lldb_private::dil::Token token =
> +      lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> +
> +  std::vector<std::pair<lldb_private::dil::Token::Kind, std::string>>
> +      lexer_tokens_data = ExtractTokenData(lexer);
> +
> +  EXPECT_THAT(
> +      lexer_tokens_data,
> +      testing::ElementsAre(
> +          testing::Pair(lldb_private::dil::Token::identifier, "This"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "string"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "has"),
> +          testing::Pair(lldb_private::dil::Token::l_paren, "("),
> +          testing::Pair(lldb_private::dil::Token::identifier, "several"),
> +          testing::Pair(lldb_private::dil::Token::r_paren, ")"),
> +          testing::Pair(lldb_private::dil::Token::coloncolon, "::"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "identifiers"),
> +          testing::Pair(lldb_private::dil::Token::eof, "")));
> +  lexer.Advance();
> +  token = lexer.GetCurrentToken();
> +  EXPECT_EQ(token.GetSpelling(), "");
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::eof);
> ⬇️ Suggested change
> -  StringRef input_expr("This string has (several ) ::identifiers");
> -  llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> -      lldb_private::dil::DILLexer::Create(input_expr);
> -  ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> -  lldb_private::dil::DILLexer lexer(*maybe_lexer);
> -  lldb_private::dil::Token token =
> -      lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> -
> -  std::vector<std::pair<lldb_private::dil::Token::Kind, std::string>>
> -      lexer_tokens_data = ExtractTokenData(lexer);
> -
> -  EXPECT_THAT(
> -      lexer_tokens_data,
> -      testing::ElementsAre(
> -          testing::Pair(lldb_private::dil::Token::identifier, "This"),
> -          testing::Pair(lldb_private::dil::Token::identifier, "string"),
> -          testing::Pair(lldb_private::dil::Token::identifier, "has"),
> -          testing::Pair(lldb_private::dil::Token::l_paren, "("),
> -          testing::Pair(lldb_private::dil::Token::identifier, "several"),
> -          testing::Pair(lldb_private::dil::Token::r_paren, ")"),
> -          testing::Pair(lldb_private::dil::Token::coloncolon, "::"),
> -          testing::Pair(lldb_private::dil::Token::identifier, "identifiers"),
> -          testing::Pair(lldb_private::dil::Token::eof, "")));
> -  lexer.Advance();
> -  token = lexer.GetCurrentToken();
> -  EXPECT_EQ(token.GetSpelling(), "");
> -  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::eof);
> +  EXPECT_THAT_EXPECTED(
> +      ExtractTokenData("This string has (several ) ::identifiers"),
> +      llvm::HasValue(testing::ElementsAre(
> +          testing::Pair(lldb_private::dil::Token::identifier, "This"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "string"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "has"),
> +          testing::Pair(lldb_private::dil::Token::l_paren, "("),
> +          testing::Pair(lldb_private::dil::Token::identifier, "several"),
> +          testing::Pair(lldb_private::dil::Token::r_paren, ")"),
> +          testing::Pair(lldb_private::dil::Token::coloncolon, "::"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "identifiers"),
> +          testing::Pair(lldb_private::dil::Token::eof, ""))));
> (and have ExtractTokenData construct the lexer internally and return llvm::Expected<std::vector<...>>)
> 
> I think this will reduce the boiler plate in all subsequent tests.
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936890692>:
> 
> > +    llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> +        lldb_private::dil::DILLexer::Create(str);
> +    ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> +    lldb_private::dil::DILLexer lexer(*maybe_lexer);
> +    lldb_private::dil::Token token =
> +        lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> +    lexer.Advance();
> +    token = lexer.GetCurrentToken();
> +    EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::identifier);
> For example, this could boil down to this
> 
> ⬇️ Suggested change
> -    llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> -        lldb_private::dil::DILLexer::Create(str);
> -    ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> -    lldb_private::dil::DILLexer lexer(*maybe_lexer);
> -    lldb_private::dil::Token token =
> -        lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> -    lexer.Advance();
> -    token = lexer.GetCurrentToken();
> -    EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::identifier);
> +    EXPECT_THAT_EXPECTED(ExtractTokenData(str), llvm::HasValue(testing::ElementsAre(testing::Pair(identifier, str))));
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936905720>:
> 
> > +        lldb_private::dil::DILLexer::Create(str);
> +    ASSERT_THAT_EXPECTED(maybe_lexer, llvm::Succeeded());
> +    lldb_private::dil::DILLexer lexer(*maybe_lexer);
> +    lldb_private::dil::Token token =
> +        lldb_private::dil::Token(lldb_private::dil::Token::unknown, "", 0);
> +    lexer.Advance();
> +    token = lexer.GetCurrentToken();
> +    EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::identifier);
> +  }
> +
> +  // Verify that none of the invalid identifiers come out as identifier tokens.
> +  for (auto &str : invalid_identifiers) {
> +    SCOPED_TRACE(str);
> +    llvm::Expected<lldb_private::dil::DILLexer> maybe_lexer =
> +        lldb_private::dil::DILLexer::Create(str);
> +    if (!maybe_lexer) {
> Control flow in a test is usually a bad sign (among other reasons, because we don't write tests for tests). In gtest matcher language, I think what you're trying to express is EXPECT_THAT_EXPECTED(ExtractTokenData(str), testing::Not(llvm::HasValue(testing::Contains(testing::Pair(Token::identifier, testing::_)))) (i.e., parsing does not succeed and return a token of the identifier kind), though it may be better to be even more specific: instead of saying what you do not want to happen, say what you actually expect to be the result of parsing of e.g. "123". (It's true that this will result will change once you start parsing numbers, but I think churn like that is fine for a unit test and serves as evidence of the functional changes you are making)
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936928055>:
> 
> > +//
> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
> +// See https://llvm.org/LICENSE.txt for license information.
> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "lldb/ValueObject/DILLexer.h"
> +#include "llvm/ADT/StringRef.h"
> +#include "llvm/Testing/Support/Error.h"
> +#include "gtest/gtest.h"
> +#include <string>
> +
> +using llvm::StringRef;
> +
> +std::vector<std::pair<lldb_private::dil::Token::Kind, std::string>>
> I'd suggest adding using namespace lldb_private::dil to avoid all of this repetition.
> 
> In lldb/unittests/ValueObject/DILLexerTests.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936937875>:
> 
> > +          testing::Pair(lldb_private::dil::Token::r_paren, ")"),
> +          testing::Pair(lldb_private::dil::Token::coloncolon, "::"),
> +          testing::Pair(lldb_private::dil::Token::identifier, "identifiers"),
> +          testing::Pair(lldb_private::dil::Token::eof, "")));
> +  lexer.Advance();
> +  token = lexer.GetCurrentToken();
> +  EXPECT_EQ(token.GetSpelling(), "");
> +  EXPECT_EQ(token.GetKind(), lldb_private::dil::Token::eof);
> +}
> +
> +TEST(DILLexerTests, IdentifiersTest) {
> +  std::vector<std::string> valid_identifiers = {
> +      "$My_name1", "$pc",  "abcd", "ab cd", "_",      "_a",       "_a_",
> +      "a_b",       "this", "self", "a",     "MyName", "namespace"};
> +  std::vector<std::string> invalid_identifiers = {"234", "2a",      "2",
> +                                                  "$",   "1MyName", ""};
> I think a lone dollar sign should be a valid identifier, at least until we have a reason for it not to be. The cases with numbers are more interesting, and I'm going to add them to our top-level discussion.
> 
> In lldb/include/lldb/ValueObject/DILLexer.h <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936938801>:
> 
> > +#include <string>
> +#include <vector>
> +
> +namespace lldb_private::dil {
> +
> +/// Class defining the tokens generated by the DIL lexer and used by the
> +/// DIL parser.
> +class Token {
> +public:
> +  enum Kind {
> +    coloncolon,
> +    eof,
> +    identifier,
> +    l_paren,
> +    r_paren,
> +    unknown,
> It looks like the only remaining use of the "unknown" token is to construct "empty" tokens in the tests. I think all of those cases could be avoided by just declaring the Token variable later in the code.
> 
> In lldb/include/lldb/ValueObject/DILLexer.h <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936939132>:
> 
> > +
> +/// Class defining the tokens generated by the DIL lexer and used by the
> +/// DIL parser.
> +class Token {
> +public:
> +  enum Kind {
> +    coloncolon,
> +    eof,
> +    identifier,
> +    l_paren,
> +    r_paren,
> +    unknown,
> +  };
> +
> +  Token(Kind kind, std::string spelling, uint32_t start)
> +      : m_kind(kind), m_spelling(spelling), m_start_pos(start) {}
> ⬇️ Suggested change
> -      : m_kind(kind), m_spelling(spelling), m_start_pos(start) {}
> +      : m_kind(kind), m_spelling(std::move(spelling)), m_start_pos(start) {}
> In lldb/include/lldb/ValueObject/DILLexer.h <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936941514>:
> 
> > +};
> +
> +/// Class for doing the simple lexing required by DIL.
> +class DILLexer {
> +public:
> +  /// Lexes all the tokens in expr and calls the private constructor
> +  /// with the lexed tokens.
> +  static llvm::Expected<DILLexer> Create(llvm::StringRef expr);
> +
> +  /// Return the current token to be handled by the DIL parser.
> +  const Token &GetCurrentToken() { return m_lexed_tokens[m_tokens_idx]; }
> +
> +  /// Advance the current token position by N.
> +  void Advance(uint32_t N = 1) {
> +    // UINT_MAX means uninitialized, no "current" position, so move to start.
> +    if (m_tokens_idx == UINT_MAX)
> Could we just have the lexer start at token zero straight away?
> 
> In lldb/include/lldb/ValueObject/DILLexer.h <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936946298>:
> 
> > +    if (m_tokens_idx == UINT_MAX)
> +      m_tokens_idx = 0;
> +    else if (m_tokens_idx + N >= m_lexed_tokens.size())
> +      // N is too large; advance to the end of the lexed tokens.
> +      m_tokens_idx = m_lexed_tokens.size() - 1;
> +    else
> +      m_tokens_idx += N;
> +  }
> +
> +  /// Return the lexed token N positions ahead of the 'current' token
> +  /// being handled by the DIL parser.
> +  const Token &LookAhead(uint32_t N) {
> +    if (m_tokens_idx + N < m_lexed_tokens.size())
> +      return m_lexed_tokens[m_tokens_idx + N];
> +
> +    return m_eof_token;
> A valid token stream should always end in and EOF token, right? If, so you can just return the last element.
> 
> In lldb/include/lldb/ValueObject/DILLexer.h <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936947851>:
> 
> > +  /// Return the index for the 'current' token being handled by the DIL parser.
> +  uint32_t GetCurrentTokenIdx() { return m_tokens_idx; }
> +
> +  /// Set the index for the 'current' token (to be handled by the parser)
> +  /// to a particular position. Used for either committing 'look ahead' parsing
> +  /// or rolling back tentative parsing.
> +  void ResetTokenIdx(uint32_t new_value) {
> +    assert(new_value == UINT_MAX || new_value < m_lexed_tokens.size());
> +    m_tokens_idx = new_value;
> +  }
> +
> +  uint32_t NumLexedTokens() { return m_lexed_tokens.size(); }
> +
> +private:
> +  DILLexer(llvm::StringRef dil_expr, std::vector<Token> lexed_tokens)
> +      : m_expr(dil_expr), m_lexed_tokens(lexed_tokens), m_tokens_idx(UINT_MAX),
> ⬇️ Suggested change
> -      : m_expr(dil_expr), m_lexed_tokens(lexed_tokens), m_tokens_idx(UINT_MAX),
> +      : m_expr(dil_expr), m_lexed_tokens(std::move(lexed_tokens)), m_tokens_idx(UINT_MAX),
> In lldb/source/ValueObject/DILLexer.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936969437>:
> 
> > +  llvm::StringRef::iterator cur_pos = expr.end() - remainder.size();
> +  llvm::StringRef::iterator start = cur_pos;
> +  bool dollar_start = false;
> +
> +  // Must not start with a digit.
> +  if (cur_pos == expr.end() || IsDigit(*cur_pos))
> +    return std::nullopt;
> +
> +  // First character *may* be a '$', for a register name or convenience
> +  // variable.
> +  if (*cur_pos == '$') {
> +    dollar_start = true;
> +    ++cur_pos;
> +  }
> +
> +  // Contains only letters, digits or underscores
> +  for (; cur_pos != expr.end(); ++cur_pos) {
> +    char c = *cur_pos;
> +    if (!IsLetter(c) && !IsDigit(c) && c != '_')
> +      break;
> +  }
> +
> +  // If first char is '$', make sure there's at least one mare char, or it's
> +  // invalid.
> +  if (dollar_start && (cur_pos - start <= 1)) {
> +    cur_pos = start;
> +    return std::nullopt;
> +  }
> +
> +  if (cur_pos == start)
> +    return std::nullopt;
> +
> +  llvm::StringRef word = expr.substr(start - expr.begin(), cur_pos - start);
> +  if (remainder.consume_front(word))
> +    return word;
> +
> +  return std::nullopt;
> This should be equivalent to your code, but I'm only suggesting this to show how code like this can be written in a shorter and more stringref-y fashion. I think the actual algorithm will have to change, as it has a very narrow definition of identifiers.
> 
> ⬇️ Suggested change
> -  llvm::StringRef::iterator cur_pos = expr.end() - remainder.size();
> -  llvm::StringRef::iterator start = cur_pos;
> -  bool dollar_start = false;
> -
> -  // Must not start with a digit.
> -  if (cur_pos == expr.end() || IsDigit(*cur_pos))
> -    return std::nullopt;
> -
> -  // First character *may* be a '$', for a register name or convenience
> -  // variable.
> -  if (*cur_pos == '$') {
> -    dollar_start = true;
> -    ++cur_pos;
> -  }
> -
> -  // Contains only letters, digits or underscores
> -  for (; cur_pos != expr.end(); ++cur_pos) {
> -    char c = *cur_pos;
> -    if (!IsLetter(c) && !IsDigit(c) && c != '_')
> -      break;
> -  }
> -
> -  // If first char is '$', make sure there's at least one mare char, or it's
> -  // invalid.
> -  if (dollar_start && (cur_pos - start <= 1)) {
> -    cur_pos = start;
> -    return std::nullopt;
> -  }
> -
> -  if (cur_pos == start)
> -    return std::nullopt;
> -
> -  llvm::StringRef word = expr.substr(start - expr.begin(), cur_pos - start);
> -  if (remainder.consume_front(word))
> -    return word;
> -
> -  return std::nullopt;
> +  const char *start = remainder.data();
> +  remainder.consume_front("$"); // initial '$' is valid
> +  remainder = remainder.drop_while([](char c){  return IsDigit(c) || IsLetter(c) || c=='_'; });
> +  llvm::StringRef candidate(start, remainder.data()-start);
> +  if (candidate.empty() || candidate == "$" || IsDigit(candidate[0]))
> +    return std::nullopt;
> +  return candidate;
> In lldb/source/ValueObject/DILLexer.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936971542>:
> 
> > +  llvm::StringRef remainder = expr;
> +  do {
> +    if (llvm::Expected<Token> t = Lex(expr, remainder)) {
> +      tokens.push_back(std::move(*t));
> +    } else {
> +      return t.takeError();
> +    }
> +  } while (tokens.back().GetKind() != Token::eof);
> +  return DILLexer(expr, std::move(tokens));
> +}
> +
> +llvm::Expected<Token> DILLexer::Lex(llvm::StringRef expr,
> +                                    llvm::StringRef &remainder) {
> +  // Skip over whitespace (spaces).
> +  remainder = remainder.ltrim();
> +  llvm::StringRef::iterator cur_pos = expr.end() - remainder.size();
> ⬇️ Suggested change
> -  llvm::StringRef::iterator cur_pos = expr.end() - remainder.size();
> +  llvm::StringRef::iterator cur_pos = remainder.begin();
> In lldb/source/ValueObject/DILLexer.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936972263>:
> 
> > +      tokens.push_back(std::move(*t));
> +    } else {
> +      return t.takeError();
> +    }
> +  } while (tokens.back().GetKind() != Token::eof);
> +  return DILLexer(expr, std::move(tokens));
> +}
> +
> +llvm::Expected<Token> DILLexer::Lex(llvm::StringRef expr,
> +                                    llvm::StringRef &remainder) {
> +  // Skip over whitespace (spaces).
> +  remainder = remainder.ltrim();
> +  llvm::StringRef::iterator cur_pos = expr.end() - remainder.size();
> +
> +  // Check to see if we've reached the end of our input string.
> +  if (remainder.empty() || cur_pos == expr.end())
> ⬇️ Suggested change
> -  if (remainder.empty() || cur_pos == expr.end())
> +  if (remainder.empty())
> If the two conditions aren't equivalent, I'd like to know why.
> 
> In lldb/source/ValueObject/DILLexer.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936973214>:
> 
> > +  if (maybe_word) {
> +    llvm::StringRef word = *maybe_word;
> +    return Token(Token::identifier, word.str(), position);
> +  }
> ⬇️ Suggested change
> -  if (maybe_word) {
> -    llvm::StringRef word = *maybe_word;
> -    return Token(Token::identifier, word.str(), position);
> -  }
> +  if (maybe_word)
> +    return Token(Token::identifier, maybe_word->str(), position);
> In lldb/source/ValueObject/DILLexer.cpp <https://github.com/llvm/llvm-project/pull/123521#discussion_r1936974931>:
> 
> > +    if (remainder.consume_front(str)) {
> +      cur_pos += strlen(str);
> +      return Token(kind, str, position);
> +    }
> ⬇️ Suggested change
> -    if (remainder.consume_front(str)) {
> -      cur_pos += strlen(str);
> -      return Token(kind, str, position);
> -    }
> +    if (remainder.consume_front(str))
> +      return Token(kind, str, position);
> (cur_pos isn't used after this)
> 
> —
> Reply to this email directly, view it on GitHub <https://github.com/llvm/llvm-project/pull/123521#pullrequestreview-2585984062>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW74P2KD3GANQKHXK6D2NNFDRAVCNFSM6AAAAABVO4RH2WVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKOBVHE4DIMBWGI>.
> You are receiving this because you were mentioned.
> 


https://github.com/llvm/llvm-project/pull/123521