[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)

Mon Jan 27 04:37:25 PST 2025

https://github.com/labath commented:

There are some simplifications (and one rewrite :P) that I'd like to talk about, but I think we're not far off.

The main thing that's bothering me is this identifier vs. keyword issue (and for this part, I'd like to loop in @jimingham, at least).

Your implementation takes a very strict view of what's a possible identifier (it must consist of a very specific set of characters, appearing in a specific order, and it must not be a keyword). In contrast, the current "frame variable" implementation basically treats anything it doesn't recognise (including whitespace) as a variable name (and it has no keywords):
```
(lldb) v a*b
error: no variable named 'a*b' found in this frame
(lldb) v "a b"
error: no variable named 'a b' found in this frame
(lldb) v 123
error: no variable named '123' found in this frame
(lldb) v namespace
error: no variable named 'namespace' found in this frame
```

Now, obviously, in order to expand the language, we need to restrict the set of variable names, but I don't think we need to do it so aggressively. I don't think anyone will complain if we make it harder for him to access a variable called `a*b`, but, for example, `namespace` and `💩` are perfectly valid variable names in many languages ([one of them](https://godbolt.org/z/vjfhj6dfM) is C).

For this reason, I think it'd be better to have a very liberal definition of what constitutes a possible variable name (identifier). Instead of prescribing a character sequence, I think it'd be better to say that anything that doesn't contain one of the characters we recognize (basically: operators) is an identifier. IOW, to do something like what `frame variable` does right now.
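To make the liberal rule a bit more concrete, here is a rough sketch. It is not the PR's code: `TokenKind`, `Token`, `Lex` and the operator set in `kOperatorChars` are all made up for illustration. It also assumes whitespace only separates tokens (unlike today's `frame variable`, which keeps it as part of the name) and it ignores numeric literals. Any maximal run of characters that are neither whitespace nor operator characters becomes an identifier, so `namespace`, `💩` and even `123` all come out as identifiers, while `a*b` splits into three tokens.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token model, for illustration only.
enum class TokenKind { Identifier, Operator };

struct Token {
  TokenKind kind;
  std::string text;
};

// Hypothetical set of characters the language assigns a meaning to.
// Everything else (letters, digits, '$', UTF-8 bytes, ...) may be part of a name.
constexpr std::string_view kOperatorChars = "()[]*&.-><:+,";

bool IsOperatorChar(char c) {
  return kOperatorChars.find(c) != std::string_view::npos;
}

bool IsSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

std::vector<Token> Lex(std::string_view input) {
  std::vector<Token> tokens;
  std::size_t i = 0;
  while (i < input.size()) {
    char c = input[i];
    if (IsSpace(c)) { // Assumption: whitespace only separates tokens.
      ++i;
      continue;
    }
    if (IsOperatorChar(c)) {
      // Prefer the two-character operators "->" and "::" when present.
      std::string_view two = input.substr(i, 2);
      if (two == "->" || two == "::") {
        tokens.push_back({TokenKind::Operator, std::string(two)});
        i += 2;
      } else {
        tokens.push_back({TokenKind::Operator, std::string(1, c)});
        ++i;
      }
      continue;
    }
    // Anything else: take the longest run of non-operator, non-whitespace
    // characters and treat it as an identifier.
    std::size_t start = i;
    while (i < input.size() && !IsOperatorChar(input[i]) && !IsSpace(input[i]))
      ++i;
    tokens.push_back(
        {TokenKind::Identifier, std::string(input.substr(start, i - start))});
  }
  return tokens;
}
```

The nice property is that the lexer never needs to know which characters are "allowed" in a name; it only needs to know the (small) set of characters the language itself gives a meaning to.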

As for keywords, I think it'd be best to have as few of them as possible, and for the rest, to treat their keyword-ness as context-dependent wherever possible. What I mean by that is to recognise them as keywords only within contexts where such usage would be legal. The `namespace` keyword is a prime example of that. I *think* the only place where this can appear as a keyword in the context of the DIL is within the `(anonymous namespace)` group. If that's the case, then I think we should be able to detect that and disambiguate the usage, so that e.g. `v namespace` prints the variable called `namespace` and `v (anonymous namespace)::namespace` prints the variable called `namespace` in an anonymous namespace (in an evil language which has anonymous namespaces but allows you to use variables called `namespace`).

The way to do that is probably to *not* treat the string "namespace" as a keyword at the lexer level, but to recognize it later, when we have more context available.
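Here is a similarly hypothetical sketch of the context-dependent part. It assumes the lexer emits `anonymous` and `namespace` as ordinary identifiers; the parser (`Token` and `ParseQualifiedName` are invented names, not anything in the PR) only treats them as keywords when it sees the exact sequence `(`, `anonymous`, `namespace`, `)`, and otherwise keeps them as plain name components.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token: the lexer is assumed to classify nothing as a keyword,
// so "anonymous" and "namespace" arrive as ordinary identifiers.
struct Token {
  bool is_identifier; // false for punctuation such as "(", ")" and "::"
  std::string text;
};

// Parses a qualified name such as `namespace` or
// `(anonymous namespace)::namespace` into its components.
// Returns std::nullopt if the tokens do not form such a name.
std::optional<std::vector<std::string>>
ParseQualifiedName(const std::vector<Token> &toks) {
  auto is_punct = [&](std::size_t j, const char *p) {
    return j < toks.size() && !toks[j].is_identifier && toks[j].text == p;
  };
  auto is_word = [&](std::size_t j, const char *w) {
    return j < toks.size() && toks[j].is_identifier && toks[j].text == w;
  };

  std::vector<std::string> parts;
  std::size_t i = 0;
  while (true) {
    if (is_punct(i, "(") && is_word(i + 1, "anonymous") &&
        is_word(i + 2, "namespace") && is_punct(i + 3, ")")) {
      // Only here do "anonymous" and "namespace" act as keywords. The group
      // names a scope, so it must be followed by "::".
      parts.push_back("(anonymous namespace)");
      i += 4;
      if (!is_punct(i, "::"))
        return std::nullopt;
      ++i;
      continue;
    }
    if (i >= toks.size() || !toks[i].is_identifier)
      return std::nullopt;
    // Any identifier, including "namespace", is an ordinary name component.
    parts.push_back(toks[i].text);
    ++i;
    if (i == toks.size())
      return parts;
    if (!is_punct(i, "::"))
      return std::nullopt;
    ++i;
  }
}
```

With this, `namespace` parses as the single component `namespace`, and `(anonymous namespace)::namespace` parses as the anonymous-namespace group followed by `namespace`, without the lexer ever having to reserve the word.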

What do you all think?

https://github.com/llvm/llvm-project/pull/123521