[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)

Pavel Labath via lldb-commits lldb-commits at lists.llvm.org
Mon Feb 3 04:55:22 PST 2025


labath wrote:

Thanks for the summary, Andy.

> I went through the thread to understand the current consensus and I see various nice ideas flying around. Let me try to summarize the requirements and the current status quo.
> 
>   * DIL should support synthetic children, whose names can be arbitrary in general case, even `a+b*c`
> 
>    * A common synthetic name is `[0]`, which is used for children of vectors/maps and people want to write `print vec[0]`
>       
>      * `frame variable` supports this by having special support for `[]` "expressions"
> 
>    * We want DIL to be easy & convenient to use in most (simple) cases, but also to be able to support complicated cases and it doesn't have to be _super_ convenient for those

SGTM

> Possible behaviour for DIL:
> 
>   * Make the definition of `identifier` in Lexer to roughly match C or similar
> 
>   * Introduce escaping of identifiers, e.g. with backticks
>       
>     * The expression `` foo->`a*b.c`+1 `` is parsed as approx `foo.GetChildWithName("a*b.c") + 1`

I already spoke at length about identifier names. Quoting of fancy names SGTM. I don't think it's relevant for this patch (lexing), but since you're also mentioning the wider syntax of the language, I want to mention that there's also another kind of disambiguation to consider. Forcing a string to be treated as an identifier is one thing. Another question is forcing an identifier to be treated as a specific kind of entity. For example, if there's a variable and a type with the same name, can we say which one we mean? Or can we say we want to see the value of a global variable `foo` in a specific compile unit? Or in an outer scope that's shadowed by another variable?

We don't exactly support that right now, but e.g. `target variable foo` will print *all* global variables with that name. That gets trickier with a more complicated parser, because how do you print the result of `global1+global2` if there are multiple candidates for each name?
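
To make the backtick quoting concrete on the lexing side, here is a minimal Python sketch (not the C++ lexer from this patch). The token kinds, the regular expressions and the `lex` helper are all made up for illustration; the only thing taken from the discussion is that the text between two backticks becomes a single identifier, whatever characters it contains.

```python
import re
from typing import List, Tuple

# Hypothetical token kinds, for illustration only.
# Order matters: earlier alternatives win, so "->" is tried before "-".
TOKEN_SPEC = [
    ("quoted_identifier", r"`[^`]*`"),                 # anything between backticks
    ("identifier",        r"[A-Za-z_][A-Za-z0-9_]*"),  # roughly C-like
    ("number",            r"[0-9]+"),
    ("arrow",             r"->"),
    ("punct",             r"[.+\-*\[\]()]"),
    ("space",             r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, spelling) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Also reached for an unterminated backtick.
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, spelling = m.lastgroup, m.group()
        pos = m.end()
        if kind == "space":
            continue
        if kind == "quoted_identifier":
            # Drop the backticks; the parser then sees an ordinary identifier.
            kind, spelling = "identifier", spelling[1:-1]
        tokens.append((kind, spelling))
    return tokens
```

With this sketch, `` lex("foo->`a*b.c`+1") `` yields the identifier `foo`, an arrow, the identifier `a*b.c`, a `+` and the number `1`, so the parser never needs to know that quoting happened. It says nothing about the second kind of disambiguation (which entity a name refers to), which would have to be handled after lexing.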

> 
>     * Add special support for `[]` in the parser
>       
>       * The expression `` foo.`[1]`  `` is parsed as `foo.GetChildWithName("[1]")`
>       * The expression `foo[1]` tries the following:
>         
>         * `foo.GetChildWithName("1")`
>         * `foo.GetChildWithName("[1]")`
>         * `foo.GetChildAtIndex(1)`

I'd go with just the second option (`foo.GetChildWithName("[1]")`) because I'd like to avoid ambiguities and fallbacks. I think that's more or less the status quo, at least for the types with child providers. We'd need some special handling for C pointers and arrays, but that's also what we already have.
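
As a rough illustration of what "no fallbacks" means here, the following Python sketch resolves `foo[1]` with exactly one lookup per kind of value. The `Value` class, its fields and the `subscript` function are toy stand-ins invented for this example; they are not LLDB's `ValueObject` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Value:
    """Toy stand-in for a debugger value (hypothetical, not LLDB's class)."""
    name: str
    is_c_array_or_pointer: bool = False
    named_children: Dict[str, "Value"] = field(default_factory=dict)
    elements: List["Value"] = field(default_factory=list)

    def get_child_with_name(self, name: str) -> Optional["Value"]:
        return self.named_children.get(name)

    def get_child_at_index(self, index: int) -> Optional["Value"]:
        if 0 <= index < len(self.elements):
            return self.elements[index]
        return None


def subscript(base: Value, index: int) -> Value:
    """Resolve base[index] with a single lookup and no fallback chain."""
    if base.is_c_array_or_pointer:
        # Special handling for C pointers and arrays: plain element access.
        child = base.get_child_at_index(index)
    else:
        # Everything else: look up the child literally named "[index]".
        child = base.get_child_with_name(f"[{index}]")
    if child is None:
        raise LookupError(f"{base.name} has no child [{index}]")
    return child
```

In this sketch a value whose only child is named `1` (without brackets) makes `subscript(value, 1)` fail instead of quietly trying another spelling, which is the point of avoiding fallbacks.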

>       * The expression `foo["bar"]` tries:
>         
>         * `foo.GetChildWithName("bar")`
>         * `foo.GetChildWithName("[bar]")`
>       * The expression `foo[<expr>]` -- `expr` is evaluated and treated as cases above. If the result of `expr` is not a number or a string -- produce an error

I think this is an interesting idea whose usefulness mainly depends on what kind of string operations we have in the language. If there's no way to construct strings, then I don't think it's very useful. You could write things like `some_variable[another_variable]`, but I don't think that's going to be useful because for static types (e.g. C structs) you're unlikely to have a variable with a value that contains the name of a field, and for types with synthetic children the names of the children are not going to have any relation to the way it's seen by the code (`map<string, string>` is still going to have a child called `[0]`).

Overall I'd leave this out for the time being because it doesn't impact parsing or lexing, just how some (currently invalid) syntax trees can be interpreted.

> For cases where the `GetChildWithName( "1" / "[1]" )` and `GetChildAtIndex(1)` produce different valid results we could show a warning/error to make it clear to the user that there's some ambiguity. IMO this would be an improvement over the current situation where `print` and `expr` simply produce different results.

I'm not sure how this is going to help. I assume you're referring to the `map<int, int>` scenario. In this case, the map object is not going to have a child called `"1"`, even if it happens to contain a key with the value `1`. (Depending on the other keys, the name of the child containing it could be `"[0]"`, `"[1]"`, or `"[47]"`). Or are you proposing to change that?
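
To spell out the `map<int, int>` point, here is a small Python sketch. It assumes the synthetic children are named by their position in the map's iteration order (sorted keys for an ordered map); the function names are invented for this example and are not part of any LLDB API.

```python
from typing import Dict, Iterable


def synthetic_child_names(keys: Iterable[int]) -> Dict[str, int]:
    """Map each synthetic child name ("[0]", "[1]", ...) to the key it holds.

    Assumption: children are numbered by position in sorted-key order.
    """
    return {f"[{i}]": key for i, key in enumerate(sorted(keys))}


def child_name_for_key(keys: Iterable[int], key: int) -> str:
    """Return the name of the synthetic child that contains the given key."""
    return f"[{sorted(keys).index(key)}]"
```

Under that assumption, the child holding key `1` is named `[0]` when `1` is the only key, `[1]` when the keys are `0` and `1`, and `[47]` when 47 smaller keys precede it. In none of these cases does a child named `1` exist, so a `GetChildWithName("1")` lookup would have nothing to find.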

https://github.com/llvm/llvm-project/pull/123521

