[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)
Pavel Labath via lldb-commits
lldb-commits at lists.llvm.org
Mon Feb 3 04:55:22 PST 2025
labath wrote:
Thanks for the summary, Andy.
> I went through the thread to understand the current consensus and I see various nice ideas flying around. Let me try to summarize the requirements and the current status quo.
>
> * DIL should support synthetic children, whose names can be arbitrary in the general case, even `a+b*c`
>
> * A common synthetic name is `[0]`, which is used for children of vectors/maps and people want to write `print vec[0]`
>
> * `frame variable` supports this by having special support for `[]` "expressions"
>
> * We want DIL to be easy & convenient to use in most (simple) cases, but also to be able to support complicated cases and it doesn't have to be _super_ convenient for those
SGTM
> Possible behaviour for DIL:
>
> * Make the definition of `identifier` in Lexer to roughly match C or similar
>
> * Introduce escaping of identifiers, e.g. with backticks
>
> * The expression `` foo->`a*b.c`+1 `` is parsed as approx `foo.GetChildWithName("a*b.c") + 1`
I already spoke at length about identifier names. Quoting of fancy names SGTM. I don't think it's relevant for this patch (lexing), but since you're also mentioning the wider syntax of the language, I want to mention that there's also another kind of disambiguation to consider. Forcing a string to be treated as an identifier is one thing. Another question is forcing an identifier to be treated as a specific kind of entity. For example, if there's a variable and a type with the same name, can we say which one we mean? Or can we say we want to see the value of a global variable `foo` in a specific compile unit? Or in an outer scope that's shadowed by another variable?
We don't exactly support that right now, but e.g. `target variable foo` will print *all* global variables with that name. That gets trickier with a more complicated parser, because how do you print the result of `global1+global2` if there are multiple candidates for each name?
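To make the quoting idea concrete, here is a minimal sketch of a lexer that accepts C-like identifiers plus backtick-quoted ones. It is only an illustration in Python, not the lexer from this patch: the token kinds, the `lex` function and the punctuation set are all made up for the example.

```python
import re
from typing import List, Tuple

# Hypothetical token grammar, for illustration only.
TOKEN_RE = re.compile(
    r"""
    (?P<quoted>`[^`]*`)                       # backtick-quoted identifier
  | (?P<identifier>[A-Za-z_][A-Za-z_0-9]*)    # roughly a C identifier
  | (?P<number>[0-9]+)
  | (?P<punct>->|[.+\-*/\[\]()])
  | (?P<space>\s+)
    """,
    re.VERBOSE,
)


def lex(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "space":
            continue
        if kind == "quoted":
            # A quoted name is just an identifier whose spelling is arbitrary.
            kind, value = "identifier", value[1:-1]
        tokens.append((kind, value))
    return tokens


print(lex("foo->`a*b.c`+1"))
```

With this sketch, `a*b.c` inside backticks comes out as a single identifier token, so the parser never sees the `*` or `.` as operators. An unterminated backtick is reported as an error rather than guessed at.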
>
> * Add special support for `[]` in the parser
>
> * The expression `` foo.`[1]` `` is parsed as `foo.GetChildWithName("[1]")`
> * The expression `foo[1]` tries the following:
>
>   * `foo.GetChildWithName("1")`
>   * `foo.GetChildWithName("[1]")`
>   * `foo.GetChildAtIndex(1)`
I'd go with just the second option because I'd like to avoid ambiguities and fallbacks. I think that's more or less the status quo, at least for the types with child providers. We'd need some special handling for C pointers and arrays, but that's also what we already have.
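Roughly, the single-lookup behaviour I have in mind looks like the sketch below. `FakeValue` and `resolve_subscript` are hypothetical stand-ins written in Python for illustration, not LLDB classes, and the special handling for C pointers and arrays is deliberately left out.

```python
from typing import Dict, Optional


class FakeValue:
    """Stand-in for a value with named children; not an LLDB class."""

    def __init__(self, children: Optional[Dict[str, "FakeValue"]] = None, payload=None):
        self.children = children or {}
        self.payload = payload

    def get_child_with_name(self, name: str) -> Optional["FakeValue"]:
        return self.children.get(name)


def resolve_subscript(base: FakeValue, index: int) -> FakeValue:
    """Resolve base[index] with exactly one name lookup and no fallbacks."""
    name = f"[{index}]"
    child = base.get_child_with_name(name)
    if child is None:
        # No second attempt with "1" or a positional lookup: just fail.
        raise KeyError(f"no child named {name}")
    return child


vec = FakeValue({"[0]": FakeValue(payload=10), "[1]": FakeValue(payload=20)})
print(resolve_subscript(vec, 1).payload)
```

The point of the single lookup is that `foo[1]` has one meaning: either a child called `[1]` exists or the expression is an error.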
> * The expression `foo["bar"]` tries:
>
>   * `foo.GetChildWithName("bar")`
>   * `foo.GetChildWithName("[bar]")`
> * The expression `foo[<expr>]` -- `expr` is evaluated and treated as cases above. If the result of `expr` is not a number or a string -- produce an error
I think this is an interesting idea whose usefulness mainly depends on what kind of string operations we have in the language. If there's no way to construct strings, then I don't think it's very useful. You could write things like `some_variable[another_variable]`, but I don't think that's going to be useful because for static types (e.g. C structs) you're unlikely to have a variable with a value that contains the name of a field, and for types with synthetic children the names of the children are not going to have any relation to the way the container is seen by the code (a `map<string, string>` is still going to have a child called `[0]`).
Overall I'd leave this out for the time being because it doesn't impact parsing or lexing, just how some (currently invalid) syntax trees can be interpreted.
> For cases where the `GetChildWithName( "1" / "[1]" )` and `GetChildAtIndex(1)` produce different valid results we could show a warning/error to make it clear to the user that there's some ambiguity. IMO this would be an improvement over the current situation where `print` and `expr` simply produce different results.
I'm not sure how this is going to help. I assume you're referring to the `map<int, int>` scenario. In this case, the map object is not going to have a child called `"1"`, even if it happens to contain a key with the value `1`. (Depending on the other keys, the name of the child containing it could be `"[0]"`, `"[1]"`, or `"[47]"`). Or are you proposing to change that?
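To illustrate what I mean, here is a small Python model of the naming. It assumes the children of a map are named by their position in sorted key order, and `child_name_for_key` is a hypothetical helper for this example, not anything in LLDB.

```python
from typing import List


def child_name_for_key(keys: List[int], key: int) -> str:
    """Name of the child holding `key`, assuming positional names in sorted order."""
    return f"[{sorted(keys).index(key)}]"


# The element with key 1 gets a different child name depending on the other keys.
print(child_name_for_key([1, 5], 1))
print(child_name_for_key([-3, 1], 1))
print(child_name_for_key(list(range(-46, 2)), 1))
```

In none of these cases is there a child called `"1"`, so a lookup by that name has nothing to be ambiguous with.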
https://github.com/llvm/llvm-project/pull/123521