[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)

Thu Jan 30 00:53:32 PST 2025

labath wrote:

> We would have to change how synthetic children are made currently for array-like things for this to work. If I have a `std::vector` called `foo` and I write:
> 
> (lldb) v foo[0]
> 
> and the DIL parses that it will either ask foo for a child named `0`, which doesn't exist, or ask it for the 0th child which currently is not required to be the one called `[0]`. I think I mentioned this back when we were first discussing this, but if we want to make this part work, then we have to add some API on the synthetic child provider like `AddVectorChild(size_t idx, ValueObjectSP child_sp)` and require you use that to make vector like things. Then we could require that the child named `[0]` HAS to be the 0th child, etc.
> 
> We still have the problem of all the synthetic child providers in the wild that might not do this.

I don't understand why we couldn't emulate exactly what happens now. Current `frame variable` doesn't simply ask for the child named `[0]`. It can't do that because pointers don't have children like that. Instead, it parses the thing between the brackets [into an integer](https://github.com/llvm/llvm-project/blob/89ca3e72ca03efbbfb5ae9b1c71d81f2d1753521/lldb/source/Target/StackFrame.cpp#L808), and then calls `ValueObject::GetSyntheticArrayMember` with that integer. This function turns the integer back into a string, [re-adds the brackets](https://github.com/llvm/llvm-project/blob/3cf56b5f04cdec567cfab3975ac7b531422c1e2c/lldb/source/ValueObject/ValueObject.cpp#L1813) and *then* asks for the synthetic child with that name.

This roundaboutness has a very nice property in that you can specify the child number in any way you want, and it will get normalized to the right form:
```
(lldb) v m[1]
(std::__1::__value_type<int, int>::value_type) m[1] = (first = 0, second = 42)
(lldb) v m[0000001]
(std::__1::__value_type<int, int>::value_type) m[0000001] = (first = 0, second = 42)
(lldb) v m[0x000001]
(std::__1::__value_type<int, int>::value_type) m[0x000001] = (first = 0, second = 42)
```
And it definitely gives the impression that lldb "sees" into the array index, so I wouldn't at all be surprised if someone tried `v m[1+1]` and expected it to produce the child called `[2]`.
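
To make the round trip concrete, here is a minimal sketch of the parse / re-format / look-up-by-name sequence. It is not lldb code: `parse_index`, `canonical_child_name`, `lookup` and the `children` dictionary are all hypothetical names, and the radix handling (a `0x` prefix means hex, everything else is decimal) is an assumption that may not match what lldb really does.

```python
def parse_index(text):
    """Parse the text between the brackets into an integer.

    Assumption: a "0x" prefix means hexadecimal and anything else is decimal
    (leading zeros allowed); lldb's real radix rules may differ.
    """
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def canonical_child_name(index):
    """Turn the integer back into the bracketed name used for the lookup."""
    return "[%d]" % index


def lookup(children_by_name, index_text):
    """Normalize the index text, then ask for the child with that name."""
    return children_by_name.get(canonical_child_name(parse_index(index_text)))


# Hypothetical synthetic children, keyed by their canonical names.
children = {"[0]": "first child", "[1]": "second child"}
for spelling in ("1", "0000001", "0x000001"):
    print(spelling, "->", lookup(children, spelling))
```

All three spellings end up asking for the same name, which is why the formatter never needs to know how the user typed the index.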

That's not to say that having the formatters provide more information about the structure of the children wouldn't be nice. For example, in the DAP protocol, we're expected to divide the children of an object into "named" and "indexed" categories. We can't do that right now, as we don't get that information. However, I don't think this needs to be tied to this work in any way.

> > These are unfortunately visible to the user, and they will have to type them in `frame var` to access the child. So we can't make them something ugly.
> 
> I don't want to make them ugly, but I _do_ think we need to ask the user to distinguish them in some way. Otherwise I don't see how we can make DIL any more powerful than the current 'frame variable'.

I think there could be some reasonable middle ground here. Quoting/escaping every synthetic child is not going to fly. However, I wouldn't have a problem with asking users to quote "weird" child names. So, like, if there's a variable called `unsigned` somewhere, we'd want `v unsigned` to work. But if some formatter decides to name its child `a*b` (or even `a.b`), then I think it's fine if we ask users to mark that specially. We would need to be careful about the choice of quoting mechanism, and we'd probably need to change the `frame variable` command to take "raw" input to prevent all the quotes from getting stripped, but I think it's doable.
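
As a rough illustration of that middle ground, the sketch below splits a dotted path into component names and only needs quoting for names that contain a `.`. Everything here is hypothetical: `split_path` is not part of lldb or the DIL lexer, and the single quote is just a placeholder for whatever quoting mechanism ends up being chosen.

```python
def split_path(expr, quote="'"):
    """Split a dotted path into component names, honoring quoted names.

    The quote character is a placeholder; the real mechanism is undecided.
    """
    parts = []
    current = []
    in_quote = False
    for ch in expr:
        if ch == quote:
            # Toggle quoting; the quote character itself is dropped.
            in_quote = not in_quote
        elif ch == "." and not in_quote:
            # An unquoted dot ends the current component.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quote:
        raise ValueError("unterminated quoted name")
    parts.append("".join(current))
    return parts


print(split_path("unsigned"))       # plain names need no quoting
print(split_path("obj.'a.b'.c"))    # a quoted name may contain a dot
```

A real lexer would also have to treat operators such as `*` as tokens, so a name like `a*b` would need the same marking; this sketch only models the `.` separator.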

Also, if we have this feature, I don't think it needs to be limited to synthetic children. There's no reason not to use the same mechanism for regular variables as well. Even though most (but not all) languages do not support weird variable names like `a*b`, DWARF is definitely capable of expressing them, so we might as well support that.

> Yes, collection classes that use arbitrary types for the keys are problematic. How would we show the map if the keys are structures with some non-trivial compare function in a way that wouldn't be awful to use? I don't think we can get around people having to know that `v` path expressions are NOT expressions in the underlying language.

Complex key types are hard for sure, and I think the current setup is perfectly reasonable for them. It's just that the result for integral keys (which are pretty common) is not ideal. It would be nice if users had a way to answer questions like "in my map of a million keys, which value does the number 47 map to?" without an expression evaluator (which will often not compile). Any solution here is most certainly going to require the data formatters to provide more info...
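
To show the kind of extra information I mean, here is a sketch that contrasts positional lookup with key lookup. `MapChildren`, `child_at_index` and `child_for_key` are made-up names for illustration, not an existing or proposed lldb API, and the key-to-position table stands in for whatever a formatter would have to supply.

```python
class MapChildren:
    """Hypothetical stand-in for a map formatter's children (not an lldb API)."""

    def __init__(self, pairs):
        # Children are kept in iteration order, as (key, value) pairs.
        self.pairs = list(pairs)
        # Extra information a formatter would have to provide: key -> position.
        self.index_by_key = {key: i for i, (key, _) in enumerate(self.pairs)}

    def child_at_index(self, index):
        """Positional lookup: the index-th element."""
        return self.pairs[index]

    def child_for_key(self, key):
        """Key lookup: the element whose key equals `key`, or None."""
        i = self.index_by_key.get(key)
        return None if i is None else self.pairs[i]


m = MapChildren([(10, "ten"), (47, "forty-seven"), (99, "ninety-nine")])
print(m.child_at_index(1))   # positional: the second element
print(m.child_for_key(47))   # by key: the element whose key is 47
print(m.child_for_key(1))    # no element has key 1
```

With only positional children, a user has to scan the elements to find key 47; with the key table the question becomes a single lookup.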

https://github.com/llvm/llvm-project/pull/123521