[Lldb-commits] [lldb] aee4925 - Recommit: Compress formatting of array type names (int [4] -> int[4])

Thu Oct 28 06:18:20 PDT 2021

On 26/10/2021 23:14, Jim Ingham via lldb-commits wrote:
> 
> 
>> On Oct 26, 2021, at 12:45 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Tue, Oct 26, 2021 at 12:15 PM Raphael Isemann <teemperor at gmail.com> wrote:
>> Not sure how many LLDB users would (or can) write AST matchers though. And
>>      (lldb) type summary add "foo[4]" ...
>> is arguably easier on the eyes than
>>      type summary add
>> constantArrayType(allOf(hasSize(4),hasDeclaration(namedDecl(hasName("foo")))))`
>>
>> Though presumably (& the example I think Jim was giving) more often the desire would be to write a matcher for arrays of any dimension, where it's:
>>
>> "foo[\d+]" or similar (maybe having to escape the "[]"
>> compared to
>> arrayType(hasElementType(hasDeclaration(cxxRecordDecl(hasName("foo")))))
>>   
>> Which is, admittedly, still pretty verbose. But more type checked (for better and worse - annoying to get right (took me ages to figure out I needed "hasDeclaration" in there) but more robust once it is right)
>>
>> (that's my best shot at a good AST matcher, not sure if we could make
>> this shorter, but that's probably what a user ends up with).
>>
>> In this case it seems just comparing tokens instead of direct strings
>> is enough to make most users happy. And it doesn't require building
>> clang ASTs (which is always expensive).
>>
>> Yeah, as much as I like the abstract purity of structural comparison, token comparison is probably close enough. (guess that has to involve LLDB walking through typedefs incrementally, etc - if typedefs can be used in pretty printer matching/searching, otherwise got to go strip out all the typedefs (from the users regex (if they are allowed there) and from the type of the thing being printed) and other sugar before comparison - guess that's done today somewhere)
>>
>>
> 
> The formatter matching does use a type hierarchy if it has one.  We walk the chain of typedefs looking for a match, first one wins.  We also match down class hierarchies, since if you only format the base class fields, it would be annoying to have to write a new formatter for each derived class just to get the base class fields printed.
> 
> We could also invent some mini-language for the type specification, like "char [*]" to mean an array of any size.  But I hesitate to introduce non-language features in here because then some other language we need to support will use those characters making them ambiguous.  And it's better to rely on facts people already know to specify this sort of thing, if possible.  But maybe having a slightly smart tokenizer and a few conventions for the common cases might make specifying matches less awkward.
> 
> But it doesn't require that we know about then names being matched, and will fall back now on either plain string matching or regex-es.
> 

I've recently approached this issue from a slightly different angle 
(trying to format sizeless char arrays (char[]), and I've learned a 
couple of things which might be interesting to this discussion:
- besides typedefs, we also strip cv qualifiers (so a formatter for char 
* applies to const char * as well). However, we didn't to this for array 
types (so one had to include char [.*] and const char [.*] separately. 
(D112708 extends this behavior to arrays too)
- our formatters were not handling cv arrays separately, and the reason 
this worked is because they used unachored regexes, so "char [.*]" 
matched the last part of "const volatile char [47]"
- the unachored matching meant that these formatters were getting 
applied to completely unrelated types as well (e.g. MyStruct<char[57]>). 
(D112709 fixes the second two issues)

While I don't think it's realistic to expect that our users will learn 
how clang ast matchers (or any similar language) work, I think all of 
these examples show that the regex approach has a lot of pitfalls as 
well. It might be good to consider switching to a full-match regexes at 
least, as I think the template trick will affect most formatters, and I 
doubt users will remember to surround their expressions with ^$.

pl