[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)
via lldb-commits
lldb-commits at lists.llvm.org
Mon Feb 3 10:02:12 PST 2025
jimingham wrote:
> On Feb 2, 2025, at 9:49 PM, cmtice wrote:
>
>
> Apart from the (mainly stylistic) inline comments, the biggest problem I see is that the definition of an identifier is still too narrow. The restriction on the dollar sign is completely unnecessary, as C will let you put that anywhere <https://godbolt.org/z/o7qbfeWve>. And it doesn't allow any non-ASCII characters.
>
> I really think this should be based on a deny-list rather than an allow-list. Any character we don't claim for ourselves should be fair game for an identifier. If someone manages to enter the delete character (\x7f) into the expression, then so be it.
>
> The question of "identifiers" starting with digits is interesting. Personally, I think it'd be fine to reject those (and require the currently-vapourware quoting syntax), because I suspect you want to accept number suffixes, and I think it'd be confusing to explain why 123x is a valid identifier but 123u is not. I suspect some might have a different opinion, though.
>
> We could continue discussing that here, or we could accept everything here and postpone this discussion for the patch which starts parsing numbers. Up to you.
>
> To the best of my knowledge, all the languages that we want to support have roughly the same definition of what a valid identifier is: a letter or underscore, followed by a sequence of letters, digits and underscores, where 'letters' are defined as 'a..z' and 'A..Z'. The ones I've been able to check do not allow arbitrary characters in their identifiers. So that's what I implemented (acknowledging that I currently only recognize ASCII, and fully plan to add UTF-8 support in the future). I added the ability to recognize the '$' at the front specifically to allow DIL users to ask about registers and LLDB convenience variables, which (to the best of my knowledge) allow '$' only in the first position, and not all by itself.
>
> I am not sure I see the benefit of expanding what DIL recognizes as a valid identifier beyond what the languages LLDB supports recognize. Am I missing something? Or (this is quite possible) have I misunderstood the definition of what's a valid identifier for some language we want to support?
>
> Since we definitely want to support lexing/parsing of numbers, I do not think it's a good idea for DIL to also allow identifiers to start with numbers.
>
I agree here. We will definitely need to support UTF-8 characters, since all the hip new languages use that character set. But allowing initial digits makes parsing hard enough that I don't think it likely there will be languages we need to support that do that. Can anybody even think of a language that allows this?
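
For concreteness, the identifier rule described in the quoted text above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `IdentifierLength`, `IsLetter` and `IsDigit` are hypothetical names, and the sketch is ASCII-only (no UTF-8 yet).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration; these are not the names used in the PR.
bool IsLetter(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool IsDigit(char c) { return c >= '0' && c <= '9'; }

// Returns the length of the identifier at the start of `text`, or 0 if
// `text` does not start with one. The rule sketched here: an optional
// leading '$', then a letter or underscore, then any run of letters,
// digits and underscores. ASCII only.
std::size_t IdentifierLength(std::string_view text) {
  std::size_t pos = 0;

  // '$' is accepted only in the first position (registers, convenience
  // variables), and never as an identifier all by itself.
  if (pos < text.size() && text[pos] == '$')
    ++pos;

  // The first real character must be a letter or underscore, so a leading
  // digit is left for the (future) number lexer.
  if (pos >= text.size() || !(IsLetter(text[pos]) || text[pos] == '_'))
    return 0;
  ++pos;

  while (pos < text.size() &&
         (IsLetter(text[pos]) || IsDigit(text[pos]) || text[pos] == '_'))
    ++pos;

  return pos;
}
```

With this rule, `123x` and `123u` are both rejected as identifiers, so the lexer never has to decide whether a trailing letter is a number suffix or part of a name.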
Jim
https://github.com/llvm/llvm-project/pull/123521