[cfe-dev] [clang-format] Token Annotation limitations
MyDeveloper Day via cfe-dev
cfe-dev at lists.llvm.org
Thu Dec 2 11:55:27 PST 2021
Here is an example of where I annotate a function declaration after the fact
AnnotatedTokens(L=1):
M=0 C=0 T=FunctionDeclarationReturnType S=1 F=0 B=0 BK=0 P=0 Name=void L=4
PPK=2 FakeLParens= FakeRParens=0 II=0x1f9f89f7628 Text='void'
M=0 C=1 T=FunctionDeclarationName S=1 F=0 B=0 BK=0 P=80 Name=identifier
L=8 PPK=2 FakeLParens= FakeRParens=0 II=0x1f9f89f5358 Text='Bar'
M=0 C=0 T=FunctionDeclarationLParan S=0 F=0 B=0 BK=0 P=23 Name=l_paren L=9
PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text='('
M=0 C=1 T=FunctionDeclarationParameterType S=0 F=0 B=0 BK=0 P=140
Name=identifier L=12 PPK=2 FakeLParens=1/ FakeRParens=0 II=0x1f9f89f5358
Text='Bar'
M=0 C=0 T=FunctionDeclarationParameterComma S=0 F=0 B=0 BK=0 P=41
Name=comma L=13 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=','
M=0 C=1 T=Unknown S=1 F=0 B=0 BK=0 P=41 Name=const L=19 PPK=2 FakeLParens=
FakeRParens=0 II=0x1f9f89f70c8 Text='const'
M=0 C=0 T=FunctionDeclarationParameterType S=1 F=0 B=0 BK=0 P=43 Name=int
L=23 PPK=2 FakeLParens= FakeRParens=0 II=0x1f9f89f7340 Text='int'
M=0 C=1 T=PointerOrReference S=1 F=0 B=0 BK=0 P=230 Name=star L=25 PPK=2
FakeLParens= FakeRParens=0 II=0x0 Text='*'
M=0 C=1 T=StartOfName S=0 F=0 B=0 BK=0 P=240 Name=identifier L=26 PPK=2
FakeLParens= FakeRParens=0 II=0x1f9f89f53e8 Text='b'
M=0 C=1 T=PointerOrReference S=1 F=0 B=0 BK=0 P=230 Name=star L=28 PPK=2
FakeLParens= FakeRParens=0 II=0x0 Text='*'
M=0 C=0 T=FunctionDeclarationParameterComma S=0 F=0 B=0 BK=0 P=54
Name=comma L=29 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=','
M=0 C=1 T=FunctionDeclarationParameterType S=1 F=0 B=0 BK=0 P=41 Name=int
L=33 PPK=2 FakeLParens= FakeRParens=0 II=0x1f9f89f7340 Text='int'
M=0 C=1 T=StartOfName S=1 F=0 B=0 BK=0 P=240 Name=identifier L=35 PPK=2
FakeLParens= FakeRParens=1 II=0x1f9f89f5418 Text='c'
M=0 C=0 T=FunctionDeclarationRParan S=0 F=0 B=0 BK=0 P=43 Name=r_paren
L=36 PPK=2 FakeLParens= FakeRParens=0 II=0x0 Text=')'
M=0 C=0 T=Unknown S=0 F=0 B=0 BK=0 P=23 Name=semi L=37 PPK=2 FakeLParens=
FakeRParens=0 II=0x0 Text=';'
This doesn't fix my exact issue, but if we could perform this kind of
annotation across all code it might help to break down some of the
ambiguities
MyDeveloperDay
On Thu, Dec 2, 2021 at 7:38 PM Björn Schäpers via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Am 02.12.2021 um 12:01 schrieb MyDeveloper Day via cfe-dev:
> > I've recently been looking into a bug in clang format, that incorrectly
> treats *
> > as a TT_PointerOrReference
> >
> > void f() { operator+(a, b *b); }
> >
> > Actually the misinterpretation of * and & as BinaryOperators or
> > PointerOrReference is a major source of bugs for clang-format, this is
> because
> > we don't have semantic information and often we are looking very
> myopically at
> > adjacent tokens and there just isn't enough context to disambiguate.
> >
> > So whilst we see b * b this a easily (b x b)
> >
> > This is actually just
> >
> > tok::identifier->tok::star->tok::identifier
> >
> > Which is indistinguishable from
> >
> > MyClass *b
> >
> > which I think we'd all see as (class ptr b), but in all cases we have no
> > additional information, other than perhaps scanning up and down the line
> to find
> > something relevant.
> >
> > This literally drops out the bottom of determineStarAmpUsage() with a
> > return TT_PointerOrReference;
> >
> > For the bug I'm looking at, given the following example the first f()
> puts the *
> > as a pointer when actually its a multiply;
> >
> > void f() { operator+(Bar, Foo *Foo); }
> >
> > class A {
> > void operator+(Bar, Foo *Foo);
> > }
> >
> > Effectively these two instances of operator appears as the same with
> almost no
> > difference in terms of tokens,
> >
> > Looking at the token annotations, there is almost nothing that
> distinguishes
> > between the two.
> >
> > [snip]
> >
> > We've always known this is an issue, and it's to do with the fact that
> we don't
> > have information near the "*" that tells us in what context it is.
> >
> > I feel that actually part of the problem is that we are not setting the
> > "setType" on the tokens i.e. T=Unknown in all cases, I think we have
> more
> > contextual information
> > than we think, I just feel like we throw it away a lot of the time.
> >
> > Think about how you look at the code visually.. how do you determine
> that its a
> > definition rather than a declaration? Both bits of code are almost
> > identical so what is it that's forcing us to see the difference?
> >
> > Somehow I feel we should be classifying as many of the tokens as we
> process
> > them. Every (){} and , every type or return type. I feel with more
> information
> > we could
> > solve these types of issues.
> >
> > By introducing a few more types and then systematically trying to
> annotate all
> > the tokens we could have greater inference.
> >
> > To me I feel, that given that we see operator as FunctionDeclarationName
> we
> > should then change the ( and ) to be OverloadedOperatorDeclarationLParen
> and
> > OverloadedOperatorDeclarationRParen
> >
> > We should classify the "," in the declaration as FunctionDeclarationComma
> >
> > From that we could determine that Bar and Foo (1st) were Type
> declarations, and
> > that would make a rule for determining the * to be almost trivial.
> >
> > I think in clang-format we need to take a more holistic approach to
> > TokenAnnotations, classify as many tokens as we can as accurately as we
> can.
> > T=Unknown is almost useless to us, and that
> > lack of context easily means we misinterpret.
> >
> > Any thoughts?
> >
> > MyDeveloperDay
> >
>
> Seems reasonable to me. I also think we should add and keep more
> information in
> the unwrapped line parser.
>
> Kind regards,
> Björn.
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211202/9aad5cdd/attachment.html>
More information about the cfe-dev
mailing list