[cfe-dev] How is context-sensitive lexical analysis done?

Thu Feb 14 14:25:38 PST 2019

Hi,

I see that clang uses a "non-reference lexical grammar" to deal with
context-sensitive lexical analysis. Could anybody give some pointers on
exactly how this is done for a context-sensitive grammar?

https://en.wikipedia.org/wiki/The_lexer_hack#Alternative_solutions

I am interested in parsing shell code.

For example, in normal mode, no spaces are allowed around the `=` in an
assignment.

x=10

But in math mode, spaces are allowed.

((x = 10))

Bash currently just tokenizes both cases as single tokens (`x=10` and
`((x = 10))`, respectively).

I think that if one were to tokenize at a finer level and let the
grammar handle the assignment, one would need different lexers
in different contexts. But this could potentially allow the shell
language to be more expressive, so I'd like to understand how this can
be done.
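
To make the "different lexers in different contexts" idea concrete, here
is a minimal sketch of a mode-switching lexer in Python. It is only an
illustration under stated assumptions: the token names, the two modes and
the regular expressions are hypothetical, cover just the two examples
above, and are not how Bash or clang actually tokenize. It is also a
different technique from the one described on the linked page, which
leaves the disambiguation to the parser.

```python
import re

# Hypothetical token tables, one per lexer mode. Real shell rules are
# far richer (quoting, expansions, nested parentheses, ...).
MODES = {
    "normal": [
        ("ARITH_OPEN", r"\(\("),                     # switches to arith mode
        ("ASSIGN_WORD", r"[A-Za-z_]\w*=[^\s()]*"),   # x=10 stays one token
        ("WORD", r"[^\s()]+"),
        ("SPACE", r"\s+"),
    ],
    "arith": [
        ("ARITH_CLOSE", r"\)\)"),                    # switches back to normal
        ("NAME", r"[A-Za-z_]\w*"),
        ("NUMBER", r"\d+"),
        ("OP", r"[-+*/=]"),
        ("SPACE", r"\s+"),                           # spaces are insignificant here
    ],
}
COMPILED = {
    mode: [(kind, re.compile(pattern)) for kind, pattern in rules]
    for mode, rules in MODES.items()
}


def tokenize(src):
    """Return (kind, text) pairs, switching rule tables on (( and ))."""
    mode, pos, tokens = "normal", 0, []
    while pos < len(src):
        # Try the current mode's rules in order; the first match wins.
        for kind, regex in COMPILED[mode]:
            match = regex.match(src, pos)
            if match:
                break
        else:
            raise SyntaxError(
                f"unexpected {src[pos]!r} at offset {pos} in {mode} mode"
            )
        pos = match.end()
        if kind == "ARITH_OPEN":
            mode = "arith"
        elif kind == "ARITH_CLOSE":
            mode = "normal"
        if kind != "SPACE":
            tokens.append((kind, match.group()))
    return tokens


if __name__ == "__main__":
    print(tokenize("x=10"))
    print(tokenize("((x = 10))"))
```

With these rules, `x=10` comes out as a single ASSIGN_WORD token, while
`((x = 10))` is split into ARITH_OPEN, NAME, OP, NUMBER and ARITH_CLOSE,
so a grammar could handle the assignment itself. In this sketch the lexer
tracks the mode on its own; an alternative design would let the parser
set the mode before requesting each token.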

-- 
Regards,
Peng