[llvm-dev] Linking Linux kernel with LLD
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Thu Jan 26 19:56:57 PST 2017
On Tue, Jan 24, 2017 at 11:29 AM, Rui Ueyama <ruiu at google.com> wrote:
> Well, maybe, we should just change the Linux kernel instead of tweaking
> our tokenizer too hard.
This is silly. Writing a simple and maintainable lexer is not hard (look
e.g. at https://reviews.llvm.org/D10817). There are some complicated
context-sensitive cases in linker scripts that break our approach of
tokenizing up front (so we might want to hold off on those), but we aren't going
to die from implementing enough to lex basic arithmetic expressions
independent of whitespace.
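
As a rough sketch of what lexing basic arithmetic independent of whitespace
could look like (in Python for brevity): the token classes and the name
lex_expr below are made up for illustration and are not LLD code, and the
sketch assumes the parser already knows it is inside an expression, which
sidesteps the context-sensitive cases mentioned above.

```python
import re

# Hypothetical expression-mode token rules, tried left to right.
# Assumption: only 0x-prefixed hex and K/M-suffixed decimals are recognized.
EXPR_TOKEN = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<num>0[xX][0-9A-Fa-f]+|[0-9]+[kKmM]?)"
    r"|(?P<sym>[A-Za-z_.][A-Za-z0-9_.]*)"
    r"|(?P<op><<|>>|==|!=|<=|>=|&&|\|\||[-+*/%&|^~!<>=?:(),;])"
)

def lex_expr(text):
    """Split an expression into tokens, ignoring whitespace between them."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = EXPR_TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":
            # Whitespace only separates tokens; it is never a token itself.
            tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens

for stmt in [". = 0x1000;", ". = A*10;", ". = 10k;", ". = 10*5;"]:
    print(stmt, "->", lex_expr(stmt))
```

With rules like these, "10*5" and "10 * 5" produce the same three tokens,
and "0x1000" and "10k" each stay in one piece.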
We will be laughed at. ("You seriously couldn't even be bothered to
implement a real lexer?")
-- Sean Silva
> On Tue, Jan 24, 2017 at 7:57 AM, George Rimar <grimar at accesssoftek.com> wrote:
>> > Our tokenizer recognizes
>> >   [A-Za-z0-9_.$/\\~=+*?\-:!<>]+
>> > as a token. gold uses more complex rules to tokenize. I don't think we
>> > need rules that complex, but there seems to be room to improve our
>> > tokenizer. In particular, I believe we can parse Linux's linker script
>> > by changing the tokenizer rules as follows.
>> >   [A-Za-z_.$/\\~=+*?\-:!<>][A-Za-z0-9_.$/\\~=+*?\-:!<>]*
>> >   [0-9]+
>> After more investigation, it seems that will not work so simply.
>> Here are some examples where it breaks:
>> . = 0x1000; (gives tokens "0", "x1000")
>> . = A*10; (gives "A*10")
>> . = 10k; (gives "10", "k")
>> . = 10*5; (gives "10", "*5")
>> "[0-9]+" could be "[0-9][kmhKMHx0-9]*",
>> but for "10*5" that still gives "10" and "*5" tokens.
>> And I do not think we can add special handling of operators,
>> as it's hard to assume any context at the tokenizing step.
>> We do not know whether we are parsing a file name or a math expression.
>> It may be worth trying to handle this at a higher level, during evaluation
>> of expressions?
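
The token splits listed in the quoted message can be reproduced by trying
the two proposed rules in order. A minimal Python sketch follows; the
helper name tokenize and the combined pattern are illustrative only, not
LLD code, and characters matched by neither rule (spaces, ';') are simply
skipped.

```python
import re

# The two rules proposed in the quoted message, tried in this order.
WORD = r"[A-Za-z_.$/\\~=+*?\-:!<>][A-Za-z0-9_.$/\\~=+*?\-:!<>]*"
NUMBER = r"[0-9]+"
TOKEN = re.compile(f"{WORD}|{NUMBER}")

def tokenize(text):
    """Return every token matched by either rule, scanning left to right."""
    return TOKEN.findall(text)

for value in ["0x1000", "A*10", "10k", "10*5"]:
    print(value, "->", tokenize(value))
```

Because '*' is a legal word character, it always glues onto a neighbouring
word or starts a new one, which is why "A*10" stays whole and "10*5"
splits after the leading digits.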