[llvm-dev] Linking Linux kernel with LLD

Tue Jan 24 07:57:50 PST 2017

>Our tokenizer recognizes

>
>  [A-Za-z0-9_.$/\\~=+[]*?\-:!<>]+
>
>as a token. gold uses more complex rules to tokenize. I don't think we need that much complex rules, but there seems to be room to improve our tokenizer. In particular, I believe we can parse the Linux's linker script by changing the tokenizer rules as follows.
>
>  [A-Za-z_.$/\\~=+[]*?\-:!<>][A-Za-z0-9_.$/\\~=+[]*?\-:!<>]*
>
>or
>
>  [0-9]+

After more investigation, it seems this will not work so simply.
Here are some examples where it breaks:
. = 0x1000; (gives tokens "0" and "x1000")
. = A*10;   (gives a single token "A*10")
. = 10k;    (gives "10" and "k")
. = 10*5;   (gives "10" and "*5")

The "[0-9]+" rule could be changed to "[0-9][kmhKMHx0-9]*",
but "10*5" would still be split into the tokens "10" and "*5".
And I do not think we can add operator handling to the tokenizer,
as it is hard to assume any context at the tokenizing step:
we do not know whether we are parsing a file name or a math expression.
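
The splits listed above can be reproduced with a small Python sketch. It is only an approximation of the quoted rules written with Python's re module, not LLD's actual lexer; the names SYM, PROPOSED, VARIANT and tokenize are made up for this illustration, and characters outside the rules (spaces, ";") are simply skipped.

```python
import re

# Character class from the quoted proposal, with "[", "]" and "-" escaped
# for Python. Assumption: this mirrors the intent of the rule in the mail.
SYM = r"A-Za-z_.$/\\~=+\[\]*?\-:!<>"

# Proposed rule: a token that starts with a non-digit, or a run of digits.
PROPOSED = re.compile(rf"[{SYM}][{SYM}0-9]*|[0-9]+")

# Variant where a numeric token may also contain "x" and k/m/h suffixes.
VARIANT = re.compile(rf"[{SYM}][{SYM}0-9]*|[0-9][kmhKMHx0-9]*")

def tokenize(pattern, text):
    """Return every token the pattern finds, skipping unmatched characters."""
    return pattern.findall(text)

if __name__ == "__main__":
    for expr in ("0x1000", "A*10", "10k", "10*5"):
        print(expr, tokenize(PROPOSED, expr), tokenize(VARIANT, expr))
```

With the variant rule "0x1000" and "10k" stay whole, but "10*5" is still split into "10" and "*5", and "A*10" is still a single token under both rules.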

Maybe it is worth trying to handle this at a higher level, during the
evaluation of expressions?
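
As a rough illustration of that idea (not a description of how LLD does or should do it), the parser could re-split a coarse token only once it knows it is reading an expression. The function name split_expr_token and the operator set are assumptions made for this sketch.

```python
import re

# Operators assumed for this sketch; a real linker script grammar has more.
_OPERATOR = re.compile(r"([*/+\-])")

def split_expr_token(token):
    """Split one coarse token into operands and operators.

    Meant to be called only where the parser already knows it is inside an
    expression, so file names such as "foo-bar.o" are never passed in.
    """
    # re.split keeps the operators because the pattern has a capture group;
    # empty strings appear when the token starts or ends with an operator.
    return [piece for piece in _OPERATOR.split(token) if piece]
```

Under these assumptions the token "A*10" becomes "A", "*", "10" and the token "*5" becomes "*", "5", while the tokenizer itself stays free of context.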

George.