[llvm-dev] Linking Linux kernel with LLD

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 24 11:29:30 PST 2017


Well, maybe we should just change the Linux kernel instead of tweaking our
tokenizer too much.

On Tue, Jan 24, 2017 at 7:57 AM, George Rimar <grimar at accesssoftek.com> wrote:

> > Our tokenizer recognizes
> >
> >  [A-Za-z0-9_.$/\\~=+[]*?\-:!<>]+
> >
> > as a token. gold uses more complex rules to tokenize. I don't think we
> > need rules that complex, but there seems to be room to improve our
> > tokenizer. In particular, I believe we can parse the Linux kernel's linker
> > script by changing the tokenizer rules as follows.
> >
> >  [A-Za-z_.$/\\~=+[]*?\-:!<>][A-Za-z0-9_.$/\\~=+[]*?\-:!<>]*
> >
> >or
> >
> >  [0-9]+
>
> After more investigation, it seems this will not work so simply.
> Here are some examples where it breaks:
> . = 0x1000; (gives "0", "x1000")
> . = A*10;   (gives "A*10")
> . = 10k;    (gives "10", "k")
> . = 10*5;   (gives "10", "*5")
>
> "[0-9]+" could be "[0-9][kmhKMHx0-9]*",
> but for "10*5" that still gives "10" and "*5" tokens.
> And I do not think we can add special handling of operators,
> as it is hard to assume any context at the tokenizing step:
> we do not know whether we are parsing a file name or a math expression.
>
> Maybe it is worth trying to handle this at a higher level, during
> evaluation of expressions?
>
> George.
>
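For reference, George's examples can be reproduced with a small Python sketch (LLD itself is C++, so this is only an illustration, not LLD code). It encodes the current and proposed rules with Python's re module, writing the "[]" from the mail as "\[\]" so the character class compiles, and it simply drops spaces and ";", which is an assumption about how other punctuation is handled. The second half sketches one possible form of the "higher level" idea: keep the current rule, and only when the parser is reading an expression, split tokens at operator characters and parse numeric literals afterwards. split_expr, parse_int and the operator list are hypothetical helpers, not LLD's API, and parse_int covers only a subset of GNU ld constant syntax (the 0x prefix and the K/M suffixes, which scale by 1024 and 1024*1024).

```python
import re

# Character class from the mail; "[]" is escaped as "\[\]" for Python's re.
WORD = r"A-Za-z_.$/\\~=+\[\]*?\-:!<>"

# Current rule: digits may appear anywhere in a token.
CURRENT = re.compile("[" + WORD + "0-9]+")

# Proposed rule: a word may not start with a digit; a digit run is its own token.
PROPOSED = re.compile("[" + WORD + "][" + WORD + "0-9]*|[0-9]+")


def tokenize(rule, text):
    # findall skips characters that match neither alternative (spaces, ';').
    return rule.findall(text)


# Hypothetical expression-context pass: multi-character operators come first
# so that "<<" is kept whole instead of becoming two "<" tokens.
OPERATOR = re.compile(r"(<<|>>|<=|>=|==|!=|[*/+\-<>!~])")


def split_expr(tokens):
    # Split each token at operators; the capturing group keeps the operators,
    # and empty strings produced by re.split are dropped.
    out = []
    for tok in tokens:
        out.extend(part for part in OPERATOR.split(tok) if part)
    return out


def parse_int(tok):
    # Assumed subset of GNU ld constants: 0x prefix, K/M suffixes.
    mult = 1
    if tok[-1:] in ("k", "K"):
        tok, mult = tok[:-1], 1024
    elif tok[-1:] in ("m", "M"):
        tok, mult = tok[:-1], 1024 * 1024
    return int(tok, 0) * mult


if __name__ == "__main__":
    for line in [". = 0x1000;", ". = A*10;", ". = 10k;", ". = 10*5;"]:
        print(line, tokenize(PROPOSED, line), split_expr(tokenize(CURRENT, line)))
    print(parse_int("0x1000"), parse_int("10k"))
```

With this split, whether something like "foo-bar.o" stays one token depends only on whether the parser asked for an expression, which is exactly the context the tokenizer alone does not have.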