[llvm-dev] Linking Linux kernel with LLD

Fri Jan 27 19:48:29 PST 2017

On Fri, Jan 27, 2017 at 1:31 PM, Rui Ueyama <ruiu at google.com> wrote:

> Sean,
>
> As you noticed, the linker script tokenization rule is not trivial
> -- it is context sensitive. The current lexer is extremely simple and
> almost always works well. Improving "almost always" to "perfect" is not
> a high priority because we have many higher-priority things, but I'm fine
> if someone improves it. If you are interested, please take it. Or maybe
> I'll take a look at it. It shouldn't be hard. It's probably just half a
> day's work.
>

Yeah. To be clear, I wasn't saying that this was high priority. Since I'm
complaining so much about it maybe I should take a look this weekend :)

>
> As far as I know, the grammar is LL(1), so it needs only one push-back
> buffer. Handling INCLUDE directive can be a bit tricky though.
>
> Maybe we should rename ScriptParserBase to ScriptLexer.
>

That sounds like a good idea.
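
On the one-token push-back point quoted above, here is a minimal sketch of
what such a buffer could look like. The class name PeekableLexer and the
whitespace-only splitting are assumptions made for illustration; this is
not LLD's code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not LLD's actual class. It splits on whitespace
// only, which is far simpler than real linker script tokenization.
class PeekableLexer {
public:
  explicit PeekableLexer(const char *Text) : Cur(Text) { assert(Text); }

  // Returns the next token and consumes it; "" at end of input.
  std::string next() {
    if (HasPushed) {
      HasPushed = false;
      return Pushed;
    }
    return scan();
  }

  // Returns the next token without consuming it. One slot is enough
  // when the parser never looks more than one token ahead.
  std::string peek() {
    if (!HasPushed) {
      Pushed = scan();
      HasPushed = true;
    }
    return Pushed;
  }

private:
  // Scans one whitespace-delimited token starting at Cur.
  std::string scan() {
    while (std::isspace(static_cast<unsigned char>(*Cur)))
      ++Cur;
    const char *Start = Cur;
    while (*Cur != '\0' && !std::isspace(static_cast<unsigned char>(*Cur)))
      ++Cur;
    return std::string(Start, Cur);
  }

  const char *Cur;        // Current position in the script text.
  std::string Pushed;     // The single buffered token, if any.
  bool HasPushed = false; // Whether Pushed currently holds a token.
};
```

One thing to watch if this is combined with context-sensitive lexing: a
buffered token has already been scanned under one set of rules, so the
parser would need to avoid peeking across a point where the rules change.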

-- Sean Silva

>
> On Fri, Jan 27, 2017 at 11:17 AM, Rafael Avila de Espindola <
> rafael.espindola at gmail.com> wrote:
>
>> > Hmm..., the crux of not being able to lex arithmetic expressions seems
>> > to be due to lack of context sensitivity. E.g. consider `foo*bar`. Could
>> > be a multiplication, or could be a glob pattern.
>> >
>> > Looking at the code more closely, adding context sensitivity wouldn't be
>> > that hard. In fact, our ScriptParserBase class is actually a lexer (look
>> > at the interface; it is a lexer's interface). It shouldn't be hard to
>> > change from an up-front tokenization to a more normal lexer approach of
>> > scanning the text for each call that wants the next token. Roughly
>> > speaking, just take the body of the for loop inside
>> > ScriptParserBase::tokenize and add a helper which does that on the fly
>> > and is called by consume/next/etc. Instead of an index into a token
>> > vector, just keep a `const char *` pointer that we advance.
>> >
>> > Once that is done, we can easily add a `nextArithmeticToken` or
>> > something like that which just lexes with different rules.
>>
>> I like that idea. I first thought of always having '*' as a token, but
>> then space has to be a token, which is an incredible pain.
>>
>> I then thought of having a "setLexMode" method, but the lex mode can
>> always be implicit from where we are in the parser. The parser should
>> always know if it should call next or nextArithmetic.
>>
>> And I agree we should probably implement this. Even if it is not common,
>> it looks pretty silly to not be able to handle 2*5.
>>
>> Cheers,
>> Rafael
>>
>
>
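
To make the idea in the quoted discussion concrete, here is a minimal sketch
of an on-demand lexer that keeps a `const char *` cursor and lets the parser
pick the rules by choosing which method to call. It is not LLD's
implementation: the class body, the method name nextArithmetic, and both
punctuation sets are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical on-demand lexer; not the real ScriptParserBase.
class ScriptLexer {
public:
  explicit ScriptLexer(const char *Text) : Cur(Text) { assert(Text); }

  // Default rules: '*' is an ordinary character, so "foo*bar" is a
  // single token that the parser can treat as a glob pattern.
  std::string next() { return lex("{}();,"); }

  // Arithmetic rules: operators stand alone, so "foo*bar" is lexed as
  // three tokens: "foo", "*", "bar". (The operator set is an assumption.)
  std::string nextArithmetic() { return lex("{}();,+-*/"); }

  bool atEOF() {
    skipSpace();
    return *Cur == '\0';
  }

private:
  void skipSpace() {
    while (std::isspace(static_cast<unsigned char>(*Cur)))
      ++Cur;
  }

  // Scans one token at Cur; characters in Punct are one-character tokens.
  std::string lex(const char *Punct) {
    skipSpace();
    if (*Cur == '\0')
      return "";
    if (std::strchr(Punct, *Cur))
      return std::string(1, *Cur++);
    const char *Start = Cur;
    while (*Cur != '\0' && !std::isspace(static_cast<unsigned char>(*Cur)) &&
           !std::strchr(Punct, *Cur))
      ++Cur;
    return std::string(Start, Cur);
  }

  const char *Cur; // Current position in the script text.
};
```

With this shape there is no separate "lex mode" state to set: a parser
routine that expects an expression calls nextArithmetic(), and everything
else calls next(). Space never has to be a token in either mode.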