[PATCH] D74687: [LLD][ELF] - Linker script: do not fail parsing when "/DISCARD/" follows the fill expression.

Wed Feb 26 08:51:28 PST 2020

MaskRay added a comment.

In D74687#1893041 <https://reviews.llvm.org/D74687#1893041>, @grimar wrote:

> In D74687#1880879 <https://reviews.llvm.org/D74687#1880879>, @MaskRay wrote:
>
> > I prefer an alternative: limit `=` to accept `readPrimary` instead of `readFill`:
> >
> >   if (peek() == "=" || peek().startswith("=")) {
> >     inExpr = true;
> >     consume("=");
> >     cmd->filler = readFill();
> >     inExpr = false;
> >   }
> >   
> >
> >
>
>
> Honestly I just do not feel comfortable to do this change.
>  If I currectly understood, you're suggesting something like:
>
>   if (peek() == "=" || peek().startswith("=")) {
>     inExpr = true;
>     consume("=");
>     uint64_t value = readPrimary()().val;
>     std::array<uint8_t, 4> buf;
>     write32be(buf.data(), (uint32_t)value);
>   
>     cmd->filler = buf;
>     inExpr = false;
>   }
>   
>
> With it we stop supporting syntax like `SECTIONS { .mysec : { *(.mysec*) } =0x11223300+0x44 }`.
>  It is a deviation from the specification which says that "fillexp is an expression" and mentions a unary +.
>
> > As you can see, GNU ld's `=` syntax is weird...
>
> Yes, but it the current state LLD seems to have a reasonable behavior.
>  I am not sure we should intentionally ignore specification and remove this already supported feature just because of a specific issue
>  with "/DISCARD/".

See my example in https://reviews.llvm.org/D74687#1878409 I suspect `=0x10+2+3` never works properly in GNU ld.
If we disallow that in lld, a user can easily work around the limitation by using `=(0x10+2+3)`.

In D74687#1893065 <https://reviews.llvm.org/D74687#1893065>, @grimar wrote:

> In D74687#1893064 <https://reviews.llvm.org/D74687#1893064>, @psmith wrote:
>
> > Would it make sense to make /DISCARD/ its own token recognised by the Lexer? I believe that is what BFD does. It would prevent it from being recognised as "/" "DISCARD" "/". Apologies if this is not appropriate, I'm not too familiar with this part of the code.
>
>
> It is what my patch does :) "/DISCARD/" is a single token initially, but during parsing of the FILL expression,
>  we call `tokenizeExpr` to return "3", "*" and "5" tokens for "3*5", for example.
>
> I.e. `tokenizeExpr` should probably ignore and don't split the "/DISCARD/", and that is what I do.

The trailing tokens of one output section description can cause other ambiguity when parsing the next output section description:

  SECTIONS {
    .foo : { } =3
    /a : { }
  }

I think it is just not necessary to support a syntax that will inherently cause problems, especially when there is no functionality loss.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74687/new/

https://reviews.llvm.org/D74687