[PATCH] [lld] Teach LLD how to parse complete linker scripts
Rafael Auler
rafaelauler at gmail.com
Thu Oct 23 18:59:34 PDT 2014
Hi Rui,
I appreciate the thorough review, thanks! I implemented your suggestions and will upload a new patch now.
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:70
@@ -41,1 +69,3 @@
+ lessequal,
+ greaterequal,
kw_entry,
----------------
ruiu wrote:
> Sort in asciii-betical order.
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:120
@@ -69,3 +119,3 @@
bool canStartName(char c) const;
bool canContinueName(char c) const;
void skipWhitespace();
----------------
ruiu wrote:
> It should be done in a different patch, but these can* member functions have nothing to do with the Lexer class (they don't use member variables of the class etc). These should be non-member function.
I agree, I will submit another patch soon.
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:157
@@ +156,3 @@
+ InputSectionFile,
+ Overlay,
+ };
----------------
ruiu wrote:
> Sort.
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:322
@@ +321,3 @@
+
+ virtual ~Expression() {}
+
----------------
ruiu wrote:
> Remove three blank lines above.
Done.
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:333
@@ +332,3 @@
+public:
+ Constant(uint64_t num) : Expression(Kind::Constant), _num(num) {}
+ void dump(raw_ostream &os) const override;
----------------
ruiu wrote:
> Is constant always unsigned?
Added the following comment to explain this:
/// A constant value is stored as unsigned because it represents absolute
/// values. We represent negative numbers by composing the unary '-' operator
/// with a constant.
I also added the UnaryMinus class to represent negative numbers and properly updated the parser, thanks for pointing out.
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:392
@@ +391,3 @@
+ CompareEqual,
+ CompareDifferent
+ };
----------------
ruiu wrote:
> sort
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:467
@@ +466,3 @@
+/// names is important to determine which sections go first.
+enum WildcardSortMode {
+ WSM_NA,
----------------
ruiu wrote:
> Can you make this enum class?
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:478
@@ +477,3 @@
+/// Represents either a single input section name or a group of sorted input
+/// section names. They specify which sections to map to a given output section.
+class InputSection {
----------------
ruiu wrote:
> Add an example to the comment. My understanding of this class is it is representing ".text" or "SORT(.text*)" in
>
> SECTIONS {
> .x: { .text }
> .y { SORT(.text*) }
> }
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:535
@@ +534,3 @@
+/// An output-section-command that maps a series of sections inside a given
+/// file to an output section.
+class InputSectionFile : public Command {
----------------
ruiu wrote:
> example here.
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:567
@@ +566,3 @@
+/// A sections-command to specify which input sections compose a given output
+/// section.
+class OutputSectionDescription : public Command {
----------------
ruiu wrote:
> example here
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:605
@@ +604,3 @@
+
+class Overlay : public Command {
+public:
----------------
ruiu wrote:
> Or, instead of adding example here, a link to the documentation is better?
>
> For this, https://sourceware.org/binutils/docs/ld/Overlay-Description.html#Overlay-Description
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:615
@@ +614,3 @@
+
+private:
+};
----------------
ruiu wrote:
> Remove
Done
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:675
@@ +674,3 @@
+ /// requires lexing additional tokens, store them in a private buffer.
+ const Token &peek(unsigned n) {
+ // Covers the case where the look ahead buffer contains the requested token
----------------
ruiu wrote:
> In this patch you pass only 1 to this function. Do you really need to look ahead arbitrary number of tokens? The buffer management looks a bit too complicated to me.
I agree. I changed this algorithm to only peek at the next token. I originally wrote this function to perform arbitrary look ahead because I was not sure if the linker script grammar would need it. Currently, I believe we won't need this in the future, so I modified this function to always peek the next token, which is a much simpler problem. If we ever come across the need to look ahead more than 1 token, I'll resubmit this complete algorithm again.
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:756
@@ +755,3 @@
+
+ /// Parse "identifier(param [, param]...)"
+ ///
----------------
ruiu wrote:
> You want to mention that identifier and "(" were already read.
Sorry, I'm not sure if I understood your observation here, you see, parseFunctionCall() is responsible for consuming this entire expression, including identifier and left paren.
================
Comment at: include/lld/ReaderWriter/LinkerScript.h:793
@@ +792,3 @@
+ /// \p LHS points to the left-hand-side operand of this operator
+ /// \p maxPrecedence has the maximum operator precedence level that this parse
+ /// function is allowed to consume.
----------------
ruiu wrote:
> I think we don't usually use \p notation.
No problem, removed.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:85
@@ -37,1 +84,3 @@
+ CASE(greaterequal)
+ CASE(unknown)
#undef CASE
----------------
ruiu wrote:
> sort
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:96
@@ +95,3 @@
+ if (c < '0' || c > '9')
+ return llvm::ErrorOr<uint64_t>(std::make_error_code(std::errc::io_error));
+ res += c - '0';
----------------
ruiu wrote:
> I think there's an implicit conversion from std::error_code to llvm::ErrorOr. So you can just write
>
> return std::errc::io_error
>
Thanks for the tip!
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:99
@@ +98,3 @@
+ }
+ return llvm::ErrorOr<uint64_t>(res);
+}
----------------
ruiu wrote:
> return res;
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:134
@@ +133,3 @@
+ res += c - 'a' + 10;
+ else
+ res += c - 'A' + 10;
----------------
ruiu wrote:
> You can write
>
> else if (c >= 'A' && c <= 'F')
> res += c - 'A' + 10;
> else
> return llvm::ErrorOr<uint64_t>(std::make_error_code(std::errc::io_error));
>
Indeed, thanks, updated it.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:147
@@ +146,3 @@
+ .StartsWith("0", octal)
+ .Default(decimal);
+
----------------
ruiu wrote:
> This needs to be case insensitive.
OK. I don't think that StringSwitch has something like StartsWithLower, so I just added another case.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:156
@@ +155,3 @@
+ str = str.drop_back();
+ }
+
----------------
ruiu wrote:
> Instead of using two boolean values, you can define variable multiplier and set 1024 for K or 1024*1024 for M.
Fair, changed it.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:176
@@ +175,3 @@
+ case hex:
+ if (str.startswith("0x"))
+ str = str.drop_front(2);
----------------
ruiu wrote:
> startswith_lower
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:190
@@ +189,3 @@
+ }
+ if (!res.getError()) {
+ if (suffixK)
----------------
ruiu wrote:
> Use early return here.
>
> if (res.getError())
> return res;
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:194
@@ +193,3 @@
+ else if (suffixM)
+ *res = *res << 20;
+ }
----------------
ruiu wrote:
> and multiply by the multiplier
OK
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:211
@@ +210,3 @@
+ case '8':
+ case '9':
+ return true;
----------------
ruiu wrote:
> return '0' <= c && c <= '9'
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:253
@@ +252,3 @@
+ case 'M':
+ case 'K':
+ return true;
----------------
ruiu wrote:
> You want to write multiple cases in one line. Or if expression may be shorter.
In fact I did with multiple cases, but clang-format undid it :-) will fix it.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:325
@@ +324,3 @@
+ tok = Token(_buffer.substr(0, 1), Token::l_brace);
+ _buffer = _buffer.drop_front();
+ return;
----------------
ruiu wrote:
> I think we need a function something like
>
> drop(StringRef &s, int n)
>
> that (destructively) drops n characters from s and returns them.
Good idea. Done.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:456
@@ -110,3 +455,3 @@
// simplicity.
if ((_buffer[0] == '\"') || (_buffer[0] == '\'')) {
char c = _buffer[0];
----------------
ruiu wrote:
> This line should be before default and
>
> case '"': case '\'':
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:469
@@ +468,3 @@
+ if ((_buffer.startswith("0x") && _buffer.size() > 2 &&
+ canContinueNumber(_buffer[2])) ||
+ canStartNumber(_buffer[0])) {
----------------
ruiu wrote:
> This expression
>
> (_buffer.startswith("0x") && _buffer.size() > 2 && canContinueNumber(_buffer[2])
>
> seems redundant, because if a string starts with "0x", it will always satisfy
>
> canStartNumber(_buffer[0])
Fixed.
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:552
@@ +551,3 @@
+ .Case("ONLY_IF_RW", Token::kw_only_if_rw)
+ .Case("/DISCARD/", Token::kw_discard)
+ .Case("ENTRY", Token::kw_entry)
----------------
ruiu wrote:
> Sort
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:748
@@ +747,3 @@
+ numParen = 1;
+ break;
+ case WSM_ByAlignment:
----------------
ruiu wrote:
> return 1;
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:758
@@ +757,3 @@
+ case WSM_ByNameAndAlignment:
+ os << "SORT_BY_NAME(SORT_BY_ALIGNMENT";
+ numParen = 2;
----------------
ruiu wrote:
> Did you forget to add the trailing "("?
Right, thanks
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:762
@@ +761,3 @@
+ case WSM_ByAlignmentAndName:
+ os << "SORT_BY_ALIGNMENT(SORT_BY_NAME";
+ numParen = 2;
----------------
ruiu wrote:
> Ditto
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:860
@@ +859,3 @@
+
+ os << " { \n";
+ for (auto &command : _outputSectionCommands) {
----------------
ruiu wrote:
> remove space before \n
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1002
@@ +1001,3 @@
+ switch (peek(1)._kind) {
+ case Token::l_paren:
+ return parseFunctionCall();
----------------
ruiu wrote:
> This can be
>
> if (peek(1)._kind == Token::l_paren)
> return parseFunctionCall();
> Symbol *sym = ...
> ...
> return sym;
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1123
@@ +1122,3 @@
+
+const Expression *Parser::parseOperatorOperandLoop(const Expression *LHS,
+ unsigned highestPrecedence) {
----------------
ruiu wrote:
> s/LHS/lhs/
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1141
@@ +1140,3 @@
+ consumeToken();
+ const Expression *RHS = parseExpression(precedence - 1);
+ if (!RHS)
----------------
ruiu wrote:
> s/RHS/rhs/
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1149
@@ +1148,3 @@
+
+const Expression *Parser::parseTernaryCondOp(const Expression *LHS) {
+ assert(_tok._kind == Token::question && "Expected question mark");
----------------
ruiu wrote:
> lhs
Fixed
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1334
@@ +1333,3 @@
+ if (_tok._kind == Token::kw_hidden || _tok._kind == Token::kw_provide ||
+ _tok._kind == Token::kw_provide_hidden) {
+ switch (_tok._kind) {
----------------
ruiu wrote:
> Remove this if
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1361
@@ +1360,3 @@
+ kind = SymbolAssignment::Simple;
+ consumeToken();
+ break;
----------------
ruiu wrote:
> Move consumeToken()s after this switch statement.
Done
================
Comment at: lib/ReaderWriter/LinkerScript.cpp:1589
@@ +1588,3 @@
+ archiveSortMode, inputSections);
+ ;
+ consumeToken();
----------------
ruiu wrote:
> remove
Removed
http://reviews.llvm.org/D5852
More information about the llvm-commits
mailing list