[PATCH] D150870: [lld][Arm] Big Endian - Byte invariant support.

Peter Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 18 09:46:18 PDT 2023


peter.smith added a comment.

I think the overall strategy looks good. I've got quite a few detailed comments. Not had a chance to go over the tests in detail yet. Will do that next week.



================
Comment at: lld/ELF/Arch/ARM.cpp:72
+
+  // Set the EF_ARM_BE8 flag in the ELF header, if elf file is big endian
+  // with BE-8 code
----------------
A couple of nits:
* Suggest ELF instead of elf and big-endian instead of big endian to be consistent with other comments.
* The convention for LLD comments is to finish with a full stop.


================
Comment at: lld/ELF/Arch/ARM.cpp:937
+
+  // Collect mapping symbols for every executable OutputSection.
+  for (const SymbolTableEntry &entry : in.symTab->getSymbols()) {
----------------
amilendra wrote:
> InputSection?
I think this should be InputSection as the map is from InputSection to defined.


================
Comment at: lld/ELF/Arch/ARM.cpp:949
+
+  // For each IutputSection make sure the mapping symbols are in sorted in
+  // ascending order.
----------------
amilendra wrote:
> Nit: InputSection
Type IutputSection -> InputSection


================
Comment at: lld/ELF/Arch/ARM.cpp:961
+
+static void BytesexReverseInstructions(uint8_t *buf, uint64_t start,
+                                       uint64_t end, uint32_t width) {
----------------
I think `toLittleEndianInstructions` is a better name as this doesn't endian reverse, it converts from big-endian to little- endian.  


================
Comment at: lld/ELF/Arch/ARM.cpp:963
+                                       uint64_t end, uint32_t width) {
+  uint8_t *bytes = buf;
+  for (uint64_t i = start; i < end; i += width) {
----------------
Do you need bytes? It looks like neither the pointer buf or the pointer bytes is written to. For example:
```
write32le(buf + i, read32(buf + i));
``` 


================
Comment at: lld/ELF/Arch/ARM.cpp:974
+
+// Reverse code sections of every inputsection (with the parent section has
+// SHF_EXECINSTR flag ). Code sections can contain a mix of Arm, Thumb and
----------------
I'd go with:
```
// Arm BE8 big endian format requires instructions to be little-endian, with the initial contents big-endian. Convert
// the big-endian instructions to little-endian leaving literal data untouched. We use mapping symbols to identify
// half open intervals of Arm code [$a, non $a) and Thumb [$t, non  $t) code and convert these to little-endian a word
// or half-word at a time respectively.


================
Comment at: lld/ELF/Arch/ARM.cpp:976
+// SHF_EXECINSTR flag ). Code sections can contain a mix of Arm, Thumb and
+// literal data. All Words between $A and ($T or $D) are reversed a word at
+// a time. All Half-Words between $T and ($A or $D) are reversed a half-word
----------------
Mapping symbols are lower case $a instead of $A


================
Comment at: lld/ELF/Arch/ARM.cpp:979
+// at a time. All other locations remain untouched
+void elf::endianReverseArmInstructions(InputSection *sec, uint8_t *buf) {
+  DenseMap<InputSection *, std::vector<const Defined *>> sectionMap =
----------------
Similar to the comment about the function name above. This is converting big to little endian, whereas the name implies that we are reversing.

Given that this is specific to Arm BE8 perhaps `convertArmInstructionsToBE8()`


================
Comment at: lld/ELF/Arch/ARM.cpp:981
+  DenseMap<InputSection *, std::vector<const Defined *>> sectionMap =
+      getArmMappingSymbolList();
+  std::vector<const Defined *> &mapSyms = sectionMap[sec];
----------------
`getArmMappingSymbolList()` is going to recalculate the mapping symbols for the whole program each time it is called (once per inputSection).

Given that the mapping symbols are read-only then I suggest calculating them before writing sections and storting them read-only in the ARM class or Config. 


================
Comment at: lld/ELF/Arch/ARM.cpp:987
+
+  enum CodeState { WORDCODE, HALFWORDCODE, DATA };
+  CodeState cur_state = DATA;
----------------
I suggest using an enum class using the normal variable naming convention. For example in Config.h
```
enum class BsymbolicKind { None, NonWeakFunctions, Functions, All };
```

I'd also suggest Arm, Thumb and Data for the names.


================
Comment at: lld/ELF/Arch/ARM.cpp:988
+  enum CodeState { WORDCODE, HALFWORDCODE, DATA };
+  CodeState cur_state = DATA;
+
----------------
curState (LLVM coding standard)


================
Comment at: lld/ELF/Arch/ARM.cpp:992
+
+  if (auto *s = dyn_cast<SyntheticSection>(sec))
+    length = s->getSize();
----------------
I don't think you need to treat SyntheticSections differently as getSize is a virtual function. 
`length = s->getSize();` should be fine. 


================
Comment at: lld/ELF/Arch/ARM.cpp:998
+  for (auto &msym : mapSyms) {
+    CodeState new_state = DATA;
+    if (isThumbMapSymbol(msym))
----------------
newState (LLVM coding standard)


================
Comment at: lld/ELF/Arch/ARM.cpp:1005
+      new_state = DATA;
+
+    switch (cur_state) {
----------------
I wonder if this could be simplified. For example 
```
    case WORDCODE:
      if (new_state == HALFWORDCODE || new_state == DATA)
```
Is the same as
```
    case WORDCODE:
      if (new_state != WORDCODE)
```

In all cases it looks like 
```
if (new_state == cur_state)
  continue;
```
Then the if statements within the case statements could be skipped as we know we are terminating a contiguous run of cur_state.

cur_state = new_state; and start=msym->value are common to all cases apart from the default case, but I think that as we only record mapping symbols in the map then default shouldn't occur.

With all that I think this could be (using your names and enum):
```
    if (new_state == cur_state)
      continue;

    if (cur_state == WORDCODE)
      BytesexReverseInstructions(buf, start, msym->value, 4);
    else if (cur_state == HALFWORDCODE)
      BytesexReverseInstructions(buf, start, msym->value, 2);

    start = msym->value;
    cur_state = new_state;
  }
  // Passed last mapping symbol, may need to reverse
  // up to end of section
  if (cur_state == WORDCODE)
    BytesexReverseInstructions(buf, start, length, 4);
  if (cur_state == HALFWORDCODE)
    BytesexReverseInstructions(buf, start, length, 2);
}
```
A further shortening could be done by using values 4, 2 and 1 for the enum (although not with an enum class without casting) then you could use that for the length
```
 // Passed last mapping symbol, may need to reverse
  // up to end of section
  if (cur_state != DATA)
    BytesexReverseInstructions(buf, start, length, cur_state);
```


================
Comment at: lld/ELF/Arch/ARM.cpp:1037
+
+  /* Passed last mapping symbol, may need to reverse
+  up to end of section */
----------------
For a couple of sentences I suggest using // rather than /* */


================
Comment at: lld/ELF/Options.td:39
 
+def be8: F<"be8">, HelpText<"Big endian with byte invariant">;
+
----------------
I'd go with something like "Write a Big Endian ELF file using BE8 format (Arm only)".


================
Comment at: lld/ELF/OutputSections.cpp:497
 
+      // When in Arm BE8 mode, the linker has to endian reverses the
+      // instructions, but not data.
----------------
I'd say `When in Arm BE8 mode, the linker has to convert the big-endian instructions to little-endian, leaving the data big-endian.`


================
Comment at: lld/ELF/OutputSections.cpp:499
+      // instructions, but not data.
+      if (config->emachine == EM_ARM && !config->isLE && config->be8)
+        if (flags & SHF_EXECINSTR)
----------------
I've just thought that we probably do not want to endian reverse instructions when doing a relocatable link (the -r option which maps to config->relocatable; this produces a relocatable object file). We will want to double check what GNU ld does to make sure.


================
Comment at: lld/ELF/OutputSections.cpp:501
+        if (flags & SHF_EXECINSTR)
+          endianReverseArmInstructions(isec, buf + isec->outSecOff);
+
----------------
I think this test can be folded into the first if statement.
```
if (config->emachine == EM_ARM && !config->isLE && config->be8 && (flags & SHF_EXECINSTR))
```


================
Comment at: lld/test/ELF/emulation-arm.s:76
+
+## Ensure that the EF_ARM_BE8 flag is not set for be32
+## This will have to be modified on the be8 is implemented
----------------
I think this comment needs updating. This is the test that checks that BE8 has been set.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150870/new/

https://reviews.llvm.org/D150870



More information about the llvm-commits mailing list