[PATCH] D70157: Align branches within 32-Byte boundary

Thu Dec 5 07:42:22 PST 2019

skan added a comment.

In D70157#1769932 <https://reviews.llvm.org/D70157#1769932>, @MaskRay wrote:

> I find another deficiency (infinite loop) with the current approach.
>
> Say, there is a `je 0` (0x0F 0x84 0x00 0x00 0x00 0x00) at byte 0x90. (0x90+6)%32 == 0, so it ends on a 32-byte boundary.
>  MF.getMaxPrefixSize() is 4, so the size of MCMachineDependentFragment may vary from 0 to 4.
>  If there are other MCMachineDependentFragment's in the section, some may shrink while some may expand.
>  In some cases the following loop will not converge
>
>   bool MCAssembler::layoutOnce(MCAsmLayout &Layout) {
>     ++stats::RelaxationSteps;
>  
>     bool WasRelaxed = false;
>     for (iterator it = begin(), ie = end(); it != ie; ++it) {
>       MCSection &Sec = *it;
>       while (layoutSectionOnce(Layout, Sec)) ///
>         WasRelaxed = true;
>     }
>  
>     return WasRelaxed;
>   }
>   
>   // In MCAssembler::layoutSectionOnce,
>     case MCFragment::FT_MachineDependent:
>       RelaxedFrag =
>           relaxMachineDependent(Layout, *cast<MCMachineDependentFragment>(I));
>       break;
>   
>
> To give a concrete example, `clang++ -fsanitize=memory compiler-rt/test/msan/cxa_atexit.cpp -mbranches-within-32B-boundaries` does not converge. You may also try dtor-*.cpp in that directory.
>
> A simple iterative algorithm is not guaranteed to converge. We probably can solve the layout problem with a dynamic programming algorithm:
>
> f[i][j] = the minimum cost that layouts the first i instructions with j extra bytes (via NOPs or prefixes)
> or
>  g[i][j] = the minimum inserted bytes that layouts the first i instructions with cost j
>
> I am not clear which one is better. A simple greedy approach is to set an upper limit on the number of iterations when the section contains at least one MCMachineDependentFragment, i.e.
>
>   bool HasMCMachineDependentFragment = false;
>   int count = 5; // arbitrary. Please find an appropriate value.
>   while (layoutSectionOnce(Layout, Sec, HasMCMachineDependentFragment) && count > 0) {
>     WasRelaxed = true;
>     if (HasMCMachineDependentFragment)
>       count--;
>   }

I guess you originally wanted to say (90 + 6)%32 == 0 instead of (0x90 +6)%32 == 0. With the last patch, the command

`clang++ -fsanitize=memory compiler-rt/test/msan/cxa_atexit.cpp -mbranches-within-32B-boundaries` does not converge, but this is not the fault of this simple iterative algorithm.

Let me briefly introduce this algorithm. Regardless of whether the prefix or nop is filled in, `MCMachineDependentFragment` is used to occupy a certain size of space to align the branch, and it has a pointer to the branch it is responsible for. Multiple MCMachineDependentFragments can work together to align a branch. For example, there are two MCMachineDependentFragment, M1 <https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> to align branch J1 <https://reviews.llvm.org/J1>. In the first iteration, J1 <https://reviews.llvm.org/J1> needs 7 bytes of padding, so the size of M1 <https://reviews.llvm.org/M1> will grow to 7 as much as possible. However, M1 <https://reviews.llvm.org/M1> can only grow to 5 or smaller from some reason. At this time, M2 <https://reviews.llvm.org/M2> will grow as much as possible to 2. In the second iteration, when it's M1 <https://reviews.llvm.org/M1>'s turn to relax, we will subtract the size of M1 <https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> from the current address of J1 <https://reviews.llvm.org/J1> to get the size that J1 <https://reviews.llvm.org/J1> needs to padding, assuming it is 6, and then M1 <https://reviews.llvm.org/M1> will keep the size to 5 and the size of M2 <https://reviews.llvm.org/M2> will be reduced to 1. It is not difficult to see that among the MCFragment from M1 <https://reviews.llvm.org/M1> to J1 <https://reviews.llvm.org/J1>, as long as the size of MCFragment other than M1 <https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> is fixed within a limited number of iterations, the sizes of M1 <https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> will be fixed within a limited number of iterations, that is, the iteration will converge.

As far as I know, the `MCFragment` used to store instructions includes `MCDataFragment`, `MCRelaxableFragment` and `MCCompactEncodedFragment`.  `MCDataFragment` and `MCCompactEncodedFragment` have fixed sizes. `MCRelaxableFragment` is used to store instructions of variable size and will only grow and not shrink, as a result, its size will be fixed within a limited number of cycles . So as long as we ensure that there is only MCFragment that stores instructions between the  instruction to be prefixed and the branch to be aligned, this iterative algorithm will converge.

The file cxa_atexit.s  has more than one text section. The instruction to be prefixed and the branch to align may be in two different text sections. I forgot to check if there is a `MCFragment` not storing instruction between the them, which results in non-convergence. This bug has been fixed in the latest patch.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157