[PATCH] D152407: [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre.

Tue Jul 25 21:04:37 PDT 2023

chaosdefinition added a comment.

In D152407#4531841 <https://reviews.llvm.org/D152407#4531841>, @eaeltsin wrote:

> Heads-up - this caused multiple segmentation faults in our code base. We are working on a reproducer.

In D152407#4533478 <https://reviews.llvm.org/D152407#4533478>, @scw wrote:

> We see a miscompilation after this patch in protobuf under MSan. This <https://github.com/protocolbuffers/protobuf/blob/caf55184b2d0e8cbb99e5b487b453dc8721af4fe/src/google/protobuf/repeated_ptr_field.h#L449> is the line where the miscompilation happens, and I extracted it to this godbolt <https://godbolt.org/z/hx47zbe64> with compilation option `-fsanitize=memory -O`.
>
> Before this patch, the AArch64 assembly is (registers are renamed a bit, because this is disassembled from a binary I built internally)
>
>   1bfbbc64: b8808d49      ldrsw   x9, [x10, #8]!
>   1bfbbc68: ca0d014e      eor     x14, x10, x13
>   1bfbbc6c: b8404d4c      ldr     w12, [x10, #4]!
>   1bfbbc70: ca0d014d      eor     x13, x10, x13
>
> After, it became
>
>   1bfbb51c: 29c13149      ldp     w9, w12, [x10, #8]!
>   1bfbb520: 93407d4a      sxtw    x10, w10
>   1bfbb524: ca0d014e      eor     x14, x10, x13
>   1bfbb528: ca0d014d      eor     x13, x10, x13
>
> There are two problems:
>
> 1. The after sequence touches `x10`, before calculating `x13` and `x14`, which are supposed to be the addresses of the shadow memory. This caused the crash because our `x10` was pointing to stack where the 32-th bit is 1. I believe that line should have been `sxtw x9, w9` instead, to perform the extension done in `ldrsw` in the before sequence.
> 2. This also miscalculates `x13`, where in the before sequence, after execution, it holds `old(x13) xor (x10 + 12)`, yet after the "after" sequence, it holds `old(x13) xor (x10 + 8)`, the same as `x14`.

Thanks for reporting and reproducing the miscompilation!  It looks like two pre-indexed loads `LDRSWpre` and `LDRWpre` got merged into a `LDPWpre` and `SBFMXri`, which is an unintended result by this patch.  I think we can either rollback this patch or apply a quick fix that disallows merging two pre-indexed loads like the following.  Any thoughts?

  diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
  index 419b471db3a3..e64b1d0658f0 100644
  --- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
  +++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
  @@ -1317,20 +1317,24 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
                                     MI.getFlag(MachineInstr::FrameDestroy)))
       return false;

     unsigned OpcA = FirstMI.getOpcode();
     unsigned OpcB = MI.getOpcode();

     // Opcodes match: If the opcodes are pre ld/st there is nothing more to check.
     if (OpcA == OpcB)
       return !AArch64InstrInfo::isPreLdSt(FirstMI);

  +  // Two pre ld/st of different opcodes cannot be merged either
  +  if (AArch64InstrInfo::isPreLdSt(FirstMI) && AArch64InstrInfo::isPreLdSt(MI))
  +    return false;
  +
     // Try to match a sign-extended load/store with a zero-extended load/store.
     bool IsValidLdStrOpc, PairIsValidLdStrOpc;
     unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc);
     assert(IsValidLdStrOpc &&
            "Given Opc should be a Load or Store with an immediate");
     // OpcA will be the first instruction in the pair.
     if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) {
       Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0);
       return true;
     }

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152407/new/

https://reviews.llvm.org/D152407