[llvm] r369664 - [MBP] Disable aggressive loop rotate in plain mode

Carrot Wei via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 27 11:51:13 PDT 2019


Thanks for the report. I have reproduced the failure.

On Mon, Aug 26, 2019 at 3:10 PM Erik Pilkington
<erik.pilkington at gmail.com> wrote:
>
> Hi, this commit causes a CI failure on Darwin with -verify-machineinstrs: http://lab.llvm.org:8080/green/job/test-suite-verify-machineinstrs-x86_64-O3/
>
> I attached a reduction; running `/path/to/llc -O3 --verify-machineinstrs t.ll` on it fails verification.
>
> Can you please take a look at this?
>
> Thanks!
> Erik
>
> On Aug 22, 2019, at 9:21 AM, Guozhi Wei via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> Author: carrot
> Date: Thu Aug 22 09:21:32 2019
> New Revision: 369664
>
> URL: http://llvm.org/viewvc/llvm-project?rev=369664&view=rev
> Log:
> [MBP] Disable aggressive loop rotate in plain mode
>
> Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization that depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user's program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
>
> To be conservative, this patch restores the original layout algorithm in plain mode. Users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673
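
As a rough illustration of the log message above, the choice between the two layout strategies can be sketched as below. This is a minimal standalone model, not the code in MachineBlockPlacement.cpp: `LayoutInputs`, `LoopTopStrategy`, and `chooseLoopTopStrategy` are hypothetical names, and the sketch assumes that -force-precise-rotation-cost=true has the same effect on this choice as real profile data.

```cpp
// Hypothetical stand-ins for the inputs the pass looks at; these are not
// LLVM types.
struct LayoutInputs {
  bool ForcePreciseRotationCost; // assumed to model -force-precise-rotation-cost=true
  bool HasProfileData;           // assumed to model "the function has real profile data"
};

enum class LoopTopStrategy {
  Aggressive,  // profile-driven search for the best loop top
  Conservative // plain mode: the original, more limited search
};

// Pick the aggressive strategy only when the user forces it or real profile
// data is present; otherwise fall back to the conservative one.
LoopTopStrategy chooseLoopTopStrategy(const LayoutInputs &In) {
  return (In.ForcePreciseRotationCost || In.HasProfileData)
             ? LoopTopStrategy::Aggressive
             : LoopTopStrategy::Conservative;
}
```

Under these assumptions, `chooseLoopTopStrategy({false, false})` yields `LoopTopStrategy::Conservative`, which corresponds to the plain-mode behaviour this patch restores, while setting either field to true selects `LoopTopStrategy::Aggressive`.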
>
>
> Removed:
>    llvm/trunk/test/CodeGen/X86/loop-rotate.ll
> Modified:
>    llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>    llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll
>    llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>    llvm/trunk/test/CodeGen/AMDGPU/collapse-endcf.ll
>    llvm/trunk/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
>    llvm/trunk/test/CodeGen/AMDGPU/global_smrd_cfg.ll
>    llvm/trunk/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
>    llvm/trunk/test/CodeGen/AMDGPU/indirect-addressing-si.ll
>    llvm/trunk/test/CodeGen/AMDGPU/loop_exit_with_xor.ll
>    llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll
>    llvm/trunk/test/CodeGen/AMDGPU/optimize-negated-cond.ll
>    llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll
>    llvm/trunk/test/CodeGen/AMDGPU/wave32.ll
>    llvm/trunk/test/CodeGen/AMDGPU/wqm.ll
>    llvm/trunk/test/CodeGen/ARM/2011-03-23-PeepholeBug.ll
>    llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>    llvm/trunk/test/CodeGen/ARM/atomic-cmp.ll
>    llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll
>    llvm/trunk/test/CodeGen/ARM/code-placement.ll
>    llvm/trunk/test/CodeGen/ARM/pr32578.ll
>    llvm/trunk/test/CodeGen/Hexagon/bug6757-endloop.ll
>    llvm/trunk/test/CodeGen/Hexagon/early-if-merge-loop.ll
>    llvm/trunk/test/CodeGen/Hexagon/prof-early-if.ll
>    llvm/trunk/test/CodeGen/Hexagon/redundant-branching2.ll
>    llvm/trunk/test/CodeGen/PowerPC/atomics-regression.ll
>    llvm/trunk/test/CodeGen/PowerPC/block-placement-1.mir
>    llvm/trunk/test/CodeGen/PowerPC/cmp_elimination.ll
>    llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll
>    llvm/trunk/test/CodeGen/PowerPC/machine-pre.ll
>    llvm/trunk/test/CodeGen/RISCV/atomic-rmw.ll
>    llvm/trunk/test/CodeGen/RISCV/remat.ll
>    llvm/trunk/test/CodeGen/Thumb/consthoist-physical-addr.ll
>    llvm/trunk/test/CodeGen/Thumb/pr42760.ll
>    llvm/trunk/test/CodeGen/X86/block-placement.ll
>    llvm/trunk/test/CodeGen/X86/code_placement.ll
>    llvm/trunk/test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll
>    llvm/trunk/test/CodeGen/X86/code_placement_no_header_change.ll
>    llvm/trunk/test/CodeGen/X86/conditional-tailcall.ll
>    llvm/trunk/test/CodeGen/X86/loop-blocks.ll
>    llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll
>    llvm/trunk/test/CodeGen/X86/move_latch_to_loop_top.ll
>    llvm/trunk/test/CodeGen/X86/pr38185.ll
>    llvm/trunk/test/CodeGen/X86/ragreedy-hoist-spill.ll
>    llvm/trunk/test/CodeGen/X86/reverse_branches.ll
>    llvm/trunk/test/CodeGen/X86/speculative-load-hardening.ll
>    llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
>    llvm/trunk/test/CodeGen/X86/tail-dup-repeat.ll
>    llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll
>    llvm/trunk/test/CodeGen/X86/widen_arith-1.ll
>    llvm/trunk/test/CodeGen/X86/widen_arith-2.ll
>    llvm/trunk/test/CodeGen/X86/widen_arith-3.ll
>    llvm/trunk/test/CodeGen/X86/widen_arith-4.ll
>    llvm/trunk/test/CodeGen/X86/widen_arith-5.ll
>    llvm/trunk/test/CodeGen/X86/widen_arith-6.ll
>    llvm/trunk/test/CodeGen/X86/widen_cast-4.ll
>    llvm/trunk/test/DebugInfo/X86/PR37234.ll
>    llvm/trunk/test/DebugInfo/X86/dbg-value-transfer-order.ll
>
> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Thu Aug 22 09:21:32 2019
> @@ -462,17 +462,20 @@ class MachineBlockPlacement : public Mac
>                                   const MachineBasicBlock *ExitBB,
>                                   const BlockFilterSet &LoopBlockSet);
>   MachineBasicBlock *findBestLoopTopHelper(MachineBasicBlock *OldTop,
> -      const MachineLoop &L, const BlockFilterSet &LoopBlockSet);
> +                                           const MachineLoop &L,
> +                                           const BlockFilterSet &LoopBlockSet,
> +                                           bool HasStaticProfileOnly = false);
>   MachineBasicBlock *findBestLoopTop(
>       const MachineLoop &L, const BlockFilterSet &LoopBlockSet);
> +  MachineBasicBlock *findBestLoopTopNoProfile(
> +      const MachineLoop &L, const BlockFilterSet &LoopBlockSet);
>   MachineBasicBlock *findBestLoopExit(
> -      const MachineLoop &L, const BlockFilterSet &LoopBlockSet,
> -      BlockFrequency &ExitFreq);
> +      const MachineLoop &L, const BlockFilterSet &LoopBlockSet);
>   BlockFilterSet collectLoopBlockSet(const MachineLoop &L);
>   void buildLoopChains(const MachineLoop &L);
>   void rotateLoop(
>       BlockChain &LoopChain, const MachineBasicBlock *ExitingBB,
> -      BlockFrequency ExitFreq, const BlockFilterSet &LoopBlockSet);
> +      const BlockFilterSet &LoopBlockSet);
>   void rotateLoopWithProfile(
>       BlockChain &LoopChain, const MachineLoop &L,
>       const BlockFilterSet &LoopBlockSet);
> @@ -1947,11 +1950,14 @@ MachineBlockPlacement::FallThroughGains(
> ///        At the same time, move it before old top increases the taken branch
> ///        to loop exit block, so the reduced taken branch will be compared with
> ///        the increased taken branch to the loop exit block.
> +///
> +///        This pattern is enabled only when HasStaticProfileOnly is false.
> MachineBasicBlock *
> MachineBlockPlacement::findBestLoopTopHelper(
>     MachineBasicBlock *OldTop,
>     const MachineLoop &L,
> -    const BlockFilterSet &LoopBlockSet) {
> +    const BlockFilterSet &LoopBlockSet,
> +    bool HasStaticProfileOnly) {
>   // Check that the header hasn't been fused with a preheader block due to
>   // crazy branches. If it has, we need to start with the header at the top to
>   // prevent pulling the preheader into the loop body.
> @@ -1975,22 +1981,38 @@ MachineBlockPlacement::findBestLoopTopHe
>     if (Pred->succ_size() > 2)
>       continue;
>
> -    MachineBasicBlock *OtherBB = nullptr;
> -    if (Pred->succ_size() == 2) {
> -      OtherBB = *Pred->succ_begin();
> -      if (OtherBB == OldTop)
> -        OtherBB = *Pred->succ_rbegin();
> -    }
> -
>     if (!canMoveBottomBlockToTop(Pred, OldTop))
>       continue;
>
> -    BlockFrequency Gains = FallThroughGains(Pred, OldTop, OtherBB,
> -                                            LoopBlockSet);
> -    if ((Gains > 0) && (Gains > BestGains ||
> -        ((Gains == BestGains) && Pred->isLayoutSuccessor(OldTop)))) {
> -      BestPred = Pred;
> -      BestGains = Gains;
> +    if (HasStaticProfileOnly) {
> +      // In plain mode we consider pattern 1 only.
> +      if (Pred->succ_size() > 1)
> +        continue;
> +
> +      BlockFrequency PredFreq = MBFI->getBlockFreq(Pred);
> +      if (!BestPred || PredFreq > BestGains ||
> +          (!(PredFreq < BestGains) &&
> +           Pred->isLayoutSuccessor(OldTop))) {
> +        BestPred = Pred;
> +        BestGains = PredFreq;
> +      }
> +    } else {
> +      // With profile information we also consider pattern 2.
> +      MachineBasicBlock *OtherBB = nullptr;
> +      if (Pred->succ_size() == 2) {
> +        OtherBB = *Pred->succ_begin();
> +        if (OtherBB == OldTop)
> +          OtherBB = *Pred->succ_rbegin();
> +      }
> +
> +      // And more sophisticated cost model.
> +      BlockFrequency Gains = FallThroughGains(Pred, OldTop, OtherBB,
> +                                              LoopBlockSet);
> +      if ((Gains > 0) && (Gains > BestGains ||
> +          ((Gains == BestGains) && Pred->isLayoutSuccessor(OldTop)))) {
> +        BestPred = Pred;
> +        BestGains = Gains;
> +      }
>     }
>   }
>
> @@ -2010,7 +2032,7 @@ MachineBlockPlacement::findBestLoopTopHe
>   return BestPred;
> }
>
> -/// Find the best loop top block for layout.
> +/// Find the best loop top block for layout in FDO mode.
> ///
> /// This function iteratively calls findBestLoopTopHelper, until no new better
> /// BB can be found.
> @@ -2038,6 +2060,34 @@ MachineBlockPlacement::findBestLoopTop(c
>   return NewTop;
> }
>
> +/// Find the best loop top block for layout in plain mode. It is less agressive
> +/// than findBestLoopTop.
> +///
> +/// Look for a block which is strictly better than the loop header for laying
> +/// out at the top of the loop. This looks for one and only one pattern:
> +/// a latch block with no conditional exit. This block will cause a conditional
> +/// jump around it or will be the bottom of the loop if we lay it out in place,
> +/// but if it doesn't end up at the bottom of the loop for any reason,
> +/// rotation alone won't fix it. Because such a block will always result in an
> +/// unconditional jump (for the backedge) rotating it in front of the loop
> +/// header is always profitable.
> +MachineBasicBlock *
> +MachineBlockPlacement::findBestLoopTopNoProfile(
> +    const MachineLoop &L,
> +    const BlockFilterSet &LoopBlockSet) {
> +  // Placing the latch block before the header may introduce an extra branch
> +  // that skips this block the first time the loop is executed, which we want
> +  // to avoid when optimising for size.
> +  // FIXME: in theory there is a case that does not introduce a new branch,
> +  // i.e. when the layout predecessor does not fallthrough to the loop header.
> +  // In practice this never happens though: there always seems to be a preheader
> +  // that can fallthrough and that is also placed before the header.
> +  if (F->getFunction().hasOptSize())
> +    return L.getHeader();
> +
> +  return findBestLoopTopHelper(L.getHeader(), L, LoopBlockSet, true);
> +}
> +
> /// Find the best loop exiting block for layout.
> ///
> /// This routine implements the logic to analyze the loop looking for the best
> @@ -2045,8 +2095,7 @@ MachineBlockPlacement::findBestLoopTop(c
> /// fallthrough opportunities.
> MachineBasicBlock *
> MachineBlockPlacement::findBestLoopExit(const MachineLoop &L,
> -                                        const BlockFilterSet &LoopBlockSet,
> -                                        BlockFrequency &ExitFreq) {
> +                                        const BlockFilterSet &LoopBlockSet) {
>   // We don't want to layout the loop linearly in all cases. If the loop header
>   // is just a normal basic block in the loop, we want to look for what block
>   // within the loop is the best one to layout at the top. However, if the loop
> @@ -2157,7 +2206,6 @@ MachineBlockPlacement::findBestLoopExit(
>
>   LLVM_DEBUG(dbgs() << "  Best exiting block: " << getBlockName(ExitingBB)
>                     << "\n");
> -  ExitFreq = BestExitEdgeFreq;
>   return ExitingBB;
> }
>
> @@ -2202,7 +2250,6 @@ MachineBlockPlacement::hasViableTopFallt
> /// of its bottom already, don't rotate it.
> void MachineBlockPlacement::rotateLoop(BlockChain &LoopChain,
>                                        const MachineBasicBlock *ExitingBB,
> -                                       BlockFrequency ExitFreq,
>                                        const BlockFilterSet &LoopBlockSet) {
>   if (!ExitingBB)
>     return;
> @@ -2226,12 +2273,6 @@ void MachineBlockPlacement::rotateLoop(B
>           (!SuccChain || Succ == *SuccChain->begin()))
>         return;
>     }
> -
> -    // Rotate will destroy the top fallthrough, we need to ensure the new exit
> -    // frequency is larger than top fallthrough.
> -    BlockFrequency FallThrough2Top = TopFallThroughFreq(Top, LoopBlockSet);
> -    if (FallThrough2Top >= ExitFreq)
> -      return;
>   }
>
>   BlockChain::iterator ExitIt = llvm::find(LoopChain, ExitingBB);
> @@ -2483,7 +2524,10 @@ void MachineBlockPlacement::buildLoopCha
>   // loop. This will default to the header, but may end up as one of the
>   // predecessors to the header if there is one which will result in strictly
>   // fewer branches in the loop body.
> -  MachineBasicBlock *LoopTop = findBestLoopTop(L, LoopBlockSet);
> +  MachineBasicBlock *LoopTop =
> +      (RotateLoopWithProfile || F->getFunction().hasProfileData()) ?
> +          findBestLoopTop(L, LoopBlockSet) :
> +          findBestLoopTopNoProfile(L, LoopBlockSet);
>
>   // If we selected just the header for the loop top, look for a potentially
>   // profitable exit block in the event that rotating the loop can eliminate
> @@ -2492,9 +2536,8 @@ void MachineBlockPlacement::buildLoopCha
>   // Loops are processed innermost to uttermost, make sure we clear
>   // PreferredLoopExit before processing a new loop.
>   PreferredLoopExit = nullptr;
> -  BlockFrequency ExitFreq;
>   if (!RotateLoopWithProfile && LoopTop == L.getHeader())
> -    PreferredLoopExit = findBestLoopExit(L, LoopBlockSet, ExitFreq);
> +    PreferredLoopExit = findBestLoopExit(L, LoopBlockSet);
>
>   BlockChain &LoopChain = *BlockToChain[LoopTop];
>
> @@ -2511,10 +2554,11 @@ void MachineBlockPlacement::buildLoopCha
>
>   buildChain(LoopTop, LoopChain, &LoopBlockSet);
>
> -  if (RotateLoopWithProfile)
> -    rotateLoopWithProfile(LoopChain, L, LoopBlockSet);
> -  else
> -    rotateLoop(LoopChain, PreferredLoopExit, ExitFreq, LoopBlockSet);
> +  if (RotateLoopWithProfile) {
> +    if (LoopTop == L.getHeader())
> +      rotateLoopWithProfile(LoopChain, L, LoopBlockSet);
> +  } else
> +    rotateLoop(LoopChain, PreferredLoopExit, LoopBlockSet);
>
>   LLVM_DEBUG({
>     // Crash at the end so we get all of the debugging output first.
>
> Modified: llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll (original)
> +++ llvm/trunk/test/CodeGen/AArch64/cmpxchg-idioms.ll Thu Aug 22 09:21:32 2019
> @@ -111,7 +111,7 @@ define i1 @test_conditional2(i32 %a, i32
> ; CHECK: mov w22, #2
> ; CHECK-NOT: mov w22, #4
> ; CHECK-NOT: cmn w22, #4
> -; CHECK: [[LOOP2:LBB[0-9]+_[0-9]+]]: ; %for.cond
> +; CHECK: b [[LOOP2:LBB[0-9]+_[0-9]+]]
> ; CHECK-NOT: b.ne [[LOOP2]]
> ; CHECK-NOT: b {{LBB[0-9]+_[0-9]+}}
> ; CHECK: bl _foo
>
> Modified: llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll (original)
> +++ llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll Thu Aug 22 09:21:32 2019
> @@ -1,8 +1,9 @@
> ; RUN: llc <%s -mtriple=aarch64-eabi -verify-machine-dom-info | FileCheck %s
>
> ; CHECK-LABEL: test:
> -; CHECK-LABEL: %cond.false12.i
> -; CHECK:         b.gt
> +; CHECK:       LBB0_7:
> +; CHECK:         b.hi
> +; CHECK-NEXT:    b
> ; CHECK-NEXT:  LBB0_8:
> ; CHECK-NEXT:    mov x8, x9
> ; CHECK-NEXT:  LBB0_9:
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/collapse-endcf.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/collapse-endcf.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/collapse-endcf.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/collapse-endcf.ll Thu Aug 22 09:21:32 2019
> @@ -230,11 +230,6 @@ bb.end:
> ; Make sure scc liveness is updated if sor_b64 is removed
> ; ALL-LABEL: {{^}}scc_liveness:
>
> -; GCN: %bb10
> -; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
> -; GCN: s_andn2_b64
> -; GCN-NEXT: s_cbranch_execz
> -
> ; GCN: [[BB1_LOOP:BB[0-9]+_[0-9]+]]:
> ; GCN: s_andn2_b64 exec, exec,
> ; GCN-NEXT: s_cbranch_execnz [[BB1_LOOP]]
> @@ -245,6 +240,10 @@ bb.end:
> ; GCN-NOT: s_or_b64 exec, exec
>
> ; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
> +; GCN: s_andn2_b64
> +; GCN-NEXT: s_cbranch_execnz
> +
> +; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
> ; GCN: buffer_store_dword
> ; GCN: buffer_store_dword
> ; GCN: buffer_store_dword
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll Thu Aug 22 09:21:32 2019
> @@ -20,41 +20,38 @@ define amdgpu_ps void @main(i32, float)
> ; CHECK-NEXT:    ; implicit-def: $sgpr8_sgpr9
> ; CHECK-NEXT:    ; implicit-def: $sgpr6_sgpr7
> ; CHECK-NEXT:    ; implicit-def: $sgpr2_sgpr3
> -; CHECK-NEXT:    s_branch BB0_3
> -; CHECK-NEXT:  BB0_1: ; %Flow1
> -; CHECK-NEXT:    ; in Loop: Header=BB0_3 Depth=1
> -; CHECK-NEXT:    s_or_b64 exec, exec, s[8:9]
> -; CHECK-NEXT:    s_mov_b64 s[8:9], 0
> -; CHECK-NEXT:  BB0_2: ; %Flow
> -; CHECK-NEXT:    ; in Loop: Header=BB0_3 Depth=1
> -; CHECK-NEXT:    s_and_b64 s[10:11], exec, s[6:7]
> -; CHECK-NEXT:    s_or_b64 s[10:11], s[10:11], s[4:5]
> -; CHECK-NEXT:    s_andn2_b64 s[2:3], s[2:3], exec
> -; CHECK-NEXT:    s_and_b64 s[4:5], s[8:9], exec
> -; CHECK-NEXT:    s_or_b64 s[2:3], s[2:3], s[4:5]
> -; CHECK-NEXT:    s_mov_b64 s[4:5], s[10:11]
> -; CHECK-NEXT:    s_andn2_b64 exec, exec, s[10:11]
> -; CHECK-NEXT:    s_cbranch_execz BB0_6
> -; CHECK-NEXT:  BB0_3: ; %loop
> +; CHECK-NEXT:  BB0_1: ; %loop
> ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
> ; CHECK-NEXT:    v_cmp_gt_u32_e32 vcc, 32, v1
> ; CHECK-NEXT:    s_and_b64 vcc, exec, vcc
> ; CHECK-NEXT:    s_or_b64 s[6:7], s[6:7], exec
> ; CHECK-NEXT:    s_or_b64 s[8:9], s[8:9], exec
> -; CHECK-NEXT:    s_cbranch_vccz BB0_2
> -; CHECK-NEXT:  ; %bb.4: ; %endif1
> -; CHECK-NEXT:    ; in Loop: Header=BB0_3 Depth=1
> +; CHECK-NEXT:    s_cbranch_vccz BB0_5
> +; CHECK-NEXT:  ; %bb.2: ; %endif1
> +; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    s_mov_b64 s[6:7], -1
> ; CHECK-NEXT:    s_and_saveexec_b64 s[8:9], s[0:1]
> ; CHECK-NEXT:    s_xor_b64 s[8:9], exec, s[8:9]
> -; CHECK-NEXT:    ; mask branch BB0_1
> -; CHECK-NEXT:    s_cbranch_execz BB0_1
> -; CHECK-NEXT:  BB0_5: ; %endif2
> -; CHECK-NEXT:    ; in Loop: Header=BB0_3 Depth=1
> +; CHECK-NEXT:    ; mask branch BB0_4
> +; CHECK-NEXT:  BB0_3: ; %endif2
> +; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    v_add_u32_e32 v1, 1, v1
> ; CHECK-NEXT:    s_xor_b64 s[6:7], exec, -1
> -; CHECK-NEXT:    s_branch BB0_1
> -; CHECK-NEXT:  BB0_6: ; %Flow2
> +; CHECK-NEXT:  BB0_4: ; %Flow1
> +; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
> +; CHECK-NEXT:    s_or_b64 exec, exec, s[8:9]
> +; CHECK-NEXT:    s_mov_b64 s[8:9], 0
> +; CHECK-NEXT:  BB0_5: ; %Flow
> +; CHECK-NEXT:    ; in Loop: Header=BB0_1 Depth=1
> +; CHECK-NEXT:    s_and_b64 s[10:11], exec, s[6:7]
> +; CHECK-NEXT:    s_or_b64 s[10:11], s[10:11], s[4:5]
> +; CHECK-NEXT:    s_andn2_b64 s[2:3], s[2:3], exec
> +; CHECK-NEXT:    s_and_b64 s[4:5], s[8:9], exec
> +; CHECK-NEXT:    s_or_b64 s[2:3], s[2:3], s[4:5]
> +; CHECK-NEXT:    s_mov_b64 s[4:5], s[10:11]
> +; CHECK-NEXT:    s_andn2_b64 exec, exec, s[10:11]
> +; CHECK-NEXT:    s_cbranch_execnz BB0_1
> +; CHECK-NEXT:  ; %bb.6: ; %Flow2
> ; CHECK-NEXT:    s_or_b64 exec, exec, s[10:11]
> ; CHECK-NEXT:    v_mov_b32_e32 v1, 0
> ; CHECK-NEXT:    s_and_saveexec_b64 s[0:1], s[2:3]
> @@ -65,7 +62,6 @@ define amdgpu_ps void @main(i32, float)
> ; CHECK-NEXT:    s_or_b64 exec, exec, s[0:1]
> ; CHECK-NEXT:    exp mrt0 v1, v1, v1, v1 done vm
> ; CHECK-NEXT:    s_endpgm
> -; this is the divergent branch with the condition not marked as divergent
> start:
>   %v0 = call float @llvm.amdgcn.interp.p1(float %1, i32 0, i32 0, i32 %0)
>   br label %loop
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/global_smrd_cfg.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/global_smrd_cfg.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/global_smrd_cfg.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/global_smrd_cfg.ll Thu Aug 22 09:21:32 2019
> @@ -1,28 +1,27 @@
> ; RUN: llc -mtriple amdgcn--amdhsa -mcpu=fiji -amdgpu-scalarize-global-loads=true -verify-machineinstrs  < %s | FileCheck %s
>
> -; CHECK-LABEL: %bb22
> +; CHECK-LABEL: %bb11
>
> -; Load from %arg has alias store in Loop
> +; Load from %arg in a Loop body has alias store
>
> ; CHECK: flat_load_dword
>
> -; #####################################################################
> -
> -; Load from %arg1 has no-alias store in Loop - arg1[i+1] never alias arg1[i]
> -
> -; CHECK: s_load_dword
> +; CHECK-LABEL: %bb20
> +; CHECK: flat_store_dword
>
> ; #####################################################################
>
> -; CHECK-LABEL: %bb11
> +; CHECK-LABEL: %bb22
>
> -; Load from %arg in a Loop body has alias store
> +; Load from %arg has alias store in Loop
>
> ; CHECK: flat_load_dword
>
> -; CHECK-LABEL: %bb20
> +; #####################################################################
>
> -; CHECK: flat_store_dword
> +; Load from %arg1 has no-alias store in Loop - arg1[i+1] never alias arg1[i]
> +
> +; CHECK: s_load_dword
>
> define amdgpu_kernel void @cfg(i32 addrspace(1)* nocapture readonly %arg, i32 addrspace(1)* nocapture %arg1, i32 %arg2) #0 {
> bb:
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/i1-copy-from-loop.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/i1-copy-from-loop.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/i1-copy-from-loop.ll Thu Aug 22 09:21:32 2019
> @@ -3,20 +3,20 @@
>
> ; SI-LABEL: {{^}}i1_copy_from_loop:
> ;
> -; SI: ; %Flow
> -; SI-DAG:  s_andn2_b64       [[LCSSA_ACCUM:s\[[0-9]+:[0-9]+\]]], [[LCSSA_ACCUM]], exec
> -; SI-DAG:  s_and_b64         [[CC_MASK2:s\[[0-9]+:[0-9]+\]]], [[CC_ACCUM:s\[[0-9]+:[0-9]+\]]], exec
> -; SI:      s_or_b64          [[LCSSA_ACCUM]], [[LCSSA_ACCUM]], [[CC_MASK2]]
> -
> ; SI: ; %for.body
> ; SI:      v_cmp_gt_u32_e64  [[CC_SREG:s\[[0-9]+:[0-9]+\]]], 4,
> -; SI-DAG:  s_andn2_b64       [[CC_ACCUM]], [[CC_ACCUM]], exec
> +; SI-DAG:  s_andn2_b64       [[CC_ACCUM:s\[[0-9]+:[0-9]+\]]], [[CC_ACCUM]], exec
> ; SI-DAG:  s_and_b64         [[CC_MASK:s\[[0-9]+:[0-9]+\]]], [[CC_SREG]], exec
> ; SI:      s_or_b64          [[CC_ACCUM]], [[CC_ACCUM]], [[CC_MASK]]
>
> ; SI: ; %Flow1
> ; SI:      s_or_b64          [[CC_ACCUM]], [[CC_ACCUM]], exec
>
> +; SI: ; %Flow
> +; SI-DAG:  s_andn2_b64       [[LCSSA_ACCUM:s\[[0-9]+:[0-9]+\]]], [[LCSSA_ACCUM]], exec
> +; SI-DAG:  s_and_b64         [[CC_MASK2:s\[[0-9]+:[0-9]+\]]], [[CC_ACCUM]], exec
> +; SI:      s_or_b64          [[LCSSA_ACCUM]], [[LCSSA_ACCUM]], [[CC_MASK2]]
> +
> ; SI: ; %for.end
> ; SI:      s_and_saveexec_b64 {{s\[[0-9]+:[0-9]+\]}}, [[LCSSA_ACCUM]]
>
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/indirect-addressing-si.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/indirect-addressing-si.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/indirect-addressing-si.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/indirect-addressing-si.ll Thu Aug 22 09:21:32 2019
> @@ -630,7 +630,12 @@ define amdgpu_kernel void @insertelement
> ; GCN-LABEL: {{^}}broken_phi_bb:
> ; GCN: v_mov_b32_e32 [[PHIREG:v[0-9]+]], 8
>
> -; GCN: [[BB2:BB[0-9]+_[0-9]+]]:
> +; GCN: s_branch [[BB2:BB[0-9]+_[0-9]+]]
> +
> +; GCN: {{^BB[0-9]+_[0-9]+}}:
> +; GCN: s_mov_b64 exec,
> +
> +; GCN: [[BB2]]:
> ; GCN: v_cmp_le_i32_e32 vcc, s{{[0-9]+}}, [[PHIREG]]
> ; GCN: buffer_load_dword
>
> @@ -642,11 +647,6 @@ define amdgpu_kernel void @insertelement
> ; IDXMODE: s_set_gpr_idx_off
>
> ; GCN: s_cbranch_execnz [[REGLOOP]]
> -
> -; GCN: {{^; %bb.[0-9]}}:
> -; GCN: s_mov_b64 exec,
> -; GCN: s_branch [[BB2]]
> -
> define amdgpu_kernel void @broken_phi_bb(i32 %arg, i32 %arg1) #0 {
> bb:
>   br label %bb2
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/loop_exit_with_xor.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/loop_exit_with_xor.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/loop_exit_with_xor.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/loop_exit_with_xor.ll Thu Aug 22 09:21:32 2019
> @@ -61,9 +61,9 @@ loopexit:
>
> ; GCN-LABEL: {{^}}break_cond_is_arg:
> ; GCN: s_xor_b64 [[REG1:[^ ,]*]], {{[^ ,]*, -1$}}
> -; GCN: s_andn2_b64 exec, exec, [[REG3:[^ ,]*]]
> ; GCN: s_and_b64 [[REG2:[^ ,]*]], exec, [[REG1]]
> -; GCN: s_or_b64 [[REG3]], [[REG2]],
> +; GCN: s_or_b64 [[REG3:[^ ,]*]], [[REG2]],
> +; GCN: s_andn2_b64 exec, exec, [[REG3]]
>
> define void @break_cond_is_arg(i32 %arg, i1 %breakcond) {
> entry:
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/multilevel-break.ll Thu Aug 22 09:21:32 2019
> @@ -24,29 +24,13 @@
> ; GCN: ; %main_body
> ; GCN:      s_mov_b64           [[LEFT_OUTER:s\[[0-9]+:[0-9]+\]]], 0{{$}}
>
> -; GCN: [[FLOW2:BB[0-9]+_[0-9]+]]: ; %Flow2
> -; GCN:      s_or_b64            exec, exec, [[TMP0:s\[[0-9]+:[0-9]+\]]]
> -; GCN:      s_and_b64           [[TMP1:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK_OUTER:s\[[0-9]+:[0-9]+\]]]
> -; GCN:      s_or_b64            [[TMP1]], [[TMP1]], [[LEFT_OUTER]]
> -; GCN:      s_mov_b64           [[LEFT_OUTER]], [[TMP1]]
> -; GCN:      s_andn2_b64         exec, exec, [[TMP1]]
> -; GCN:      s_cbranch_execz    [[IF_BLOCK:BB[0-9]+_[0-9]+]]
> -
> ; GCN: [[OUTER_LOOP:BB[0-9]+_[0-9]+]]: ; %LOOP.outer{{$}}
> ; GCN:      s_mov_b64           [[LEFT_INNER:s\[[0-9]+:[0-9]+\]]], 0{{$}}
>
> -; GCN: ; %Flow
> -; GCN:      s_or_b64            exec, exec, [[SAVE_EXEC:s\[[0-9]+:[0-9]+\]]]
> -; GCN:      s_and_b64           [[TMP0]], exec, [[BREAK_INNER:s\[[0-9]+:[0-9]+\]]]
> -; GCN:      s_or_b64            [[TMP0]], [[TMP0]], [[LEFT_INNER]]
> -; GCN:      s_mov_b64           [[LEFT_INNER]], [[TMP0]]
> -; GCN:      s_andn2_b64         exec, exec, [[TMP0]]
> -; GCN:      s_cbranch_execz    [[FLOW2]]
> -
> ; GCN: [[INNER_LOOP:BB[0-9]+_[0-9]+]]: ; %LOOP{{$}}
> -; GCN:      s_or_b64            [[BREAK_OUTER]], [[BREAK_OUTER]], exec
> -; GCN:      s_or_b64            [[BREAK_INNER]], [[BREAK_INNER]], exec
> -; GCN:      s_and_saveexec_b64  [[SAVE_EXEC]], vcc
> +; GCN:      s_or_b64            [[BREAK_OUTER:s\[[0-9]+:[0-9]+\]]], [[BREAK_OUTER]], exec
> +; GCN:      s_or_b64            [[BREAK_INNER:s\[[0-9]+:[0-9]+\]]], [[BREAK_INNER]], exec
> +; GCN:      s_and_saveexec_b64  [[SAVE_EXEC:s\[[0-9]+:[0-9]+\]]], vcc
>
> ; FIXME: duplicate comparison
> ; GCN: ; %ENDIF
> @@ -59,7 +43,23 @@
> ; GCN-DAG:  s_or_b64            [[BREAK_OUTER]], [[BREAK_OUTER]], [[TMP_EQ]]
> ; GCN-DAG:  s_or_b64            [[BREAK_INNER]], [[BREAK_INNER]], [[TMP_NE]]
>
> -; GCN: [[IF_BLOCK]]: ; %IF
> +; GCN: ; %Flow
> +; GCN:      s_or_b64            exec, exec, [[SAVE_EXEC]]
> +; GCN:      s_and_b64           [[TMP0:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK_INNER]]
> +; GCN:      s_or_b64            [[TMP0]], [[TMP0]], [[LEFT_INNER]]
> +; GCN:      s_mov_b64           [[LEFT_INNER]], [[TMP0]]
> +; GCN:      s_andn2_b64         exec, exec, [[TMP0]]
> +; GCN:      s_cbranch_execnz    [[INNER_LOOP]]
> +
> +; GCN: ; %Flow2
> +; GCN:      s_or_b64            exec, exec, [[TMP0]]
> +; GCN:      s_and_b64           [[TMP1:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK_OUTER]]
> +; GCN:      s_or_b64            [[TMP1]], [[TMP1]], [[LEFT_OUTER]]
> +; GCN:      s_mov_b64           [[LEFT_OUTER]], [[TMP1]]
> +; GCN:      s_andn2_b64         exec, exec, [[TMP1]]
> +; GCN:      s_cbranch_execnz    [[OUTER_LOOP]]
> +
> +; GCN: ; %IF
> ; GCN-NEXT: s_endpgm
> define amdgpu_vs void @multi_else_break(<4 x float> %vec, i32 %ub, i32 %cont) {
> main_body:
> @@ -92,18 +92,12 @@ ENDIF:
> ; GCN-LABEL: {{^}}multi_if_break_loop:
> ; GCN:      s_mov_b64          [[LEFT:s\[[0-9]+:[0-9]+\]]], 0{{$}}
>
> -; GCN: ; %Flow4
> -; GCN:      s_and_b64          [[BREAK:s\[[0-9]+:[0-9]+\]]], exec, [[BREAK]]
> -; GCN:      s_or_b64           [[LEFT]], [[BREAK]], [[OLD_LEFT:s\[[0-9]+:[0-9]+\]]]
> -; GCN:      s_andn2_b64        exec, exec, [[LEFT]]
> -; GCN-NEXT: s_cbranch_execz
> -
> ; GCN: [[LOOP:BB[0-9]+_[0-9]+]]: ; %bb1{{$}}
> -; GCN:      s_mov_b64          [[OLD_LEFT]], [[LEFT]]
> +; GCN:      s_mov_b64          [[OLD_LEFT:s\[[0-9]+:[0-9]+\]]], [[LEFT]]
>
> ; GCN: ; %LeafBlock1
> ; GCN:      s_mov_b64
> -; GCN:      s_mov_b64          [[BREAK]], -1{{$}}
> +; GCN:      s_mov_b64          [[BREAK:s\[[0-9]+:[0-9]+\]]], -1{{$}}
>
> ; GCN: ; %case1
> ; GCN:      buffer_load_dword  [[LOAD2:v[0-9]+]],
> @@ -124,6 +118,12 @@ ENDIF:
> ; GCN-DAG:  s_and_b64          [[TMP:s\[[0-9]+:[0-9]+\]]], vcc, exec
> ; GCN:      s_or_b64           [[BREAK]], [[BREAK]], [[TMP]]
>
> +; GCN: ; %Flow4
> +; GCN:      s_and_b64          [[BREAK]], exec, [[BREAK]]
> +; GCN:      s_or_b64           [[LEFT]], [[BREAK]], [[OLD_LEFT]]
> +; GCN:      s_andn2_b64        exec, exec, [[LEFT]]
> +; GCN-NEXT: s_cbranch_execnz
> +
> define amdgpu_kernel void @multi_if_break_loop(i32 %arg) #0 {
> bb:
>   %id = call i32 @llvm.amdgcn.workitem.id.x()
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/optimize-negated-cond.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/optimize-negated-cond.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/optimize-negated-cond.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/optimize-negated-cond.ll Thu Aug 22 09:21:32 2019
> @@ -3,11 +3,11 @@
> ; GCN-LABEL: {{^}}negated_cond:
> ; GCN: BB0_1:
> ; GCN:   v_cmp_eq_u32_e64 [[CC:[^,]+]],
> -; GCN: BB0_3:
> +; GCN: BB0_2:
> ; GCN-NOT: v_cndmask_b32
> ; GCN-NOT: v_cmp
> ; GCN:   s_andn2_b64 vcc, exec, [[CC]]
> -; GCN:   s_cbranch_vccnz BB0_2
> +; GCN:   s_cbranch_vccnz BB0_4
> define amdgpu_kernel void @negated_cond(i32 addrspace(1)* %arg1) {
> bb:
>   br label %bb1
> @@ -36,11 +36,11 @@ bb4:
>
> ; GCN-LABEL: {{^}}negated_cond_dominated_blocks:
> ; GCN:   v_cmp_eq_u32_e64 [[CC:[^,]+]],
> -; GCN: %bb4
> +; GCN: BB1_1:
> ; GCN-NOT: v_cndmask_b32
> ; GCN-NOT: v_cmp
> ; GCN:   s_andn2_b64 vcc, exec, [[CC]]
> -; GCN:   s_cbranch_vccnz BB1_1
> +; GCN:   s_cbranch_vccz BB1_3
> define amdgpu_kernel void @negated_cond_dominated_blocks(i32 addrspace(1)* %arg1) {
> bb:
>   br label %bb2
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll Thu Aug 22 09:21:32 2019
> @@ -96,20 +96,20 @@ declare float @llvm.fabs.f32(float) noun
> ; FUNC-LABEL: {{^}}loop_land_info_assert:
> ; SI:      v_cmp_lt_i32_e64 [[CMP4:s\[[0-9:]+\]]], s{{[0-9]+}}, 4{{$}}
> ; SI:      s_and_b64        [[CMP4M:s\[[0-9]+:[0-9]+\]]], exec, [[CMP4]]
> -
> -; SI: [[WHILELOOP:BB[0-9]+_[0-9]+]]: ; %while.cond
> -; SI:      s_cbranch_vccz [[FOR_COND_PH:BB[0-9]+_[0-9]+]]
> +; SI:      s_branch         [[INFLOOP:BB[0-9]+_[0-9]+]]
>
> ; SI:      [[CONVEX_EXIT:BB[0-9_]+]]
> ; SI:      s_mov_b64        vcc,
> ; SI-NEXT: s_cbranch_vccnz  [[ENDPGM:BB[0-9]+_[0-9]+]]
> -
> -; SI:      s_cbranch_vccnz  [[WHILELOOP]]
> +; SI:      s_cbranch_vccnz  [[INFLOOP]]
>
> ; SI: ; %if.else
> ; SI:      buffer_store_dword
>
> -; SI: [[FOR_COND_PH]]: ; %for.cond.preheader
> +; SI:      [[INFLOOP]]:
> +; SI:      s_cbranch_vccnz [[CONVEX_EXIT]]
> +
> +; SI: ; %for.cond.preheader
> ; SI:      s_cbranch_vccz [[ENDPGM]]
>
> ; SI:      [[ENDPGM]]:
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/wave32.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/wave32.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/wave32.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/wave32.ll Thu Aug 22 09:21:32 2019
> @@ -166,29 +166,30 @@ endif:
> }
>
> ; GCN-LABEL: {{^}}test_loop_with_if:
> -; GFX1032: s_or_b32 s{{[0-9]+}}, vcc_lo, s{{[0-9]+}}
> -; GFX1032: s_andn2_b32 exec_lo, exec_lo, s{{[0-9]+}}
> -; GFX1064: s_or_b64 s[{{[0-9:]+}}], vcc, s[{{[0-9:]+}}]
> -; GFX1064: s_andn2_b64 exec, exec, s[{{[0-9:]+}}]
> -; GCN:     s_cbranch_execz
> -; GCN:   BB{{.*}}:
> +; GCN:   BB{{.*}}: ; %bb2
> ; GFX1032: s_and_saveexec_b32 s{{[0-9]+}}, vcc_lo
> ; GFX1064: s_and_saveexec_b64 s[{{[0-9:]+}}], vcc{{$}}
> ; GCN:     s_cbranch_execz
> -; GCN:   BB{{.*}}:
> -; GCN:   BB{{.*}}:
> +; GCN:   BB{{.*}}: ; %bb5
> +; GCN:   BB{{.*}}: ; %Flow
> ; GFX1032: s_xor_b32 s{{[0-9]+}}, exec_lo, s{{[0-9]+}}
> ; GFX1064: s_xor_b64 s[{{[0-9:]+}}], exec, s[{{[0-9:]+}}]
> ; GCN:     ; mask branch BB
> -; GCN:   BB{{.*}}:
> -; GCN:   BB{{.*}}:
> +; GCN:   BB{{.*}}: ; %bb11
> +; GCN:   BB{{.*}}: ; %Flow1
> ; GFX1032: s_or_b32 exec_lo, exec_lo, s{{[0-9]+}}
> ; GFX1032: s_and_saveexec_b32 s{{[0-9]+}}, s{{[0-9]+}}
> ; GFX1064: s_or_b64 exec, exec, s[{{[0-9:]+}}]
> ; GFX1064: s_and_saveexec_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}]{{$}}
> ; GCN:     ; mask branch BB
> -; GCN:   BB{{.*}}:
> -; GCN:   BB{{.*}}:
> +; GCN:   BB{{.*}}: ; %bb10
> +; GCN:   BB{{.*}}: ; %bb13
> +; GFX1032: s_or_b32 s{{[0-9]+}}, vcc_lo, s{{[0-9]+}}
> +; GFX1032: s_andn2_b32 exec_lo, exec_lo, s{{[0-9]+}}
> +; GFX1064: s_or_b64 s[{{[0-9:]+}}], vcc, s[{{[0-9:]+}}]
> +; GFX1064: s_andn2_b64 exec, exec, s[{{[0-9:]+}}]
> +; GCN:     s_cbranch_execnz
> +; GCN:   ; %bb1
> ; GCN:     s_endpgm
> define amdgpu_kernel void @test_loop_with_if(i32 addrspace(1)* %arg) #0 {
> bb:
> @@ -230,16 +231,17 @@ bb13:
> ; GFX1064: s_and_saveexec_b64 s[{{[0-9:]+}}], vcc{{$}}
> ; GCN:     ; mask branch
> ; GCN:     s_cbranch_execz
> -; GCN:   BB{{.*}}:
> -; GCN:   BB{{.*}}:
> +; GCN:   BB{{.*}}: ; %.preheader
> +; GCN:   ; %bb8
> ; GFX1032: s_andn2_b32 s{{[0-9]+}}, s{{[0-9]+}}, exec_lo
> ; GFX1064: s_andn2_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], exec
> ; GFX1032: s_or_b32 s{{[0-9]+}}, vcc_lo, s{{[0-9]+}}
> ; GFX1032: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
> ; GFX1064: s_or_b64 s[{{[0-9:]+}}], vcc, s[{{[0-9:]+}}]
> ; GFX1064: s_or_b64 s[{{[0-9:]+}}], s[{{[0-9:]+}}], s[{{[0-9:]+}}]
> -; GCN:     s_cbranch_execz
> -; GCN:   BB{{.*}}:
> +; GCN:   BB{{.*}}: ; %Flow
> +; GCN:     s_cbranch_execnz
> +; GCN:   BB{{.*}}: ; %.loopexit
> define amdgpu_kernel void @test_loop_with_if_else_break(i32 addrspace(1)* %arg) #0 {
> bb:
>   %tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
> @@ -655,7 +657,7 @@ define amdgpu_gs void @test_kill_i1_term
> ; GCN-LABEL: {{^}}test_loop_vcc:
> ; GFX1032: v_cmp_lt_f32_e32 vcc_lo,
> ; GFX1064: v_cmp_lt_f32_e32 vcc,
> -; GCN: s_cbranch_vccnz
> +; GCN: s_cbranch_vccz
> define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) #0 {
> entry:
>   br label %loop
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/wqm.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/wqm.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/wqm.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/wqm.ll Thu Aug 22 09:21:32 2019
> @@ -650,15 +650,12 @@ main_body:
> ; CHECK-DAG: v_mov_b32_e32 [[CTR:v[0-9]+]], 0
> ; CHECK-DAG: s_mov_b32 [[SEVEN:s[0-9]+]], 0x40e00000
>
> -; CHECK: [[LOOPHDR:BB[0-9]+_[0-9]+]]: ; %loop
> -; CHECK: v_cmp_lt_f32_e32 vcc, [[SEVEN]], [[CTR]]
> -; CHECK: s_cbranch_vccnz
> -
> -; CHECK: ; %body
> +; CHECK: [[LOOPHDR:BB[0-9]+_[0-9]+]]: ; %body
> ; CHECK: v_add_f32_e32 [[CTR]], 2.0, [[CTR]]
> -; CHECK: s_branch [[LOOPHDR]]
> -
> +; CHECK: v_cmp_lt_f32_e32 vcc, [[SEVEN]], [[CTR]]
> +; CHECK: s_cbranch_vccz [[LOOPHDR]]
> ; CHECK: ; %break
> +
> ; CHECK: ; return
> define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) nounwind {
> entry:
>
> Modified: llvm/trunk/test/CodeGen/ARM/2011-03-23-PeepholeBug.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/2011-03-23-PeepholeBug.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/2011-03-23-PeepholeBug.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/2011-03-23-PeepholeBug.ll Thu Aug 22 09:21:32 2019
> @@ -26,7 +26,7 @@ bb1:
>
> bb2:                                              ; preds = %bb1, %entry
> ; CHECK: cmp [[REG]], #0
> -; CHECK: bgt
> +; CHECK: ble
>   %indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %entry ]
>   %tries.0 = sub i32 2147483647, %indvar
>   %tmp1 = icmp sgt i32 %tries.0, 0
>
> Modified: llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll Thu Aug 22 09:21:32 2019
> @@ -47,8 +47,9 @@ tailrecurse.switch:
> ; V8-NEXT: beq
> ; V8-NEXT: %tailrecurse.switch
> ; V8: cmp
> -; V8-NEXT: bne
> -; V8-NEXT: %sw.bb
> +; V8-NEXT: beq
> +; V8-NEXT: %sw.epilog
> +; V8-NEXT: bx lr
>   switch i32 %and, label %sw.epilog [
>     i32 1, label %sw.bb
>     i32 3, label %sw.bb6
>
> Modified: llvm/trunk/test/CodeGen/ARM/atomic-cmp.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/atomic-cmp.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/atomic-cmp.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/atomic-cmp.ll Thu Aug 22 09:21:32 2019
> @@ -9,8 +9,8 @@ define i8 @t(i8* %a, i8 %b, i8 %c) nounw
> ; ARM: clrex
>
> ; T2-LABEL: t:
> -; T2: ldrexb
> ; T2: strexb
> +; T2: ldrexb
> ; T2: clrex
>   %tmp0 = cmpxchg i8* %a, i8 %b, i8 %c monotonic monotonic
>   %tmp1 = extractvalue { i8, i1 } %tmp0, 0
>
> Modified: llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll Thu Aug 22 09:21:32 2019
> @@ -52,16 +52,16 @@ entry:
> ; CHECK-ARMV7-LABEL: test_cmpxchg_res_i8:
> ; CHECK-ARMV7-NEXT: .fnstart
> ; CHECK-ARMV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
> -; CHECK-ARMV7-NEXT: [[TRY:.LBB[0-9_]+]]:
> -; CHECK-ARMV7-NEXT: ldrexb [[SUCCESS:r[0-9]+]], [r0]
> -; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], r1
> -; CHECK-ARMV7-NEXT: bne [[EXIT:.LBB[0-9_]+]]
> -; CHECK-ARMV7-NEXT: strexb [[SUCCESS]], r2, [r0]
> +; CHECK-ARMV7-NEXT: b [[TRY:.LBB[0-9_]+]]
> +; CHECK-ARMV7-NEXT: [[HEAD:.LBB[0-9_]+]]:
> +; CHECK-ARMV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
> ; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], #0
> ; CHECK-ARMV7-NEXT: moveq r0, #1
> ; CHECK-ARMV7-NEXT: bxeq lr
> -; CHECK-ARMV7-NEXT: b [[TRY]]
> -; CHECK-ARMV7-NEXT: [[EXIT]]:
> +; CHECK-ARMV7-NEXT: [[TRY]]:
> +; CHECK-ARMV7-NEXT: ldrexb [[SUCCESS]], [r0]
> +; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], r1
> +; CHECK-ARMV7-NEXT: beq [[HEAD]]
> ; CHECK-ARMV7-NEXT: mov r0, #0
> ; CHECK-ARMV7-NEXT: clrex
> ; CHECK-ARMV7-NEXT: bx lr
> @@ -69,17 +69,17 @@ entry:
> ; CHECK-THUMBV7-LABEL: test_cmpxchg_res_i8:
> ; CHECK-THUMBV7-NEXT: .fnstart
> ; CHECK-THUMBV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
> -; CHECK-THUMBV7-NEXT: [[TRYLD:.LBB[0-9_]+]]
> -; CHECK-THUMBV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
> -; CHECK-THUMBV7-NEXT: cmp [[LD]], [[DESIRED]]
> -; CHECK-THUMBV7-NEXT: bne [[EXIT:.LBB[0-9_]+]]
> +; CHECK-THUMBV7-NEXT: b [[TRYLD:.LBB[0-9_]+]]
> +; CHECK-THUMBV7-NEXT: [[TRYST:.LBB[0-9_]+]]:
> ; CHECK-THUMBV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
> ; CHECK-THUMBV7-NEXT: cmp [[SUCCESS]], #0
> ; CHECK-THUMBV7-NEXT: itt eq
> ; CHECK-THUMBV7-NEXT: moveq r0, #1
> ; CHECK-THUMBV7-NEXT: bxeq lr
> -; CHECK-THUMBV7-NEXT: b [[TRYLD]]
> -; CHECK-THUMBV7-NEXT: [[EXIT]]:
> +; CHECK-THUMBV7-NEXT: [[TRYLD]]:
> +; CHECK-THUMBV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
> +; CHECK-THUMBV7-NEXT: cmp [[LD]], [[DESIRED]]
> +; CHECK-THUMBV7-NEXT: beq [[TRYST:.LBB[0-9_]+]]
> ; CHECK-THUMBV7-NEXT: movs r0, #0
> ; CHECK-THUMBV7-NEXT: clrex
> ; CHECK-THUMBV7-NEXT: bx lr
>
> Modified: llvm/trunk/test/CodeGen/ARM/code-placement.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/code-placement.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/code-placement.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/code-placement.ll Thu Aug 22 09:21:32 2019
> @@ -38,9 +38,8 @@ entry:
>   br i1 %0, label %bb5, label %bb.nph15
>
> bb1:                                              ; preds = %bb2.preheader, %bb1
> -; CHECK: LBB1_[[BB3:.]]: @ %bb3
> ; CHECK: LBB1_[[PREHDR:.]]: @ %bb2.preheader
> -; CHECK: bmi LBB1_[[BB3]]
> +; CHECK: bmi LBB1_[[BB3:.]]
>   %indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %bb2.preheader ] ; <i32> [#uses=2]
>   %sum.08 = phi i32 [ %2, %bb1 ], [ %sum.110, %bb2.preheader ] ; <i32> [#uses=1]
>   %tmp17 = sub i32 %i.07, %indvar                 ; <i32> [#uses=1]
> @@ -54,6 +53,7 @@ bb1:
> bb3:                                              ; preds = %bb1, %bb2.preheader
> ; CHECK: LBB1_[[BB1:.]]: @ %bb1
> ; CHECK: bne LBB1_[[BB1]]
> +; CHECK: LBB1_[[BB3]]: @ %bb3
>   %sum.0.lcssa = phi i32 [ %sum.110, %bb2.preheader ], [ %2, %bb1 ] ; <i32> [#uses=2]
>   %3 = add i32 %pass.011, 1                       ; <i32> [#uses=2]
>   %exitcond18 = icmp eq i32 %3, %passes           ; <i1> [#uses=1]
>
> Modified: llvm/trunk/test/CodeGen/ARM/pr32578.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/pr32578.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/pr32578.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/pr32578.ll Thu Aug 22 09:21:32 2019
> @@ -4,7 +4,7 @@ target triple = "armv7"
> ; CHECK-LABEL: func:
> ; CHECK: push {r11, lr}
> ; CHECK: vpush {d8}
> -; CHECK: .LBB0_1: @ %tailrecurse
> +; CHECK: b .LBB0_2
> define arm_aapcscc double @func() {
>   br label %tailrecurse
>
>
> Modified: llvm/trunk/test/CodeGen/Hexagon/bug6757-endloop.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Hexagon/bug6757-endloop.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/Hexagon/bug6757-endloop.ll (original)
> +++ llvm/trunk/test/CodeGen/Hexagon/bug6757-endloop.ll Thu Aug 22 09:21:32 2019
> @@ -4,10 +4,10 @@
> ; This situation can arise due to tail duplication.
>
> ; CHECK: loop1([[LP:.LBB0_[0-9]+]]
> -; CHECK: endloop1
> ; CHECK: [[LP]]:
> ; CHECK-NOT: loop1(
> ; CHECK: endloop1
> +; CHECK: endloop1
>
> %s.0 = type { i32, i8* }
> %s.1 = type { i32, i32, i32, i32 }
>
> Modified: llvm/trunk/test/CodeGen/Hexagon/early-if-merge-loop.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Hexagon/early-if-merge-loop.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/Hexagon/early-if-merge-loop.ll (original)
> +++ llvm/trunk/test/CodeGen/Hexagon/early-if-merge-loop.ll Thu Aug 22 09:21:32 2019
> @@ -2,11 +2,9 @@
> ; Make sure that the loop in the end has only one basic block.
>
> ; CHECK-LABEL: fred
> -; CHECK: %b2
> ; Rely on the comments, make sure the one for the loop header is present.
> ; CHECK: %loop
> -; CHECK: %should_merge
> -; CHECK: %exit
> +; CHECK-NOT: %should_merge
>
> target triple = "hexagon"
>
>
> Modified: llvm/trunk/test/CodeGen/Hexagon/prof-early-if.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Hexagon/prof-early-if.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/Hexagon/prof-early-if.ll (original)
> +++ llvm/trunk/test/CodeGen/Hexagon/prof-early-if.ll Thu Aug 22 09:21:32 2019
> @@ -1,8 +1,8 @@
> ; RUN: llc -O2 -march=hexagon < %s | FileCheck %s
> ; Rely on the comments generated by llc. Check that "if.then" was not predicated.
> -; CHECK: b5
> ; CHECK: b2
> ; CHECK-NOT: if{{.*}}memd
> +; CHECK: b5
>
> %s.0 = type { [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [3 x i32], [24 x i32], [8 x %s.1], [5 x i32] }
> %s.1 = type { i32, i32 }
>
> Modified: llvm/trunk/test/CodeGen/Hexagon/redundant-branching2.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Hexagon/redundant-branching2.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/Hexagon/redundant-branching2.ll (original)
> +++ llvm/trunk/test/CodeGen/Hexagon/redundant-branching2.ll Thu Aug 22 09:21:32 2019
> @@ -3,9 +3,9 @@
>
> ; CHECK: memub
> ; CHECK: memub
> -; CHECK: cmp.eq
> ; CHECK: memub
> ; CHECK-NOT: if{{.*}}jump .LBB
> +; CHECK: cmp.eq
>
> target triple = "hexagon-unknown--elf"
>
>
> Modified: llvm/trunk/test/CodeGen/PowerPC/atomics-regression.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/atomics-regression.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/PowerPC/atomics-regression.ll (original)
> +++ llvm/trunk/test/CodeGen/PowerPC/atomics-regression.ll Thu Aug 22 09:21:32 2019
> @@ -401,15 +401,16 @@ define void @test40(i8* %ptr, i8 %cmp, i
> ; PPC64LE-LABEL: test40:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 24, 31
> +; PPC64LE-NEXT:    b .LBB40_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB40_1:
> -; PPC64LE-NEXT:    lbarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB40_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stbcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB40_1
> -; PPC64LE-NEXT:  .LBB40_3:
> +; PPC64LE-NEXT:  .LBB40_2:
> +; PPC64LE-NEXT:    lbarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB40_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stbcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i8* %ptr, i8 %cmp, i8 %val monotonic monotonic
> @@ -465,15 +466,16 @@ define void @test43(i8* %ptr, i8 %cmp, i
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 24, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB43_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB43_1:
> -; PPC64LE-NEXT:    lbarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB43_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stbcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB43_1
> -; PPC64LE-NEXT:  .LBB43_3:
> +; PPC64LE-NEXT:  .LBB43_2:
> +; PPC64LE-NEXT:    lbarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB43_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stbcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i8* %ptr, i8 %cmp, i8 %val release monotonic
> @@ -485,15 +487,16 @@ define void @test44(i8* %ptr, i8 %cmp, i
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 24, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB44_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB44_1:
> -; PPC64LE-NEXT:    lbarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB44_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stbcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB44_1
> -; PPC64LE-NEXT:  .LBB44_3:
> +; PPC64LE-NEXT:  .LBB44_2:
> +; PPC64LE-NEXT:    lbarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB44_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stbcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i8* %ptr, i8 %cmp, i8 %val release acquire
> @@ -619,15 +622,16 @@ define void @test50(i16* %ptr, i16 %cmp,
> ; PPC64LE-LABEL: test50:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 16, 31
> +; PPC64LE-NEXT:    b .LBB50_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB50_1:
> -; PPC64LE-NEXT:    lharx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB50_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    sthcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB50_1
> -; PPC64LE-NEXT:  .LBB50_3:
> +; PPC64LE-NEXT:  .LBB50_2:
> +; PPC64LE-NEXT:    lharx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB50_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    sthcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i16* %ptr, i16 %cmp, i16 %val monotonic monotonic
> @@ -683,15 +687,16 @@ define void @test53(i16* %ptr, i16 %cmp,
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 16, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB53_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB53_1:
> -; PPC64LE-NEXT:    lharx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB53_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    sthcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB53_1
> -; PPC64LE-NEXT:  .LBB53_3:
> +; PPC64LE-NEXT:  .LBB53_2:
> +; PPC64LE-NEXT:    lharx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB53_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    sthcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i16* %ptr, i16 %cmp, i16 %val release monotonic
> @@ -703,15 +708,16 @@ define void @test54(i16* %ptr, i16 %cmp,
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 16, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB54_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB54_1:
> -; PPC64LE-NEXT:    lharx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB54_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    sthcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB54_1
> -; PPC64LE-NEXT:  .LBB54_3:
> +; PPC64LE-NEXT:  .LBB54_2:
> +; PPC64LE-NEXT:    lharx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB54_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    sthcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i16* %ptr, i16 %cmp, i16 %val release acquire
> @@ -836,15 +842,16 @@ define void @test59(i16* %ptr, i16 %cmp,
> define void @test60(i32* %ptr, i32 %cmp, i32 %val) {
> ; PPC64LE-LABEL: test60:
> ; PPC64LE:       # %bb.0:
> +; PPC64LE-NEXT:    b .LBB60_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB60_1:
> -; PPC64LE-NEXT:    lwarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB60_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stwcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB60_1
> -; PPC64LE-NEXT:  .LBB60_3:
> +; PPC64LE-NEXT:  .LBB60_2:
> +; PPC64LE-NEXT:    lwarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB60_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stwcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i32* %ptr, i32 %cmp, i32 %val monotonic monotonic
> @@ -897,15 +904,16 @@ define void @test63(i32* %ptr, i32 %cmp,
> ; PPC64LE-LABEL: test63:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB63_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB63_1:
> -; PPC64LE-NEXT:    lwarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB63_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stwcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB63_1
> -; PPC64LE-NEXT:  .LBB63_3:
> +; PPC64LE-NEXT:  .LBB63_2:
> +; PPC64LE-NEXT:    lwarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB63_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stwcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i32* %ptr, i32 %cmp, i32 %val release monotonic
> @@ -916,15 +924,16 @@ define void @test64(i32* %ptr, i32 %cmp,
> ; PPC64LE-LABEL: test64:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB64_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB64_1:
> -; PPC64LE-NEXT:    lwarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB64_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stwcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB64_1
> -; PPC64LE-NEXT:  .LBB64_3:
> +; PPC64LE-NEXT:  .LBB64_2:
> +; PPC64LE-NEXT:    lwarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB64_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stwcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i32* %ptr, i32 %cmp, i32 %val release acquire
> @@ -1044,15 +1053,16 @@ define void @test69(i32* %ptr, i32 %cmp,
> define void @test70(i64* %ptr, i64 %cmp, i64 %val) {
> ; PPC64LE-LABEL: test70:
> ; PPC64LE:       # %bb.0:
> +; PPC64LE-NEXT:    b .LBB70_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB70_1:
> -; PPC64LE-NEXT:    ldarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpd 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB70_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stdcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB70_1
> -; PPC64LE-NEXT:  .LBB70_3:
> +; PPC64LE-NEXT:  .LBB70_2:
> +; PPC64LE-NEXT:    ldarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpd 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB70_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stdcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i64* %ptr, i64 %cmp, i64 %val monotonic monotonic
> @@ -1105,15 +1115,16 @@ define void @test73(i64* %ptr, i64 %cmp,
> ; PPC64LE-LABEL: test73:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB73_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB73_1:
> -; PPC64LE-NEXT:    ldarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpd 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB73_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stdcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB73_1
> -; PPC64LE-NEXT:  .LBB73_3:
> +; PPC64LE-NEXT:  .LBB73_2:
> +; PPC64LE-NEXT:    ldarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpd 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB73_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stdcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i64* %ptr, i64 %cmp, i64 %val release monotonic
> @@ -1124,15 +1135,16 @@ define void @test74(i64* %ptr, i64 %cmp,
> ; PPC64LE-LABEL: test74:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB74_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB74_1:
> -; PPC64LE-NEXT:    ldarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpd 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB74_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stdcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB74_1
> -; PPC64LE-NEXT:  .LBB74_3:
> +; PPC64LE-NEXT:  .LBB74_2:
> +; PPC64LE-NEXT:    ldarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpd 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB74_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stdcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i64* %ptr, i64 %cmp, i64 %val release acquire
> @@ -1253,15 +1265,16 @@ define void @test80(i8* %ptr, i8 %cmp, i
> ; PPC64LE-LABEL: test80:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 24, 31
> +; PPC64LE-NEXT:    b .LBB80_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB80_1:
> -; PPC64LE-NEXT:    lbarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB80_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stbcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB80_1
> -; PPC64LE-NEXT:  .LBB80_3:
> +; PPC64LE-NEXT:  .LBB80_2:
> +; PPC64LE-NEXT:    lbarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB80_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stbcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i8* %ptr, i8 %cmp, i8 %val syncscope("singlethread") monotonic monotonic
> @@ -1317,15 +1330,16 @@ define void @test83(i8* %ptr, i8 %cmp, i
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 24, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB83_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB83_1:
> -; PPC64LE-NEXT:    lbarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB83_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stbcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB83_1
> -; PPC64LE-NEXT:  .LBB83_3:
> +; PPC64LE-NEXT:  .LBB83_2:
> +; PPC64LE-NEXT:    lbarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB83_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stbcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i8* %ptr, i8 %cmp, i8 %val syncscope("singlethread") release monotonic
> @@ -1337,15 +1351,16 @@ define void @test84(i8* %ptr, i8 %cmp, i
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 24, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB84_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB84_1:
> -; PPC64LE-NEXT:    lbarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB84_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stbcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB84_1
> -; PPC64LE-NEXT:  .LBB84_3:
> +; PPC64LE-NEXT:  .LBB84_2:
> +; PPC64LE-NEXT:    lbarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB84_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stbcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i8* %ptr, i8 %cmp, i8 %val syncscope("singlethread") release acquire
> @@ -1471,15 +1486,16 @@ define void @test90(i16* %ptr, i16 %cmp,
> ; PPC64LE-LABEL: test90:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 16, 31
> +; PPC64LE-NEXT:    b .LBB90_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB90_1:
> -; PPC64LE-NEXT:    lharx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB90_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    sthcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b
> -; PPC64LE-NEXT:  .LBB90_3:
> +; PPC64LE-NEXT:  .LBB90_2:
> +; PPC64LE-NEXT:    lharx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB90_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    sthcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i16* %ptr, i16 %cmp, i16 %val syncscope("singlethread") monotonic monotonic
> @@ -1535,15 +1551,16 @@ define void @test93(i16* %ptr, i16 %cmp,
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 16, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB93_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB93_1:
> -; PPC64LE-NEXT:    lharx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB93_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    sthcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB93_1
> -; PPC64LE-NEXT:  .LBB93_3:
> +; PPC64LE-NEXT:  .LBB93_2:
> +; PPC64LE-NEXT:    lharx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB93_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    sthcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i16* %ptr, i16 %cmp, i16 %val syncscope("singlethread") release monotonic
> @@ -1555,15 +1572,16 @@ define void @test94(i16* %ptr, i16 %cmp,
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    rlwinm 4, 4, 0, 16, 31
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB94_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB94_1:
> -; PPC64LE-NEXT:    lharx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB94_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    sthcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB94_1
> -; PPC64LE-NEXT:  .LBB94_3:
> +; PPC64LE-NEXT:  .LBB94_2:
> +; PPC64LE-NEXT:    lharx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB94_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    sthcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i16* %ptr, i16 %cmp, i16 %val syncscope("singlethread") release acquire
> @@ -1688,15 +1706,16 @@ define void @test99(i16* %ptr, i16 %cmp,
> define void @test100(i32* %ptr, i32 %cmp, i32 %val) {
> ; PPC64LE-LABEL: test100:
> ; PPC64LE:       # %bb.0:
> +; PPC64LE-NEXT:    b .LBB100_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB100_1:
> -; PPC64LE-NEXT:    lwarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB100_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stwcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB100_1
> -; PPC64LE-NEXT:  .LBB100_3:
> +; PPC64LE-NEXT:  .LBB100_2:
> +; PPC64LE-NEXT:    lwarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB100_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stwcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i32* %ptr, i32 %cmp, i32 %val syncscope("singlethread") monotonic monotonic
> @@ -1749,15 +1768,16 @@ define void @test103(i32* %ptr, i32 %cmp
> ; PPC64LE-LABEL: test103:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB103_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB103_1:
> -; PPC64LE-NEXT:    lwarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB103_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stwcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB103_1
> -; PPC64LE-NEXT:  .LBB103_3:
> +; PPC64LE-NEXT:  .LBB103_2:
> +; PPC64LE-NEXT:    lwarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB103_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stwcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i32* %ptr, i32 %cmp, i32 %val syncscope("singlethread") release monotonic
> @@ -1768,15 +1788,16 @@ define void @test104(i32* %ptr, i32 %cmp
> ; PPC64LE-LABEL: test104:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB104_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB104_1:
> -; PPC64LE-NEXT:    lwarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpw 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB104_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stwcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB104_1
> -; PPC64LE-NEXT:  .LBB104_3:
> +; PPC64LE-NEXT:  .LBB104_2:
> +; PPC64LE-NEXT:    lwarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpw 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB104_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stwcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i32* %ptr, i32 %cmp, i32 %val syncscope("singlethread") release acquire
> @@ -1896,15 +1917,16 @@ define void @test109(i32* %ptr, i32 %cmp
> define void @test110(i64* %ptr, i64 %cmp, i64 %val) {
> ; PPC64LE-LABEL: test110:
> ; PPC64LE:       # %bb.0:
> +; PPC64LE-NEXT:    b .LBB110_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB110_1:
> -; PPC64LE-NEXT:    ldarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpd 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB110_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stdcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB110_1
> -; PPC64LE-NEXT:  .LBB110_3:
> +; PPC64LE-NEXT:  .LBB110_2:
> +; PPC64LE-NEXT:    ldarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpd 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB110_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stdcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i64* %ptr, i64 %cmp, i64 %val syncscope("singlethread") monotonic monotonic
> @@ -1957,15 +1979,16 @@ define void @test113(i64* %ptr, i64 %cmp
> ; PPC64LE-LABEL: test113:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB113_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB113_1:
> -; PPC64LE-NEXT:    ldarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpd 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB113_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stdcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB113_1
> -; PPC64LE-NEXT:  .LBB113_3:
> +; PPC64LE-NEXT:  .LBB113_2:
> +; PPC64LE-NEXT:    ldarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpd 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB113_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stdcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i64* %ptr, i64 %cmp, i64 %val syncscope("singlethread") release monotonic
> @@ -1976,15 +1999,16 @@ define void @test114(i64* %ptr, i64 %cmp
> ; PPC64LE-LABEL: test114:
> ; PPC64LE:       # %bb.0:
> ; PPC64LE-NEXT:    lwsync
> +; PPC64LE-NEXT:    b .LBB114_2
> +; PPC64LE-NEXT:    .p2align 5
> ; PPC64LE-NEXT:  .LBB114_1:
> -; PPC64LE-NEXT:    ldarx 6, 0, 3
> -; PPC64LE-NEXT:    cmpd 4, 6
> -; PPC64LE-NEXT:    bne 0, .LBB114_3
> -; PPC64LE-NEXT:  # %bb.2:
> ; PPC64LE-NEXT:    stdcx. 5, 0, 3
> ; PPC64LE-NEXT:    beqlr 0
> -; PPC64LE-NEXT:    b .LBB114_1
> -; PPC64LE-NEXT:  .LBB114_3:
> +; PPC64LE-NEXT:  .LBB114_2:
> +; PPC64LE-NEXT:    ldarx 6, 0, 3
> +; PPC64LE-NEXT:    cmpd 4, 6
> +; PPC64LE-NEXT:    beq 0, .LBB114_1
> +; PPC64LE-NEXT:  # %bb.3:
> ; PPC64LE-NEXT:    stdcx. 6, 0, 3
> ; PPC64LE-NEXT:    blr
>   %res = cmpxchg i64* %ptr, i64 %cmp, i64 %val syncscope("singlethread") release acquire
>
> Modified: llvm/trunk/test/CodeGen/PowerPC/block-placement-1.mir
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/block-placement-1.mir?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/PowerPC/block-placement-1.mir (original)
> +++ llvm/trunk/test/CodeGen/PowerPC/block-placement-1.mir Thu Aug 22 09:21:32 2019
> @@ -298,14 +298,14 @@ body:             |
>
>   bb.11.unreachable:
>
> -  ; CHECK:      bb.1.for.body:
> -  ; CHECK:        successors: %bb.2(0x7ffff800), %bb.3(0x00000800)
> -  ; CHECK:        B %bb.2
> -
>   ; CHECK:      bb.4.catch4:
>   ; CHECK:        successors: %bb.11(0x7ffff800), %bb.6(0x00000800)
>   ; CHECK:        B %bb.11
>
> +  ; CHECK:      bb.1.for.body (align 4):
> +  ; CHECK:        successors: %bb.2(0x7ffff800), %bb.3(0x00000800)
> +  ; CHECK:        B %bb.2
> +
>   ; CHECK:      bb.2..noexc:
>
>   ; CHECK:      bb.11.unreachable:
>
> Modified: llvm/trunk/test/CodeGen/PowerPC/cmp_elimination.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/cmp_elimination.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/PowerPC/cmp_elimination.ll (original)
> +++ llvm/trunk/test/CodeGen/PowerPC/cmp_elimination.ll Thu Aug 22 09:21:32 2019
> @@ -718,14 +718,13 @@ if.end:
> define void @func28(i32 signext %a) {
> ; CHECK-LABEL: @func28
> ; CHECK: cmplwi [[REG1:[0-9]+]], [[REG2:[0-9]+]]
> -; CHECK: .[[LABEL2:[A-Z0-9_]+]]:
> -; CHECK: cmpwi   [[REG1]], [[REG2]]
> -; CHECK: ble     0, .[[LABEL1:[A-Z0-9_]+]]
> +; CHECK: .[[LABEL1:[A-Z0-9_]+]]:
> ; CHECK-NOT: cmp
> -; CHECK: bne     0, .[[LABEL2]]
> +; CHECK: bne 0, .[[LABEL2:[A-Z0-9_]+]]
> ; CHECK: bl dummy1
> -; CHECK: b .[[LABEL2]]
> -; CHECK: .[[LABEL1]]:
> +; CHECK: .[[LABEL2]]:
> +; CHECK: cmpwi [[REG1]], [[REG2]]
> +; CHECK: bgt 0, .[[LABEL1]]
> ; CHECK: blr
> entry:
>   br label %do.body
>
> Modified: llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll (original)
> +++ llvm/trunk/test/CodeGen/PowerPC/licm-remat.ll Thu Aug 22 09:21:32 2019
> @@ -24,7 +24,8 @@ define linkonce_odr void @ZN6snappyDecom
> ; CHECK-DAG:   addi 25, 3, _ZN6snappy8internalL8wordmaskE@toc@l
> ; CHECK-DAG:   addis 5, 2, _ZN6snappy8internalL10char_tableE@toc@ha
> ; CHECK-DAG:   addi 24, 5, _ZN6snappy8internalL10char_tableE@toc@l
> -; CHECK:       .LBB0_2: # %for.cond
> +; CHECK:       b .[[LABEL1:[A-Z0-9_]+]]
> +; CHECK:       .[[LABEL1]]: # %for.cond
> ; CHECK-NOT:   addis {{[0-9]+}}, 2, _ZN6snappy8internalL8wordmaskE@toc@ha
> ; CHECK-NOT:   addis {{[0-9]+}}, 2, _ZN6snappy8internalL10char_tableE@toc@ha
> ; CHECK:       bctrl
>
> Modified: llvm/trunk/test/CodeGen/PowerPC/machine-pre.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/machine-pre.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/PowerPC/machine-pre.ll (original)
> +++ llvm/trunk/test/CodeGen/PowerPC/machine-pre.ll Thu Aug 22 09:21:32 2019
> @@ -75,19 +75,8 @@ define dso_local signext i32 @foo(i32 si
> ; CHECK-P9-NEXT:    lis r3, 21845
> ; CHECK-P9-NEXT:    add r28, r30, r29
> ; CHECK-P9-NEXT:    ori r27, r3, 21846
> -; CHECK-P9-NEXT:    b .LBB1_4
> ; CHECK-P9-NEXT:    .p2align 4
> -; CHECK-P9-NEXT:  .LBB1_1: # %sw.bb3
> -; CHECK-P9-NEXT:    #
> -; CHECK-P9-NEXT:    mulli r3, r30, 23
> -; CHECK-P9-NEXT:  .LBB1_2: # %sw.epilog
> -; CHECK-P9-NEXT:    #
> -; CHECK-P9-NEXT:    add r28, r3, r28
> -; CHECK-P9-NEXT:  .LBB1_3: # %sw.epilog
> -; CHECK-P9-NEXT:    #
> -; CHECK-P9-NEXT:    cmpwi r28, 1025
> -; CHECK-P9-NEXT:    bge cr0, .LBB1_7
> -; CHECK-P9-NEXT:  .LBB1_4: # %while.cond
> +; CHECK-P9-NEXT:  .LBB1_1: # %while.cond
> ; CHECK-P9-NEXT:    #
> ; CHECK-P9-NEXT:    extsw r3, r29
> ; CHECK-P9-NEXT:    bl bar
> @@ -106,16 +95,27 @@ define dso_local signext i32 @foo(i32 si
> ; CHECK-P9-NEXT:    add r4, r4, r5
> ; CHECK-P9-NEXT:    subf r3, r4, r3
> ; CHECK-P9-NEXT:    cmplwi r3, 1
> -; CHECK-P9-NEXT:    beq cr0, .LBB1_1
> -; CHECK-P9-NEXT:  # %bb.5: # %while.cond
> +; CHECK-P9-NEXT:    beq cr0, .LBB1_4
> +; CHECK-P9-NEXT:  # %bb.2: # %while.cond
> ; CHECK-P9-NEXT:    #
> ; CHECK-P9-NEXT:    cmplwi r3, 0
> -; CHECK-P9-NEXT:    bne cr0, .LBB1_3
> -; CHECK-P9-NEXT:  # %bb.6: # %sw.bb
> +; CHECK-P9-NEXT:    bne cr0, .LBB1_6
> +; CHECK-P9-NEXT:  # %bb.3: # %sw.bb
> ; CHECK-P9-NEXT:    #
> ; CHECK-P9-NEXT:    mulli r3, r29, 13
> -; CHECK-P9-NEXT:    b .LBB1_2
> -; CHECK-P9-NEXT:  .LBB1_7: # %while.end
> +; CHECK-P9-NEXT:    b .LBB1_5
> +; CHECK-P9-NEXT:    .p2align 4
> +; CHECK-P9-NEXT:  .LBB1_4: # %sw.bb3
> +; CHECK-P9-NEXT:    #
> +; CHECK-P9-NEXT:    mulli r3, r30, 23
> +; CHECK-P9-NEXT:  .LBB1_5: # %sw.epilog
> +; CHECK-P9-NEXT:    #
> +; CHECK-P9-NEXT:    add r28, r3, r28
> +; CHECK-P9-NEXT:  .LBB1_6: # %sw.epilog
> +; CHECK-P9-NEXT:    #
> +; CHECK-P9-NEXT:    cmpwi r28, 1025
> +; CHECK-P9-NEXT:    blt cr0, .LBB1_1
> +; CHECK-P9-NEXT:  # %bb.7: # %while.end
> ; CHECK-P9-NEXT:    lis r3, -13108
> ; CHECK-P9-NEXT:    ori r3, r3, 52429
> ; CHECK-P9-NEXT:    mullw r3, r28, r3
>
> Modified: llvm/trunk/test/CodeGen/RISCV/atomic-rmw.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/RISCV/atomic-rmw.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/RISCV/atomic-rmw.ll (original)
> +++ llvm/trunk/test/CodeGen/RISCV/atomic-rmw.ll Thu Aug 22 09:21:32 2019
> @@ -2083,9 +2083,17 @@ define i8 @atomicrmw_max_i8_monotonic(i8
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB35_2
> ; RV32I-NEXT:  .LBB35_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB35_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB35_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB35_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB35_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB35_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -2094,18 +2102,8 @@ define i8 @atomicrmw_max_i8_monotonic(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB35_4
> -; RV32I-NEXT:  .LBB35_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB35_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB35_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB35_1
> -; RV32I-NEXT:  .LBB35_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB35_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -2158,9 +2156,17 @@ define i8 @atomicrmw_max_i8_monotonic(i8
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB35_2
> ; RV64I-NEXT:  .LBB35_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB35_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB35_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB35_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB35_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB35_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -2169,18 +2175,8 @@ define i8 @atomicrmw_max_i8_monotonic(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB35_4
> -; RV64I-NEXT:  .LBB35_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB35_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB35_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB35_1
> -; RV64I-NEXT:  .LBB35_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB35_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -2237,9 +2233,17 @@ define i8 @atomicrmw_max_i8_acquire(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB36_2
> ; RV32I-NEXT:  .LBB36_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB36_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB36_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB36_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB36_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB36_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -2248,18 +2252,8 @@ define i8 @atomicrmw_max_i8_acquire(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB36_4
> -; RV32I-NEXT:  .LBB36_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB36_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB36_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB36_1
> -; RV32I-NEXT:  .LBB36_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB36_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -2312,9 +2306,17 @@ define i8 @atomicrmw_max_i8_acquire(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB36_2
> ; RV64I-NEXT:  .LBB36_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB36_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB36_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB36_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB36_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB36_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -2323,18 +2325,8 @@ define i8 @atomicrmw_max_i8_acquire(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB36_4
> -; RV64I-NEXT:  .LBB36_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB36_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB36_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB36_1
> -; RV64I-NEXT:  .LBB36_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB36_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -2391,9 +2383,17 @@ define i8 @atomicrmw_max_i8_release(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB37_2
> ; RV32I-NEXT:  .LBB37_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB37_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB37_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB37_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB37_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB37_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -2402,18 +2402,8 @@ define i8 @atomicrmw_max_i8_release(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB37_4
> -; RV32I-NEXT:  .LBB37_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB37_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB37_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB37_1
> -; RV32I-NEXT:  .LBB37_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB37_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -2466,9 +2456,17 @@ define i8 @atomicrmw_max_i8_release(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB37_2
> ; RV64I-NEXT:  .LBB37_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB37_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB37_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB37_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB37_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB37_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -2477,18 +2475,8 @@ define i8 @atomicrmw_max_i8_release(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB37_4
> -; RV64I-NEXT:  .LBB37_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB37_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB37_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB37_1
> -; RV64I-NEXT:  .LBB37_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB37_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -2545,9 +2533,17 @@ define i8 @atomicrmw_max_i8_acq_rel(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB38_2
> ; RV32I-NEXT:  .LBB38_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB38_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB38_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB38_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB38_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB38_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -2556,18 +2552,8 @@ define i8 @atomicrmw_max_i8_acq_rel(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB38_4
> -; RV32I-NEXT:  .LBB38_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB38_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB38_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB38_1
> -; RV32I-NEXT:  .LBB38_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB38_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -2620,9 +2606,17 @@ define i8 @atomicrmw_max_i8_acq_rel(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB38_2
> ; RV64I-NEXT:  .LBB38_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB38_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB38_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB38_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB38_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB38_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -2631,18 +2625,8 @@ define i8 @atomicrmw_max_i8_acq_rel(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB38_4
> -; RV64I-NEXT:  .LBB38_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB38_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB38_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB38_1
> -; RV64I-NEXT:  .LBB38_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB38_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -2699,9 +2683,17 @@ define i8 @atomicrmw_max_i8_seq_cst(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB39_2
> ; RV32I-NEXT:  .LBB39_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB39_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB39_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB39_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB39_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB39_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -2710,18 +2702,8 @@ define i8 @atomicrmw_max_i8_seq_cst(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB39_4
> -; RV32I-NEXT:  .LBB39_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB39_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB39_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB39_1
> -; RV32I-NEXT:  .LBB39_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB39_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -2774,9 +2756,17 @@ define i8 @atomicrmw_max_i8_seq_cst(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB39_2
> ; RV64I-NEXT:  .LBB39_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB39_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB39_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB39_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB39_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB39_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -2785,18 +2775,8 @@ define i8 @atomicrmw_max_i8_seq_cst(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB39_4
> -; RV64I-NEXT:  .LBB39_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB39_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB39_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB39_1
> -; RV64I-NEXT:  .LBB39_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB39_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -2853,9 +2833,17 @@ define i8 @atomicrmw_min_i8_monotonic(i8
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB40_2
> ; RV32I-NEXT:  .LBB40_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB40_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB40_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB40_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB40_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB40_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -2864,18 +2852,8 @@ define i8 @atomicrmw_min_i8_monotonic(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB40_4
> -; RV32I-NEXT:  .LBB40_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB40_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB40_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB40_1
> -; RV32I-NEXT:  .LBB40_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB40_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -2928,9 +2906,17 @@ define i8 @atomicrmw_min_i8_monotonic(i8
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB40_2
> ; RV64I-NEXT:  .LBB40_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB40_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB40_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB40_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB40_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB40_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -2939,18 +2925,8 @@ define i8 @atomicrmw_min_i8_monotonic(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB40_4
> -; RV64I-NEXT:  .LBB40_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB40_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB40_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB40_1
> -; RV64I-NEXT:  .LBB40_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB40_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3007,9 +2983,17 @@ define i8 @atomicrmw_min_i8_acquire(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB41_2
> ; RV32I-NEXT:  .LBB41_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB41_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB41_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB41_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB41_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB41_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3018,18 +3002,8 @@ define i8 @atomicrmw_min_i8_acquire(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB41_4
> -; RV32I-NEXT:  .LBB41_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB41_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB41_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB41_1
> -; RV32I-NEXT:  .LBB41_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB41_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3082,9 +3056,17 @@ define i8 @atomicrmw_min_i8_acquire(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB41_2
> ; RV64I-NEXT:  .LBB41_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB41_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB41_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB41_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB41_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB41_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3093,18 +3075,8 @@ define i8 @atomicrmw_min_i8_acquire(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB41_4
> -; RV64I-NEXT:  .LBB41_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB41_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB41_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB41_1
> -; RV64I-NEXT:  .LBB41_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB41_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3161,9 +3133,17 @@ define i8 @atomicrmw_min_i8_release(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB42_2
> ; RV32I-NEXT:  .LBB42_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB42_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB42_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB42_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB42_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB42_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3172,18 +3152,8 @@ define i8 @atomicrmw_min_i8_release(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB42_4
> -; RV32I-NEXT:  .LBB42_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB42_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB42_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB42_1
> -; RV32I-NEXT:  .LBB42_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB42_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3236,9 +3206,17 @@ define i8 @atomicrmw_min_i8_release(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB42_2
> ; RV64I-NEXT:  .LBB42_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB42_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB42_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB42_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB42_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB42_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3247,18 +3225,8 @@ define i8 @atomicrmw_min_i8_release(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB42_4
> -; RV64I-NEXT:  .LBB42_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB42_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB42_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB42_1
> -; RV64I-NEXT:  .LBB42_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB42_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3315,9 +3283,17 @@ define i8 @atomicrmw_min_i8_acq_rel(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB43_2
> ; RV32I-NEXT:  .LBB43_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB43_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB43_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB43_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB43_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB43_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3326,18 +3302,8 @@ define i8 @atomicrmw_min_i8_acq_rel(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB43_4
> -; RV32I-NEXT:  .LBB43_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB43_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB43_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB43_1
> -; RV32I-NEXT:  .LBB43_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB43_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3390,9 +3356,17 @@ define i8 @atomicrmw_min_i8_acq_rel(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB43_2
> ; RV64I-NEXT:  .LBB43_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB43_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB43_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB43_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB43_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB43_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3401,18 +3375,8 @@ define i8 @atomicrmw_min_i8_acq_rel(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB43_4
> -; RV64I-NEXT:  .LBB43_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB43_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB43_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB43_1
> -; RV64I-NEXT:  .LBB43_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB43_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3469,9 +3433,17 @@ define i8 @atomicrmw_min_i8_seq_cst(i8 *
> ; RV32I-NEXT:    slli a1, a1, 24
> ; RV32I-NEXT:    srai s0, a1, 24
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB44_2
> ; RV32I-NEXT:  .LBB44_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB44_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 24
> +; RV32I-NEXT:    srai a1, a1, 24
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB44_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB44_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB44_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB44_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3480,18 +3452,8 @@ define i8 @atomicrmw_min_i8_seq_cst(i8 *
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB44_4
> -; RV32I-NEXT:  .LBB44_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 24
> -; RV32I-NEXT:    srai a1, a1, 24
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB44_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB44_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB44_1
> -; RV32I-NEXT:  .LBB44_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB44_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3544,9 +3506,17 @@ define i8 @atomicrmw_min_i8_seq_cst(i8 *
> ; RV64I-NEXT:    slli a1, a1, 56
> ; RV64I-NEXT:    srai s0, a1, 56
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB44_2
> ; RV64I-NEXT:  .LBB44_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB44_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 56
> +; RV64I-NEXT:    srai a1, a1, 56
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB44_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB44_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB44_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB44_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3555,18 +3525,8 @@ define i8 @atomicrmw_min_i8_seq_cst(i8 *
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB44_4
> -; RV64I-NEXT:  .LBB44_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 56
> -; RV64I-NEXT:    srai a1, a1, 56
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB44_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB44_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB44_1
> -; RV64I-NEXT:  .LBB44_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB44_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3622,9 +3582,16 @@ define i8 @atomicrmw_umax_i8_monotonic(i
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB45_2
> ; RV32I-NEXT:  .LBB45_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB45_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s0, a1, .LBB45_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB45_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB45_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB45_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3633,17 +3600,8 @@ define i8 @atomicrmw_umax_i8_monotonic(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB45_4
> -; RV32I-NEXT:  .LBB45_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s0, a1, .LBB45_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB45_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB45_1
> -; RV32I-NEXT:  .LBB45_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB45_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3690,9 +3648,16 @@ define i8 @atomicrmw_umax_i8_monotonic(i
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB45_2
> ; RV64I-NEXT:  .LBB45_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB45_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB45_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB45_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB45_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB45_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3701,17 +3666,8 @@ define i8 @atomicrmw_umax_i8_monotonic(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB45_4
> -; RV64I-NEXT:  .LBB45_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB45_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB45_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB45_1
> -; RV64I-NEXT:  .LBB45_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB45_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3762,9 +3718,16 @@ define i8 @atomicrmw_umax_i8_acquire(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB46_2
> ; RV32I-NEXT:  .LBB46_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB46_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s0, a1, .LBB46_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB46_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB46_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB46_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3773,17 +3736,8 @@ define i8 @atomicrmw_umax_i8_acquire(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB46_4
> -; RV32I-NEXT:  .LBB46_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s0, a1, .LBB46_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB46_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB46_1
> -; RV32I-NEXT:  .LBB46_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB46_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3830,9 +3784,16 @@ define i8 @atomicrmw_umax_i8_acquire(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB46_2
> ; RV64I-NEXT:  .LBB46_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB46_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB46_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB46_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB46_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB46_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3841,17 +3802,8 @@ define i8 @atomicrmw_umax_i8_acquire(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB46_4
> -; RV64I-NEXT:  .LBB46_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB46_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB46_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB46_1
> -; RV64I-NEXT:  .LBB46_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB46_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -3902,9 +3854,16 @@ define i8 @atomicrmw_umax_i8_release(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB47_2
> ; RV32I-NEXT:  .LBB47_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB47_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s0, a1, .LBB47_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB47_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB47_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB47_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -3913,17 +3872,8 @@ define i8 @atomicrmw_umax_i8_release(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB47_4
> -; RV32I-NEXT:  .LBB47_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s0, a1, .LBB47_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB47_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB47_1
> -; RV32I-NEXT:  .LBB47_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB47_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -3970,9 +3920,16 @@ define i8 @atomicrmw_umax_i8_release(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB47_2
> ; RV64I-NEXT:  .LBB47_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB47_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB47_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB47_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB47_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB47_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -3981,17 +3938,8 @@ define i8 @atomicrmw_umax_i8_release(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB47_4
> -; RV64I-NEXT:  .LBB47_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB47_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB47_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB47_1
> -; RV64I-NEXT:  .LBB47_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB47_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4042,9 +3990,16 @@ define i8 @atomicrmw_umax_i8_acq_rel(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB48_2
> ; RV32I-NEXT:  .LBB48_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB48_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s0, a1, .LBB48_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB48_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB48_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB48_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4053,17 +4008,8 @@ define i8 @atomicrmw_umax_i8_acq_rel(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB48_4
> -; RV32I-NEXT:  .LBB48_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s0, a1, .LBB48_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB48_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB48_1
> -; RV32I-NEXT:  .LBB48_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB48_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4110,9 +4056,16 @@ define i8 @atomicrmw_umax_i8_acq_rel(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB48_2
> ; RV64I-NEXT:  .LBB48_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB48_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB48_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB48_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB48_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB48_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4121,17 +4074,8 @@ define i8 @atomicrmw_umax_i8_acq_rel(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB48_4
> -; RV64I-NEXT:  .LBB48_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB48_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB48_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB48_1
> -; RV64I-NEXT:  .LBB48_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB48_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4182,9 +4126,16 @@ define i8 @atomicrmw_umax_i8_seq_cst(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB49_2
> ; RV32I-NEXT:  .LBB49_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB49_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s0, a1, .LBB49_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB49_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB49_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB49_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4193,17 +4144,8 @@ define i8 @atomicrmw_umax_i8_seq_cst(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB49_4
> -; RV32I-NEXT:  .LBB49_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s0, a1, .LBB49_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB49_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB49_1
> -; RV32I-NEXT:  .LBB49_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB49_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4250,9 +4192,16 @@ define i8 @atomicrmw_umax_i8_seq_cst(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB49_2
> ; RV64I-NEXT:  .LBB49_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB49_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB49_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB49_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB49_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB49_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4261,17 +4210,8 @@ define i8 @atomicrmw_umax_i8_seq_cst(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB49_4
> -; RV64I-NEXT:  .LBB49_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB49_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB49_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB49_1
> -; RV64I-NEXT:  .LBB49_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB49_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4322,9 +4262,16 @@ define i8 @atomicrmw_umin_i8_monotonic(i
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB50_2
> ; RV32I-NEXT:  .LBB50_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB50_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s0, a1, .LBB50_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB50_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB50_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB50_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4333,17 +4280,8 @@ define i8 @atomicrmw_umin_i8_monotonic(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB50_4
> -; RV32I-NEXT:  .LBB50_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s0, a1, .LBB50_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB50_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB50_1
> -; RV32I-NEXT:  .LBB50_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB50_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4390,9 +4328,16 @@ define i8 @atomicrmw_umin_i8_monotonic(i
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB50_2
> ; RV64I-NEXT:  .LBB50_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB50_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB50_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB50_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB50_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB50_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4401,17 +4346,8 @@ define i8 @atomicrmw_umin_i8_monotonic(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB50_4
> -; RV64I-NEXT:  .LBB50_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB50_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB50_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB50_1
> -; RV64I-NEXT:  .LBB50_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB50_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4462,9 +4398,16 @@ define i8 @atomicrmw_umin_i8_acquire(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB51_2
> ; RV32I-NEXT:  .LBB51_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB51_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s0, a1, .LBB51_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB51_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB51_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB51_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4473,17 +4416,8 @@ define i8 @atomicrmw_umin_i8_acquire(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB51_4
> -; RV32I-NEXT:  .LBB51_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s0, a1, .LBB51_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB51_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB51_1
> -; RV32I-NEXT:  .LBB51_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB51_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4530,9 +4464,16 @@ define i8 @atomicrmw_umin_i8_acquire(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB51_2
> ; RV64I-NEXT:  .LBB51_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB51_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB51_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB51_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB51_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB51_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4541,17 +4482,8 @@ define i8 @atomicrmw_umin_i8_acquire(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB51_4
> -; RV64I-NEXT:  .LBB51_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB51_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB51_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB51_1
> -; RV64I-NEXT:  .LBB51_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB51_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4602,9 +4534,16 @@ define i8 @atomicrmw_umin_i8_release(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB52_2
> ; RV32I-NEXT:  .LBB52_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB52_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s0, a1, .LBB52_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB52_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB52_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB52_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4613,17 +4552,8 @@ define i8 @atomicrmw_umin_i8_release(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB52_4
> -; RV32I-NEXT:  .LBB52_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s0, a1, .LBB52_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB52_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB52_1
> -; RV32I-NEXT:  .LBB52_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB52_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4670,9 +4600,16 @@ define i8 @atomicrmw_umin_i8_release(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB52_2
> ; RV64I-NEXT:  .LBB52_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB52_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB52_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB52_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB52_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB52_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4681,17 +4618,8 @@ define i8 @atomicrmw_umin_i8_release(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB52_4
> -; RV64I-NEXT:  .LBB52_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB52_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB52_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB52_1
> -; RV64I-NEXT:  .LBB52_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB52_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4742,9 +4670,16 @@ define i8 @atomicrmw_umin_i8_acq_rel(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB53_2
> ; RV32I-NEXT:  .LBB53_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB53_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s0, a1, .LBB53_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB53_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB53_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB53_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4753,17 +4688,8 @@ define i8 @atomicrmw_umin_i8_acq_rel(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB53_4
> -; RV32I-NEXT:  .LBB53_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s0, a1, .LBB53_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB53_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB53_1
> -; RV32I-NEXT:  .LBB53_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB53_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4810,9 +4736,16 @@ define i8 @atomicrmw_umin_i8_acq_rel(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB53_2
> ; RV64I-NEXT:  .LBB53_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB53_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB53_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB53_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB53_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB53_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4821,17 +4754,8 @@ define i8 @atomicrmw_umin_i8_acq_rel(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB53_4
> -; RV64I-NEXT:  .LBB53_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB53_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB53_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB53_1
> -; RV64I-NEXT:  .LBB53_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB53_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -4882,9 +4806,16 @@ define i8 @atomicrmw_umin_i8_seq_cst(i8
> ; RV32I-NEXT:    lbu a0, 0(a0)
> ; RV32I-NEXT:    andi s0, a1, 255
> ; RV32I-NEXT:    addi s3, sp, 11
> -; RV32I-NEXT:    j .LBB54_2
> ; RV32I-NEXT:  .LBB54_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB54_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    andi a1, a0, 255
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s0, a1, .LBB54_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB54_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB54_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB54_1 Depth=1
> ; RV32I-NEXT:    sb a0, 11(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -4893,17 +4824,8 @@ define i8 @atomicrmw_umin_i8_seq_cst(i8
> ; RV32I-NEXT:    call __atomic_compare_exchange_1
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lb a0, 11(sp)
> -; RV32I-NEXT:    bnez a1, .LBB54_4
> -; RV32I-NEXT:  .LBB54_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    andi a1, a0, 255
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s0, a1, .LBB54_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB54_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB54_1
> -; RV32I-NEXT:  .LBB54_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB54_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -4950,9 +4872,16 @@ define i8 @atomicrmw_umin_i8_seq_cst(i8
> ; RV64I-NEXT:    lbu a0, 0(a0)
> ; RV64I-NEXT:    andi s0, a1, 255
> ; RV64I-NEXT:    addi s3, sp, 7
> -; RV64I-NEXT:    j .LBB54_2
> ; RV64I-NEXT:  .LBB54_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB54_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    andi a1, a0, 255
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB54_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB54_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB54_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB54_1 Depth=1
> ; RV64I-NEXT:    sb a0, 7(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -4961,17 +4890,8 @@ define i8 @atomicrmw_umin_i8_seq_cst(i8
> ; RV64I-NEXT:    call __atomic_compare_exchange_1
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lb a0, 7(sp)
> -; RV64I-NEXT:    bnez a1, .LBB54_4
> -; RV64I-NEXT:  .LBB54_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    andi a1, a0, 255
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB54_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB54_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB54_1
> -; RV64I-NEXT:  .LBB54_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB54_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -7173,9 +7093,17 @@ define i16 @atomicrmw_max_i16_monotonic(
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB90_2
> ; RV32I-NEXT:  .LBB90_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB90_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB90_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB90_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB90_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB90_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -7184,18 +7112,8 @@ define i16 @atomicrmw_max_i16_monotonic(
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB90_4
> -; RV32I-NEXT:  .LBB90_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB90_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB90_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB90_1
> -; RV32I-NEXT:  .LBB90_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB90_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -7249,9 +7167,17 @@ define i16 @atomicrmw_max_i16_monotonic(
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB90_2
> ; RV64I-NEXT:  .LBB90_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB90_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB90_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB90_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB90_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB90_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -7260,18 +7186,8 @@ define i16 @atomicrmw_max_i16_monotonic(
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB90_4
> -; RV64I-NEXT:  .LBB90_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB90_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB90_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB90_1
> -; RV64I-NEXT:  .LBB90_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB90_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -7329,9 +7245,17 @@ define i16 @atomicrmw_max_i16_acquire(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB91_2
> ; RV32I-NEXT:  .LBB91_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB91_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB91_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB91_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB91_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB91_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -7340,18 +7264,8 @@ define i16 @atomicrmw_max_i16_acquire(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB91_4
> -; RV32I-NEXT:  .LBB91_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB91_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB91_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB91_1
> -; RV32I-NEXT:  .LBB91_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB91_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -7405,9 +7319,17 @@ define i16 @atomicrmw_max_i16_acquire(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB91_2
> ; RV64I-NEXT:  .LBB91_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB91_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB91_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB91_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB91_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB91_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -7416,18 +7338,8 @@ define i16 @atomicrmw_max_i16_acquire(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB91_4
> -; RV64I-NEXT:  .LBB91_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB91_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB91_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB91_1
> -; RV64I-NEXT:  .LBB91_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB91_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -7485,9 +7397,17 @@ define i16 @atomicrmw_max_i16_release(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB92_2
> ; RV32I-NEXT:  .LBB92_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB92_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB92_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB92_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB92_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB92_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -7496,18 +7416,8 @@ define i16 @atomicrmw_max_i16_release(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB92_4
> -; RV32I-NEXT:  .LBB92_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB92_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB92_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB92_1
> -; RV32I-NEXT:  .LBB92_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB92_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -7561,9 +7471,17 @@ define i16 @atomicrmw_max_i16_release(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB92_2
> ; RV64I-NEXT:  .LBB92_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB92_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB92_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB92_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB92_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB92_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -7572,18 +7490,8 @@ define i16 @atomicrmw_max_i16_release(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB92_4
> -; RV64I-NEXT:  .LBB92_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB92_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB92_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB92_1
> -; RV64I-NEXT:  .LBB92_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB92_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -7641,9 +7549,17 @@ define i16 @atomicrmw_max_i16_acq_rel(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB93_2
> ; RV32I-NEXT:  .LBB93_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB93_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB93_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB93_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB93_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB93_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -7652,18 +7568,8 @@ define i16 @atomicrmw_max_i16_acq_rel(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB93_4
> -; RV32I-NEXT:  .LBB93_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB93_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB93_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB93_1
> -; RV32I-NEXT:  .LBB93_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB93_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -7717,9 +7623,17 @@ define i16 @atomicrmw_max_i16_acq_rel(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB93_2
> ; RV64I-NEXT:  .LBB93_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB93_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB93_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB93_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB93_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB93_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -7728,18 +7642,8 @@ define i16 @atomicrmw_max_i16_acq_rel(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB93_4
> -; RV64I-NEXT:  .LBB93_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB93_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB93_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB93_1
> -; RV64I-NEXT:  .LBB93_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB93_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -7797,9 +7701,17 @@ define i16 @atomicrmw_max_i16_seq_cst(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB94_2
> ; RV32I-NEXT:  .LBB94_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB94_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    blt s0, a1, .LBB94_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB94_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB94_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB94_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -7808,18 +7720,8 @@ define i16 @atomicrmw_max_i16_seq_cst(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB94_4
> -; RV32I-NEXT:  .LBB94_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    blt s0, a1, .LBB94_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB94_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB94_1
> -; RV32I-NEXT:  .LBB94_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB94_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -7873,9 +7775,17 @@ define i16 @atomicrmw_max_i16_seq_cst(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB94_2
> ; RV64I-NEXT:  .LBB94_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB94_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB94_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB94_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB94_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB94_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -7884,18 +7794,8 @@ define i16 @atomicrmw_max_i16_seq_cst(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB94_4
> -; RV64I-NEXT:  .LBB94_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB94_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB94_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB94_1
> -; RV64I-NEXT:  .LBB94_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB94_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -7953,29 +7853,27 @@ define i16 @atomicrmw_min_i16_monotonic(
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB95_2
> ; RV32I-NEXT:  .LBB95_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB95_2 Depth=1
> -; RV32I-NEXT:    sh a0, 10(sp)
> -; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB95_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB95_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB95_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB95_1 Depth=1
> +; RV32I-NEXT:    sh a0, 10(sp)
> +; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> ; RV32I-NEXT:    mv a3, zero
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB95_4
> -; RV32I-NEXT:  .LBB95_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB95_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB95_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB95_1
> -; RV32I-NEXT:  .LBB95_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB95_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -8029,9 +7927,17 @@ define i16 @atomicrmw_min_i16_monotonic(
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB95_2
> ; RV64I-NEXT:  .LBB95_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB95_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB95_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB95_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB95_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB95_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -8040,18 +7946,8 @@ define i16 @atomicrmw_min_i16_monotonic(
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB95_4
> -; RV64I-NEXT:  .LBB95_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB95_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB95_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB95_1
> -; RV64I-NEXT:  .LBB95_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB95_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -8109,9 +8005,17 @@ define i16 @atomicrmw_min_i16_acquire(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB96_2
> ; RV32I-NEXT:  .LBB96_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB96_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB96_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB96_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB96_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB96_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -8120,18 +8024,8 @@ define i16 @atomicrmw_min_i16_acquire(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB96_4
> -; RV32I-NEXT:  .LBB96_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB96_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB96_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB96_1
> -; RV32I-NEXT:  .LBB96_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB96_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -8185,9 +8079,17 @@ define i16 @atomicrmw_min_i16_acquire(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB96_2
> ; RV64I-NEXT:  .LBB96_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB96_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB96_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB96_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB96_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB96_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -8196,18 +8098,8 @@ define i16 @atomicrmw_min_i16_acquire(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB96_4
> -; RV64I-NEXT:  .LBB96_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB96_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB96_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB96_1
> -; RV64I-NEXT:  .LBB96_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB96_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -8265,9 +8157,17 @@ define i16 @atomicrmw_min_i16_release(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB97_2
> ; RV32I-NEXT:  .LBB97_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB97_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB97_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB97_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB97_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB97_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -8276,18 +8176,8 @@ define i16 @atomicrmw_min_i16_release(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB97_4
> -; RV32I-NEXT:  .LBB97_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB97_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB97_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB97_1
> -; RV32I-NEXT:  .LBB97_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB97_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -8341,9 +8231,17 @@ define i16 @atomicrmw_min_i16_release(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB97_2
> ; RV64I-NEXT:  .LBB97_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB97_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB97_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB97_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB97_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB97_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -8352,18 +8250,8 @@ define i16 @atomicrmw_min_i16_release(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB97_4
> -; RV64I-NEXT:  .LBB97_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB97_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB97_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB97_1
> -; RV64I-NEXT:  .LBB97_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB97_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -8421,9 +8309,17 @@ define i16 @atomicrmw_min_i16_acq_rel(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB98_2
> ; RV32I-NEXT:  .LBB98_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB98_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB98_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB98_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB98_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB98_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -8432,18 +8328,8 @@ define i16 @atomicrmw_min_i16_acq_rel(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB98_4
> -; RV32I-NEXT:  .LBB98_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB98_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB98_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB98_1
> -; RV32I-NEXT:  .LBB98_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB98_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -8497,9 +8383,17 @@ define i16 @atomicrmw_min_i16_acq_rel(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB98_2
> ; RV64I-NEXT:  .LBB98_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB98_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB98_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB98_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB98_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB98_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -8508,18 +8402,8 @@ define i16 @atomicrmw_min_i16_acq_rel(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB98_4
> -; RV64I-NEXT:  .LBB98_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB98_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB98_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB98_1
> -; RV64I-NEXT:  .LBB98_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB98_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -8577,9 +8461,17 @@ define i16 @atomicrmw_min_i16_seq_cst(i1
> ; RV32I-NEXT:    slli a1, a1, 16
> ; RV32I-NEXT:    srai s0, a1, 16
> ; RV32I-NEXT:    addi s3, sp, 10
> -; RV32I-NEXT:    j .LBB99_2
> ; RV32I-NEXT:  .LBB99_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB99_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    slli a1, a0, 16
> +; RV32I-NEXT:    srai a1, a1, 16
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bge s0, a1, .LBB99_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB99_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB99_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB99_1 Depth=1
> ; RV32I-NEXT:    sh a0, 10(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -8588,18 +8480,8 @@ define i16 @atomicrmw_min_i16_seq_cst(i1
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 10(sp)
> -; RV32I-NEXT:    bnez a1, .LBB99_4
> -; RV32I-NEXT:  .LBB99_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    slli a1, a0, 16
> -; RV32I-NEXT:    srai a1, a1, 16
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bge s0, a1, .LBB99_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB99_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB99_1
> -; RV32I-NEXT:  .LBB99_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB99_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -8653,9 +8535,17 @@ define i16 @atomicrmw_min_i16_seq_cst(i1
> ; RV64I-NEXT:    slli a1, a1, 48
> ; RV64I-NEXT:    srai s0, a1, 48
> ; RV64I-NEXT:    addi s3, sp, 6
> -; RV64I-NEXT:    j .LBB99_2
> ; RV64I-NEXT:  .LBB99_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB99_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    slli a1, a0, 48
> +; RV64I-NEXT:    srai a1, a1, 48
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB99_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB99_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB99_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB99_1 Depth=1
> ; RV64I-NEXT:    sh a0, 6(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -8664,18 +8554,8 @@ define i16 @atomicrmw_min_i16_seq_cst(i1
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 6(sp)
> -; RV64I-NEXT:    bnez a1, .LBB99_4
> -; RV64I-NEXT:  .LBB99_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    slli a1, a0, 48
> -; RV64I-NEXT:    srai a1, a1, 48
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB99_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB99_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB99_1
> -; RV64I-NEXT:  .LBB99_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB99_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -8735,9 +8615,16 @@ define i16 @atomicrmw_umax_i16_monotonic
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB100_2
> ; RV32I-NEXT:  .LBB100_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB100_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s1, a1, .LBB100_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB100_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB100_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB100_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -8746,17 +8633,8 @@ define i16 @atomicrmw_umax_i16_monotonic
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB100_4
> -; RV32I-NEXT:  .LBB100_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s1, a1, .LBB100_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB100_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB100_1
> -; RV32I-NEXT:  .LBB100_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB100_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -8808,9 +8686,16 @@ define i16 @atomicrmw_umax_i16_monotonic
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB100_2
> ; RV64I-NEXT:  .LBB100_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB100_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s1, a1, .LBB100_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB100_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB100_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB100_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -8819,17 +8704,8 @@ define i16 @atomicrmw_umax_i16_monotonic
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB100_4
> -; RV64I-NEXT:  .LBB100_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s1, a1, .LBB100_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB100_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB100_1
> -; RV64I-NEXT:  .LBB100_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB100_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -8885,9 +8761,16 @@ define i16 @atomicrmw_umax_i16_acquire(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB101_2
> ; RV32I-NEXT:  .LBB101_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB101_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s1, a1, .LBB101_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB101_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB101_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB101_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -8896,17 +8779,8 @@ define i16 @atomicrmw_umax_i16_acquire(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB101_4
> -; RV32I-NEXT:  .LBB101_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s1, a1, .LBB101_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB101_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB101_1
> -; RV32I-NEXT:  .LBB101_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB101_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -8958,9 +8832,16 @@ define i16 @atomicrmw_umax_i16_acquire(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB101_2
> ; RV64I-NEXT:  .LBB101_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB101_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s1, a1, .LBB101_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB101_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB101_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB101_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -8969,17 +8850,8 @@ define i16 @atomicrmw_umax_i16_acquire(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB101_4
> -; RV64I-NEXT:  .LBB101_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s1, a1, .LBB101_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB101_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB101_1
> -; RV64I-NEXT:  .LBB101_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB101_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9035,9 +8907,16 @@ define i16 @atomicrmw_umax_i16_release(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB102_2
> ; RV32I-NEXT:  .LBB102_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB102_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s1, a1, .LBB102_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB102_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB102_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB102_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9046,17 +8925,8 @@ define i16 @atomicrmw_umax_i16_release(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB102_4
> -; RV32I-NEXT:  .LBB102_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s1, a1, .LBB102_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB102_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB102_1
> -; RV32I-NEXT:  .LBB102_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB102_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -9108,9 +8978,16 @@ define i16 @atomicrmw_umax_i16_release(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB102_2
> ; RV64I-NEXT:  .LBB102_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB102_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s1, a1, .LBB102_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB102_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB102_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB102_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -9119,17 +8996,8 @@ define i16 @atomicrmw_umax_i16_release(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB102_4
> -; RV64I-NEXT:  .LBB102_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s1, a1, .LBB102_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB102_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB102_1
> -; RV64I-NEXT:  .LBB102_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB102_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9185,9 +9053,16 @@ define i16 @atomicrmw_umax_i16_acq_rel(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB103_2
> ; RV32I-NEXT:  .LBB103_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB103_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s1, a1, .LBB103_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB103_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB103_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB103_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9196,17 +9071,8 @@ define i16 @atomicrmw_umax_i16_acq_rel(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB103_4
> -; RV32I-NEXT:  .LBB103_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s1, a1, .LBB103_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB103_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB103_1
> -; RV32I-NEXT:  .LBB103_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB103_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -9258,9 +9124,16 @@ define i16 @atomicrmw_umax_i16_acq_rel(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB103_2
> ; RV64I-NEXT:  .LBB103_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB103_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s1, a1, .LBB103_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB103_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB103_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB103_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -9269,17 +9142,8 @@ define i16 @atomicrmw_umax_i16_acq_rel(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB103_4
> -; RV64I-NEXT:  .LBB103_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s1, a1, .LBB103_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB103_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB103_1
> -; RV64I-NEXT:  .LBB103_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB103_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9335,9 +9199,16 @@ define i16 @atomicrmw_umax_i16_seq_cst(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB104_2
> ; RV32I-NEXT:  .LBB104_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB104_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bltu s1, a1, .LBB104_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB104_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB104_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB104_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9346,17 +9217,8 @@ define i16 @atomicrmw_umax_i16_seq_cst(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB104_4
> -; RV32I-NEXT:  .LBB104_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bltu s1, a1, .LBB104_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB104_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB104_1
> -; RV32I-NEXT:  .LBB104_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB104_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -9408,9 +9270,16 @@ define i16 @atomicrmw_umax_i16_seq_cst(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB104_2
> ; RV64I-NEXT:  .LBB104_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB104_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s1, a1, .LBB104_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB104_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB104_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB104_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -9419,17 +9288,8 @@ define i16 @atomicrmw_umax_i16_seq_cst(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB104_4
> -; RV64I-NEXT:  .LBB104_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s1, a1, .LBB104_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB104_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB104_1
> -; RV64I-NEXT:  .LBB104_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB104_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9485,9 +9345,16 @@ define i16 @atomicrmw_umin_i16_monotonic
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB105_2
> ; RV32I-NEXT:  .LBB105_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB105_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s1, a1, .LBB105_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB105_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB105_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB105_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9496,17 +9363,8 @@ define i16 @atomicrmw_umin_i16_monotonic
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB105_4
> -; RV32I-NEXT:  .LBB105_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s1, a1, .LBB105_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB105_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB105_1
> -; RV32I-NEXT:  .LBB105_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB105_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -9558,9 +9416,16 @@ define i16 @atomicrmw_umin_i16_monotonic
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB105_2
> ; RV64I-NEXT:  .LBB105_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB105_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s1, a1, .LBB105_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB105_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB105_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB105_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -9569,17 +9434,8 @@ define i16 @atomicrmw_umin_i16_monotonic
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB105_4
> -; RV64I-NEXT:  .LBB105_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s1, a1, .LBB105_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB105_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB105_1
> -; RV64I-NEXT:  .LBB105_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB105_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9635,9 +9491,16 @@ define i16 @atomicrmw_umin_i16_acquire(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB106_2
> ; RV32I-NEXT:  .LBB106_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB106_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s1, a1, .LBB106_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB106_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB106_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB106_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9646,17 +9509,8 @@ define i16 @atomicrmw_umin_i16_acquire(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB106_4
> -; RV32I-NEXT:  .LBB106_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s1, a1, .LBB106_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB106_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB106_1
> -; RV32I-NEXT:  .LBB106_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB106_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -9708,9 +9562,16 @@ define i16 @atomicrmw_umin_i16_acquire(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB106_2
> ; RV64I-NEXT:  .LBB106_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB106_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s1, a1, .LBB106_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB106_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB106_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB106_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -9719,17 +9580,8 @@ define i16 @atomicrmw_umin_i16_acquire(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB106_4
> -; RV64I-NEXT:  .LBB106_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s1, a1, .LBB106_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB106_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB106_1
> -; RV64I-NEXT:  .LBB106_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB106_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9785,9 +9637,16 @@ define i16 @atomicrmw_umin_i16_release(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB107_2
> ; RV32I-NEXT:  .LBB107_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB107_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s1, a1, .LBB107_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB107_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB107_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB107_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9796,17 +9655,8 @@ define i16 @atomicrmw_umin_i16_release(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB107_4
> -; RV32I-NEXT:  .LBB107_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s1, a1, .LBB107_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB107_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB107_1
> -; RV32I-NEXT:  .LBB107_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB107_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -9858,9 +9708,16 @@ define i16 @atomicrmw_umin_i16_release(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB107_2
> ; RV64I-NEXT:  .LBB107_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB107_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s1, a1, .LBB107_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB107_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB107_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB107_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -9869,17 +9726,8 @@ define i16 @atomicrmw_umin_i16_release(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB107_4
> -; RV64I-NEXT:  .LBB107_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s1, a1, .LBB107_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB107_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB107_1
> -; RV64I-NEXT:  .LBB107_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB107_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -9935,9 +9783,16 @@ define i16 @atomicrmw_umin_i16_acq_rel(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB108_2
> ; RV32I-NEXT:  .LBB108_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB108_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s1, a1, .LBB108_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB108_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB108_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB108_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -9946,17 +9801,8 @@ define i16 @atomicrmw_umin_i16_acq_rel(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB108_4
> -; RV32I-NEXT:  .LBB108_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s1, a1, .LBB108_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB108_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB108_1
> -; RV32I-NEXT:  .LBB108_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB108_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -10008,9 +9854,16 @@ define i16 @atomicrmw_umin_i16_acq_rel(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB108_2
> ; RV64I-NEXT:  .LBB108_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB108_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s1, a1, .LBB108_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB108_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB108_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB108_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -10019,17 +9872,8 @@ define i16 @atomicrmw_umin_i16_acq_rel(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB108_4
> -; RV64I-NEXT:  .LBB108_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s1, a1, .LBB108_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB108_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB108_1
> -; RV64I-NEXT:  .LBB108_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB108_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -10085,9 +9929,16 @@ define i16 @atomicrmw_umin_i16_seq_cst(i
> ; RV32I-NEXT:    addi s0, a1, -1
> ; RV32I-NEXT:    and s1, s2, s0
> ; RV32I-NEXT:    addi s3, sp, 6
> -; RV32I-NEXT:    j .LBB109_2
> ; RV32I-NEXT:  .LBB109_1: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB109_2 Depth=1
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    and a1, a0, s0
> +; RV32I-NEXT:    mv a2, a0
> +; RV32I-NEXT:    bgeu s1, a1, .LBB109_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB109_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:  .LBB109_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB109_1 Depth=1
> ; RV32I-NEXT:    sh a0, 6(sp)
> ; RV32I-NEXT:    mv a0, s4
> ; RV32I-NEXT:    mv a1, s3
> @@ -10096,17 +9947,8 @@ define i16 @atomicrmw_umin_i16_seq_cst(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_2
> ; RV32I-NEXT:    mv a1, a0
> ; RV32I-NEXT:    lh a0, 6(sp)
> -; RV32I-NEXT:    bnez a1, .LBB109_4
> -; RV32I-NEXT:  .LBB109_2: # %atomicrmw.start
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV32I-NEXT:    and a1, a0, s0
> -; RV32I-NEXT:    mv a2, a0
> -; RV32I-NEXT:    bgeu s1, a1, .LBB109_1
> -; RV32I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB109_2 Depth=1
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    j .LBB109_1
> -; RV32I-NEXT:  .LBB109_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a1, .LBB109_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    lw s4, 8(sp)
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -10158,9 +10000,16 @@ define i16 @atomicrmw_umin_i16_seq_cst(i
> ; RV64I-NEXT:    addiw s0, a1, -1
> ; RV64I-NEXT:    and s1, s2, s0
> ; RV64I-NEXT:    addi s3, sp, 14
> -; RV64I-NEXT:    j .LBB109_2
> ; RV64I-NEXT:  .LBB109_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB109_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    and a1, a0, s0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s1, a1, .LBB109_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB109_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB109_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB109_1 Depth=1
> ; RV64I-NEXT:    sh a0, 14(sp)
> ; RV64I-NEXT:    mv a0, s4
> ; RV64I-NEXT:    mv a1, s3
> @@ -10169,17 +10018,8 @@ define i16 @atomicrmw_umin_i16_seq_cst(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_2
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lh a0, 14(sp)
> -; RV64I-NEXT:    bnez a1, .LBB109_4
> -; RV64I-NEXT:  .LBB109_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    and a1, a0, s0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s1, a1, .LBB109_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB109_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB109_1
> -; RV64I-NEXT:  .LBB109_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB109_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s4, 16(sp)
> ; RV64I-NEXT:    ld s3, 24(sp)
> ; RV64I-NEXT:    ld s2, 32(sp)
> @@ -11500,25 +11340,23 @@ define i32 @atomicrmw_max_i32_monotonic(
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB145_3
> ; RV32I-NEXT:  .LBB145_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    blt s0, a2, .LBB145_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB145_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB145_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB145_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    mv a3, zero
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB145_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB145_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB145_1
> -; RV32I-NEXT:  .LBB145_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB145_1
> -; RV32I-NEXT:  .LBB145_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB145_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -11545,9 +11383,16 @@ define i32 @atomicrmw_max_i32_monotonic(
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB145_2
> ; RV64I-NEXT:  .LBB145_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB145_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB145_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB145_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB145_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB145_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -11556,17 +11401,8 @@ define i32 @atomicrmw_max_i32_monotonic(
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB145_4
> -; RV64I-NEXT:  .LBB145_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB145_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB145_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB145_1
> -; RV64I-NEXT:  .LBB145_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB145_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -11595,25 +11431,23 @@ define i32 @atomicrmw_max_i32_acquire(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB146_3
> ; RV32I-NEXT:  .LBB146_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    blt s0, a2, .LBB146_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB146_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB146_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB146_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 2
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB146_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB146_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB146_1
> -; RV32I-NEXT:  .LBB146_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB146_1
> -; RV32I-NEXT:  .LBB146_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB146_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -11640,9 +11474,16 @@ define i32 @atomicrmw_max_i32_acquire(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB146_2
> ; RV64I-NEXT:  .LBB146_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB146_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB146_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB146_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB146_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB146_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -11651,17 +11492,8 @@ define i32 @atomicrmw_max_i32_acquire(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB146_4
> -; RV64I-NEXT:  .LBB146_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB146_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB146_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB146_1
> -; RV64I-NEXT:  .LBB146_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB146_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -11690,25 +11522,23 @@ define i32 @atomicrmw_max_i32_release(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB147_3
> ; RV32I-NEXT:  .LBB147_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    blt s0, a2, .LBB147_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB147_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB147_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB147_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 3
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB147_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB147_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB147_1
> -; RV32I-NEXT:  .LBB147_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB147_1
> -; RV32I-NEXT:  .LBB147_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB147_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -11735,9 +11565,16 @@ define i32 @atomicrmw_max_i32_release(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB147_2
> ; RV64I-NEXT:  .LBB147_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB147_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB147_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB147_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB147_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB147_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -11746,17 +11583,8 @@ define i32 @atomicrmw_max_i32_release(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB147_4
> -; RV64I-NEXT:  .LBB147_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB147_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB147_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB147_1
> -; RV64I-NEXT:  .LBB147_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB147_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -11785,25 +11613,23 @@ define i32 @atomicrmw_max_i32_acq_rel(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB148_3
> ; RV32I-NEXT:  .LBB148_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    blt s0, a2, .LBB148_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB148_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB148_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB148_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 4
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB148_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB148_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB148_1
> -; RV32I-NEXT:  .LBB148_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB148_1
> -; RV32I-NEXT:  .LBB148_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB148_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -11830,9 +11656,16 @@ define i32 @atomicrmw_max_i32_acq_rel(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB148_2
> ; RV64I-NEXT:  .LBB148_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB148_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB148_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB148_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB148_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB148_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -11841,17 +11674,8 @@ define i32 @atomicrmw_max_i32_acq_rel(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB148_4
> -; RV64I-NEXT:  .LBB148_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB148_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB148_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB148_1
> -; RV64I-NEXT:  .LBB148_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB148_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -11880,25 +11704,23 @@ define i32 @atomicrmw_max_i32_seq_cst(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB149_3
> ; RV32I-NEXT:  .LBB149_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    blt s0, a2, .LBB149_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB149_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB149_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB149_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 5
> ; RV32I-NEXT:    addi a4, zero, 5
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB149_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB149_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB149_1
> -; RV32I-NEXT:  .LBB149_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB149_1
> -; RV32I-NEXT:  .LBB149_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB149_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -11925,9 +11747,16 @@ define i32 @atomicrmw_max_i32_seq_cst(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB149_2
> ; RV64I-NEXT:  .LBB149_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB149_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    blt s0, a1, .LBB149_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB149_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB149_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB149_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -11936,17 +11765,8 @@ define i32 @atomicrmw_max_i32_seq_cst(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB149_4
> -; RV64I-NEXT:  .LBB149_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    blt s0, a1, .LBB149_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB149_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB149_1
> -; RV64I-NEXT:  .LBB149_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB149_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -11975,25 +11795,23 @@ define i32 @atomicrmw_min_i32_monotonic(
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB150_3
> ; RV32I-NEXT:  .LBB150_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bge s0, a2, .LBB150_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB150_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB150_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB150_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    mv a3, zero
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB150_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB150_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB150_1
> -; RV32I-NEXT:  .LBB150_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB150_1
> -; RV32I-NEXT:  .LBB150_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB150_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12020,9 +11838,16 @@ define i32 @atomicrmw_min_i32_monotonic(
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB150_2
> ; RV64I-NEXT:  .LBB150_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB150_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB150_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB150_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB150_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB150_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12031,17 +11856,8 @@ define i32 @atomicrmw_min_i32_monotonic(
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB150_4
> -; RV64I-NEXT:  .LBB150_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB150_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB150_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB150_1
> -; RV64I-NEXT:  .LBB150_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB150_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12070,25 +11886,23 @@ define i32 @atomicrmw_min_i32_acquire(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB151_3
> ; RV32I-NEXT:  .LBB151_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bge s0, a2, .LBB151_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB151_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB151_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB151_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 2
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB151_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB151_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB151_1
> -; RV32I-NEXT:  .LBB151_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB151_1
> -; RV32I-NEXT:  .LBB151_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB151_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12115,9 +11929,16 @@ define i32 @atomicrmw_min_i32_acquire(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB151_2
> ; RV64I-NEXT:  .LBB151_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB151_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB151_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB151_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB151_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB151_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12126,17 +11947,8 @@ define i32 @atomicrmw_min_i32_acquire(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB151_4
> -; RV64I-NEXT:  .LBB151_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB151_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB151_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB151_1
> -; RV64I-NEXT:  .LBB151_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB151_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12165,25 +11977,23 @@ define i32 @atomicrmw_min_i32_release(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB152_3
> ; RV32I-NEXT:  .LBB152_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bge s0, a2, .LBB152_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB152_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB152_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB152_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 3
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB152_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB152_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB152_1
> -; RV32I-NEXT:  .LBB152_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB152_1
> -; RV32I-NEXT:  .LBB152_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB152_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12210,9 +12020,16 @@ define i32 @atomicrmw_min_i32_release(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB152_2
> ; RV64I-NEXT:  .LBB152_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB152_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB152_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB152_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB152_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB152_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12221,17 +12038,8 @@ define i32 @atomicrmw_min_i32_release(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB152_4
> -; RV64I-NEXT:  .LBB152_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB152_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB152_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB152_1
> -; RV64I-NEXT:  .LBB152_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB152_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12260,25 +12068,23 @@ define i32 @atomicrmw_min_i32_acq_rel(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB153_3
> ; RV32I-NEXT:  .LBB153_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bge s0, a2, .LBB153_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB153_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB153_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB153_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 4
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB153_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB153_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB153_1
> -; RV32I-NEXT:  .LBB153_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB153_1
> -; RV32I-NEXT:  .LBB153_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB153_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12305,9 +12111,16 @@ define i32 @atomicrmw_min_i32_acq_rel(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB153_2
> ; RV64I-NEXT:  .LBB153_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB153_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB153_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB153_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB153_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB153_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12316,17 +12129,8 @@ define i32 @atomicrmw_min_i32_acq_rel(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB153_4
> -; RV64I-NEXT:  .LBB153_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB153_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB153_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB153_1
> -; RV64I-NEXT:  .LBB153_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB153_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12355,25 +12159,23 @@ define i32 @atomicrmw_min_i32_seq_cst(i3
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    blt s0, a2, .LBB154_3
> ; RV32I-NEXT:  .LBB154_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bge s0, a2, .LBB154_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB154_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB154_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB154_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 5
> ; RV32I-NEXT:    addi a4, zero, 5
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB154_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB154_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bge s0, a2, .LBB154_1
> -; RV32I-NEXT:  .LBB154_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB154_1
> -; RV32I-NEXT:  .LBB154_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB154_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12400,9 +12202,16 @@ define i32 @atomicrmw_min_i32_seq_cst(i3
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB154_2
> ; RV64I-NEXT:  .LBB154_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB154_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bge s0, a1, .LBB154_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB154_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB154_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB154_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12411,17 +12220,8 @@ define i32 @atomicrmw_min_i32_seq_cst(i3
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB154_4
> -; RV64I-NEXT:  .LBB154_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bge s0, a1, .LBB154_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB154_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB154_1
> -; RV64I-NEXT:  .LBB154_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB154_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12450,25 +12250,23 @@ define i32 @atomicrmw_umax_i32_monotonic
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB155_3
> ; RV32I-NEXT:  .LBB155_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bltu s0, a2, .LBB155_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB155_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB155_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB155_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    mv a3, zero
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB155_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB155_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB155_1
> -; RV32I-NEXT:  .LBB155_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB155_1
> -; RV32I-NEXT:  .LBB155_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB155_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12495,9 +12293,16 @@ define i32 @atomicrmw_umax_i32_monotonic
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB155_2
> ; RV64I-NEXT:  .LBB155_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB155_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB155_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB155_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB155_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB155_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12506,17 +12311,8 @@ define i32 @atomicrmw_umax_i32_monotonic
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB155_4
> -; RV64I-NEXT:  .LBB155_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB155_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB155_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB155_1
> -; RV64I-NEXT:  .LBB155_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB155_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12545,25 +12341,23 @@ define i32 @atomicrmw_umax_i32_acquire(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB156_3
> ; RV32I-NEXT:  .LBB156_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bltu s0, a2, .LBB156_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB156_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB156_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB156_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 2
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB156_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB156_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB156_1
> -; RV32I-NEXT:  .LBB156_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB156_1
> -; RV32I-NEXT:  .LBB156_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB156_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12590,9 +12384,16 @@ define i32 @atomicrmw_umax_i32_acquire(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB156_2
> ; RV64I-NEXT:  .LBB156_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB156_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB156_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB156_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB156_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB156_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12601,17 +12402,8 @@ define i32 @atomicrmw_umax_i32_acquire(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB156_4
> -; RV64I-NEXT:  .LBB156_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB156_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB156_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB156_1
> -; RV64I-NEXT:  .LBB156_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB156_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12640,25 +12432,23 @@ define i32 @atomicrmw_umax_i32_release(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB157_3
> ; RV32I-NEXT:  .LBB157_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bltu s0, a2, .LBB157_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB157_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB157_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB157_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 3
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB157_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB157_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB157_1
> -; RV32I-NEXT:  .LBB157_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB157_1
> -; RV32I-NEXT:  .LBB157_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB157_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12685,9 +12475,16 @@ define i32 @atomicrmw_umax_i32_release(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB157_2
> ; RV64I-NEXT:  .LBB157_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB157_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB157_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB157_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB157_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB157_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12696,17 +12493,8 @@ define i32 @atomicrmw_umax_i32_release(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB157_4
> -; RV64I-NEXT:  .LBB157_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB157_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB157_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB157_1
> -; RV64I-NEXT:  .LBB157_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB157_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12735,25 +12523,23 @@ define i32 @atomicrmw_umax_i32_acq_rel(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB158_3
> ; RV32I-NEXT:  .LBB158_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bltu s0, a2, .LBB158_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB158_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB158_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB158_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 4
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB158_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB158_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB158_1
> -; RV32I-NEXT:  .LBB158_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB158_1
> -; RV32I-NEXT:  .LBB158_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB158_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12780,9 +12566,16 @@ define i32 @atomicrmw_umax_i32_acq_rel(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB158_2
> ; RV64I-NEXT:  .LBB158_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB158_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB158_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB158_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB158_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB158_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12791,17 +12584,8 @@ define i32 @atomicrmw_umax_i32_acq_rel(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB158_4
> -; RV64I-NEXT:  .LBB158_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB158_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB158_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB158_1
> -; RV64I-NEXT:  .LBB158_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB158_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12830,25 +12614,23 @@ define i32 @atomicrmw_umax_i32_seq_cst(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB159_3
> ; RV32I-NEXT:  .LBB159_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bltu s0, a2, .LBB159_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB159_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB159_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB159_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 5
> ; RV32I-NEXT:    addi a4, zero, 5
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB159_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB159_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB159_1
> -; RV32I-NEXT:  .LBB159_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB159_1
> -; RV32I-NEXT:  .LBB159_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB159_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12875,9 +12657,16 @@ define i32 @atomicrmw_umax_i32_seq_cst(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB159_2
> ; RV64I-NEXT:  .LBB159_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB159_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bltu s0, a1, .LBB159_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB159_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB159_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB159_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12886,17 +12675,8 @@ define i32 @atomicrmw_umax_i32_seq_cst(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB159_4
> -; RV64I-NEXT:  .LBB159_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bltu s0, a1, .LBB159_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB159_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB159_1
> -; RV64I-NEXT:  .LBB159_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB159_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -12925,25 +12705,23 @@ define i32 @atomicrmw_umin_i32_monotonic
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB160_3
> ; RV32I-NEXT:  .LBB160_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bgeu s0, a2, .LBB160_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB160_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB160_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB160_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    mv a3, zero
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB160_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB160_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB160_1
> -; RV32I-NEXT:  .LBB160_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB160_1
> -; RV32I-NEXT:  .LBB160_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB160_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -12970,9 +12748,16 @@ define i32 @atomicrmw_umin_i32_monotonic
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB160_2
> ; RV64I-NEXT:  .LBB160_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB160_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB160_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB160_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB160_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB160_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -12981,17 +12766,8 @@ define i32 @atomicrmw_umin_i32_monotonic
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB160_4
> -; RV64I-NEXT:  .LBB160_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB160_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB160_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB160_1
> -; RV64I-NEXT:  .LBB160_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB160_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -13020,25 +12796,23 @@ define i32 @atomicrmw_umin_i32_acquire(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB161_3
> ; RV32I-NEXT:  .LBB161_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bgeu s0, a2, .LBB161_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB161_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB161_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB161_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 2
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB161_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB161_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB161_1
> -; RV32I-NEXT:  .LBB161_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB161_1
> -; RV32I-NEXT:  .LBB161_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB161_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -13065,9 +12839,16 @@ define i32 @atomicrmw_umin_i32_acquire(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB161_2
> ; RV64I-NEXT:  .LBB161_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB161_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB161_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB161_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB161_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB161_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -13076,17 +12857,8 @@ define i32 @atomicrmw_umin_i32_acquire(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB161_4
> -; RV64I-NEXT:  .LBB161_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB161_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB161_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB161_1
> -; RV64I-NEXT:  .LBB161_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB161_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -13115,25 +12887,23 @@ define i32 @atomicrmw_umin_i32_release(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB162_3
> ; RV32I-NEXT:  .LBB162_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bgeu s0, a2, .LBB162_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB162_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB162_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB162_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 3
> ; RV32I-NEXT:    mv a4, zero
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB162_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB162_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB162_1
> -; RV32I-NEXT:  .LBB162_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB162_1
> -; RV32I-NEXT:  .LBB162_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB162_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -13160,9 +12930,16 @@ define i32 @atomicrmw_umin_i32_release(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB162_2
> ; RV64I-NEXT:  .LBB162_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB162_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB162_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB162_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB162_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB162_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -13171,17 +12948,8 @@ define i32 @atomicrmw_umin_i32_release(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB162_4
> -; RV64I-NEXT:  .LBB162_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB162_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB162_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB162_1
> -; RV64I-NEXT:  .LBB162_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB162_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -13210,25 +12978,23 @@ define i32 @atomicrmw_umin_i32_acq_rel(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB163_3
> ; RV32I-NEXT:  .LBB163_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bgeu s0, a2, .LBB163_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB163_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB163_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB163_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 4
> ; RV32I-NEXT:    addi a4, zero, 2
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB163_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB163_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB163_1
> -; RV32I-NEXT:  .LBB163_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB163_1
> -; RV32I-NEXT:  .LBB163_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB163_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -13255,9 +13021,16 @@ define i32 @atomicrmw_umin_i32_acq_rel(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB163_2
> ; RV64I-NEXT:  .LBB163_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB163_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB163_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB163_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB163_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB163_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -13266,17 +13039,8 @@ define i32 @atomicrmw_umin_i32_acq_rel(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB163_4
> -; RV64I-NEXT:  .LBB163_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB163_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB163_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB163_1
> -; RV64I-NEXT:  .LBB163_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB163_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -13305,25 +13069,23 @@ define i32 @atomicrmw_umin_i32_seq_cst(i
> ; RV32I-NEXT:    mv s1, a0
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    addi s2, sp, 12
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bltu s0, a2, .LBB164_3
> ; RV32I-NEXT:  .LBB164_1: # %atomicrmw.start
> ; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    sw a2, 12(sp)
> +; RV32I-NEXT:    bgeu s0, a2, .LBB164_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB164_1 Depth=1
> +; RV32I-NEXT:    mv a2, s0
> +; RV32I-NEXT:  .LBB164_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB164_1 Depth=1
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s2
> ; RV32I-NEXT:    addi a3, zero, 5
> ; RV32I-NEXT:    addi a4, zero, 5
> ; RV32I-NEXT:    call __atomic_compare_exchange_4
> ; RV32I-NEXT:    lw a2, 12(sp)
> -; RV32I-NEXT:    bnez a0, .LBB164_4
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    # in Loop: Header=BB164_1 Depth=1
> -; RV32I-NEXT:    sw a2, 12(sp)
> -; RV32I-NEXT:    bgeu s0, a2, .LBB164_1
> -; RV32I-NEXT:  .LBB164_3: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s0
> -; RV32I-NEXT:    j .LBB164_1
> -; RV32I-NEXT:  .LBB164_4: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB164_1
> +; RV32I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s2, 16(sp)
> ; RV32I-NEXT:    lw s1, 20(sp)
> @@ -13350,9 +13112,16 @@ define i32 @atomicrmw_umin_i32_seq_cst(i
> ; RV64I-NEXT:    lwu a0, 0(a0)
> ; RV64I-NEXT:    sext.w s0, a1
> ; RV64I-NEXT:    addi s3, sp, 4
> -; RV64I-NEXT:    j .LBB164_2
> ; RV64I-NEXT:  .LBB164_1: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB164_2 Depth=1
> +; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sext.w a1, a0
> +; RV64I-NEXT:    mv a2, a0
> +; RV64I-NEXT:    bgeu s0, a1, .LBB164_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB164_1 Depth=1
> +; RV64I-NEXT:    mv a2, s2
> +; RV64I-NEXT:  .LBB164_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB164_1 Depth=1
> ; RV64I-NEXT:    sw a0, 4(sp)
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s3
> @@ -13361,17 +13130,8 @@ define i32 @atomicrmw_umin_i32_seq_cst(i
> ; RV64I-NEXT:    call __atomic_compare_exchange_4
> ; RV64I-NEXT:    mv a1, a0
> ; RV64I-NEXT:    lw a0, 4(sp)
> -; RV64I-NEXT:    bnez a1, .LBB164_4
> -; RV64I-NEXT:  .LBB164_2: # %atomicrmw.start
> -; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> -; RV64I-NEXT:    sext.w a1, a0
> -; RV64I-NEXT:    mv a2, a0
> -; RV64I-NEXT:    bgeu s0, a1, .LBB164_1
> -; RV64I-NEXT:  # %bb.3: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB164_2 Depth=1
> -; RV64I-NEXT:    mv a2, s2
> -; RV64I-NEXT:    j .LBB164_1
> -; RV64I-NEXT:  .LBB164_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a1, .LBB164_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    ld s3, 8(sp)
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -14808,9 +14568,26 @@ define i64 @atomicrmw_max_i64_monotonic(
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB200_3
> -; RV32I-NEXT:    j .LBB200_4
> ; RV32I-NEXT:  .LBB200_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB200_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV32I-NEXT:    slt a0, s0, a1
> +; RV32I-NEXT:    j .LBB200_4
> +; RV32I-NEXT:  .LBB200_3: # in Loop: Header=BB200_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB200_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB200_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB200_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -14819,23 +14596,8 @@ define i64 @atomicrmw_max_i64_monotonic(
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB200_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB200_4
> -; RV32I-NEXT:  .LBB200_3: # %atomicrmw.start
> -; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB200_5
> -; RV32I-NEXT:  .LBB200_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB200_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB200_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB200_1
> -; RV32I-NEXT:  .LBB200_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB200_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -14859,9 +14621,26 @@ define i64 @atomicrmw_max_i64_monotonic(
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB200_3
> -; RV32IA-NEXT:    j .LBB200_4
> ; RV32IA-NEXT:  .LBB200_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB200_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV32IA-NEXT:    slt a0, s0, a1
> +; RV32IA-NEXT:    j .LBB200_4
> +; RV32IA-NEXT:  .LBB200_3: # in Loop: Header=BB200_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB200_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB200_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB200_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB200_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -14870,23 +14649,8 @@ define i64 @atomicrmw_max_i64_monotonic(
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB200_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB200_4
> -; RV32IA-NEXT:  .LBB200_3: # %atomicrmw.start
> -; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB200_5
> -; RV32IA-NEXT:  .LBB200_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB200_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB200_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB200_1
> -; RV32IA-NEXT:  .LBB200_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB200_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -14907,25 +14671,23 @@ define i64 @atomicrmw_max_i64_monotonic(
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB200_3
> ; RV64I-NEXT:  .LBB200_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    blt s0, a2, .LBB200_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB200_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    mv a3, zero
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB200_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB200_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB200_1
> -; RV64I-NEXT:  .LBB200_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB200_1
> -; RV64I-NEXT:  .LBB200_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB200_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -14957,9 +14719,26 @@ define i64 @atomicrmw_max_i64_acquire(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB201_3
> -; RV32I-NEXT:    j .LBB201_4
> ; RV32I-NEXT:  .LBB201_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB201_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV32I-NEXT:    slt a0, s0, a1
> +; RV32I-NEXT:    j .LBB201_4
> +; RV32I-NEXT:  .LBB201_3: # in Loop: Header=BB201_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB201_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB201_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB201_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -14968,23 +14747,8 @@ define i64 @atomicrmw_max_i64_acquire(i6
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB201_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB201_4
> -; RV32I-NEXT:  .LBB201_3: # %atomicrmw.start
> -; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB201_5
> -; RV32I-NEXT:  .LBB201_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB201_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB201_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB201_1
> -; RV32I-NEXT:  .LBB201_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB201_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15008,9 +14772,26 @@ define i64 @atomicrmw_max_i64_acquire(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB201_3
> -; RV32IA-NEXT:    j .LBB201_4
> ; RV32IA-NEXT:  .LBB201_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB201_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV32IA-NEXT:    slt a0, s0, a1
> +; RV32IA-NEXT:    j .LBB201_4
> +; RV32IA-NEXT:  .LBB201_3: # in Loop: Header=BB201_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB201_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB201_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB201_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB201_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -15019,23 +14800,8 @@ define i64 @atomicrmw_max_i64_acquire(i6
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB201_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB201_4
> -; RV32IA-NEXT:  .LBB201_3: # %atomicrmw.start
> -; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB201_5
> -; RV32IA-NEXT:  .LBB201_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB201_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB201_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB201_1
> -; RV32IA-NEXT:  .LBB201_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB201_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15056,25 +14822,23 @@ define i64 @atomicrmw_max_i64_acquire(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB201_3
> ; RV64I-NEXT:  .LBB201_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    blt s0, a2, .LBB201_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB201_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 2
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB201_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB201_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB201_1
> -; RV64I-NEXT:  .LBB201_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB201_1
> -; RV64I-NEXT:  .LBB201_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB201_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -15106,9 +14870,26 @@ define i64 @atomicrmw_max_i64_release(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB202_3
> -; RV32I-NEXT:    j .LBB202_4
> ; RV32I-NEXT:  .LBB202_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB202_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV32I-NEXT:    slt a0, s0, a1
> +; RV32I-NEXT:    j .LBB202_4
> +; RV32I-NEXT:  .LBB202_3: # in Loop: Header=BB202_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB202_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB202_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB202_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -15117,23 +14898,8 @@ define i64 @atomicrmw_max_i64_release(i6
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB202_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB202_4
> -; RV32I-NEXT:  .LBB202_3: # %atomicrmw.start
> -; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB202_5
> -; RV32I-NEXT:  .LBB202_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB202_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB202_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB202_1
> -; RV32I-NEXT:  .LBB202_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB202_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15157,9 +14923,26 @@ define i64 @atomicrmw_max_i64_release(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB202_3
> -; RV32IA-NEXT:    j .LBB202_4
> ; RV32IA-NEXT:  .LBB202_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB202_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV32IA-NEXT:    slt a0, s0, a1
> +; RV32IA-NEXT:    j .LBB202_4
> +; RV32IA-NEXT:  .LBB202_3: # in Loop: Header=BB202_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB202_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB202_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB202_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB202_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -15168,23 +14951,8 @@ define i64 @atomicrmw_max_i64_release(i6
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB202_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB202_4
> -; RV32IA-NEXT:  .LBB202_3: # %atomicrmw.start
> -; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB202_5
> -; RV32IA-NEXT:  .LBB202_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB202_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB202_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB202_1
> -; RV32IA-NEXT:  .LBB202_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB202_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15205,25 +14973,23 @@ define i64 @atomicrmw_max_i64_release(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB202_3
> ; RV64I-NEXT:  .LBB202_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    blt s0, a2, .LBB202_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB202_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 3
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB202_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB202_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB202_1
> -; RV64I-NEXT:  .LBB202_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB202_1
> -; RV64I-NEXT:  .LBB202_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB202_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -15255,9 +15021,26 @@ define i64 @atomicrmw_max_i64_acq_rel(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB203_3
> -; RV32I-NEXT:    j .LBB203_4
> ; RV32I-NEXT:  .LBB203_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB203_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV32I-NEXT:    slt a0, s0, a1
> +; RV32I-NEXT:    j .LBB203_4
> +; RV32I-NEXT:  .LBB203_3: # in Loop: Header=BB203_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB203_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB203_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB203_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -15266,23 +15049,8 @@ define i64 @atomicrmw_max_i64_acq_rel(i6
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB203_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB203_4
> -; RV32I-NEXT:  .LBB203_3: # %atomicrmw.start
> -; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB203_5
> -; RV32I-NEXT:  .LBB203_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB203_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB203_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB203_1
> -; RV32I-NEXT:  .LBB203_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB203_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15306,9 +15074,26 @@ define i64 @atomicrmw_max_i64_acq_rel(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB203_3
> -; RV32IA-NEXT:    j .LBB203_4
> ; RV32IA-NEXT:  .LBB203_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB203_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV32IA-NEXT:    slt a0, s0, a1
> +; RV32IA-NEXT:    j .LBB203_4
> +; RV32IA-NEXT:  .LBB203_3: # in Loop: Header=BB203_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB203_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB203_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB203_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB203_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -15317,23 +15102,8 @@ define i64 @atomicrmw_max_i64_acq_rel(i6
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB203_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB203_4
> -; RV32IA-NEXT:  .LBB203_3: # %atomicrmw.start
> -; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB203_5
> -; RV32IA-NEXT:  .LBB203_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB203_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB203_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB203_1
> -; RV32IA-NEXT:  .LBB203_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB203_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15354,25 +15124,23 @@ define i64 @atomicrmw_max_i64_acq_rel(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB203_3
> ; RV64I-NEXT:  .LBB203_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    blt s0, a2, .LBB203_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB203_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 4
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB203_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB203_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB203_1
> -; RV64I-NEXT:  .LBB203_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB203_1
> -; RV64I-NEXT:  .LBB203_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB203_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -15404,9 +15172,26 @@ define i64 @atomicrmw_max_i64_seq_cst(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB204_3
> -; RV32I-NEXT:    j .LBB204_4
> ; RV32I-NEXT:  .LBB204_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB204_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV32I-NEXT:    slt a0, s0, a1
> +; RV32I-NEXT:    j .LBB204_4
> +; RV32I-NEXT:  .LBB204_3: # in Loop: Header=BB204_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB204_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB204_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB204_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -15415,23 +15200,8 @@ define i64 @atomicrmw_max_i64_seq_cst(i6
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB204_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB204_4
> -; RV32I-NEXT:  .LBB204_3: # %atomicrmw.start
> -; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB204_5
> -; RV32I-NEXT:  .LBB204_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB204_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB204_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB204_1
> -; RV32I-NEXT:  .LBB204_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB204_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15455,9 +15225,26 @@ define i64 @atomicrmw_max_i64_seq_cst(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB204_3
> -; RV32IA-NEXT:    j .LBB204_4
> ; RV32IA-NEXT:  .LBB204_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB204_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV32IA-NEXT:    slt a0, s0, a1
> +; RV32IA-NEXT:    j .LBB204_4
> +; RV32IA-NEXT:  .LBB204_3: # in Loop: Header=BB204_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB204_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB204_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB204_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB204_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -15466,23 +15253,8 @@ define i64 @atomicrmw_max_i64_seq_cst(i6
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB204_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB204_4
> -; RV32IA-NEXT:  .LBB204_3: # %atomicrmw.start
> -; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB204_5
> -; RV32IA-NEXT:  .LBB204_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB204_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB204_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB204_1
> -; RV32IA-NEXT:  .LBB204_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB204_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15503,25 +15275,23 @@ define i64 @atomicrmw_max_i64_seq_cst(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB204_3
> ; RV64I-NEXT:  .LBB204_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    blt s0, a2, .LBB204_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB204_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 5
> ; RV64I-NEXT:    addi a4, zero, 5
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB204_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB204_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB204_1
> -; RV64I-NEXT:  .LBB204_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB204_1
> -; RV64I-NEXT:  .LBB204_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB204_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -15553,35 +15323,37 @@ define i64 @atomicrmw_min_i64_monotonic(
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB205_3
> -; RV32I-NEXT:    j .LBB205_4
> ; RV32I-NEXT:  .LBB205_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    mv a4, zero
> -; RV32I-NEXT:    mv a5, zero
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB205_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB205_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB205_4
> -; RV32I-NEXT:  .LBB205_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB205_5
> -; RV32I-NEXT:  .LBB205_4:
> +; RV32I-NEXT:    j .LBB205_4
> +; RV32I-NEXT:  .LBB205_3: # in Loop: Header=BB205_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB205_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB205_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB205_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB205_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB205_1
> -; RV32I-NEXT:  .LBB205_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB205_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    mv a4, zero
> +; RV32I-NEXT:    mv a5, zero
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB205_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15605,35 +15377,37 @@ define i64 @atomicrmw_min_i64_monotonic(
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB205_3
> -; RV32IA-NEXT:    j .LBB205_4
> ; RV32IA-NEXT:  .LBB205_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    mv a4, zero
> -; RV32IA-NEXT:    mv a5, zero
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB205_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB205_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB205_4
> -; RV32IA-NEXT:  .LBB205_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB205_5
> -; RV32IA-NEXT:  .LBB205_4:
> +; RV32IA-NEXT:    j .LBB205_4
> +; RV32IA-NEXT:  .LBB205_3: # in Loop: Header=BB205_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB205_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB205_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB205_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB205_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB205_1
> -; RV32IA-NEXT:  .LBB205_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB205_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB205_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    mv a4, zero
> +; RV32IA-NEXT:    mv a5, zero
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB205_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15654,25 +15428,23 @@ define i64 @atomicrmw_min_i64_monotonic(
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB205_3
> ; RV64I-NEXT:  .LBB205_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bge s0, a2, .LBB205_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB205_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    mv a3, zero
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB205_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB205_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB205_1
> -; RV64I-NEXT:  .LBB205_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB205_1
> -; RV64I-NEXT:  .LBB205_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB205_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -15704,35 +15476,37 @@ define i64 @atomicrmw_min_i64_acquire(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB206_3
> -; RV32I-NEXT:    j .LBB206_4
> ; RV32I-NEXT:  .LBB206_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 2
> -; RV32I-NEXT:    addi a5, zero, 2
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB206_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB206_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB206_4
> -; RV32I-NEXT:  .LBB206_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB206_5
> -; RV32I-NEXT:  .LBB206_4:
> +; RV32I-NEXT:    j .LBB206_4
> +; RV32I-NEXT:  .LBB206_3: # in Loop: Header=BB206_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB206_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB206_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB206_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB206_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB206_1
> -; RV32I-NEXT:  .LBB206_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB206_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 2
> +; RV32I-NEXT:    addi a5, zero, 2
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB206_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15756,35 +15530,37 @@ define i64 @atomicrmw_min_i64_acquire(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB206_3
> -; RV32IA-NEXT:    j .LBB206_4
> ; RV32IA-NEXT:  .LBB206_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 2
> -; RV32IA-NEXT:    addi a5, zero, 2
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB206_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB206_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB206_4
> -; RV32IA-NEXT:  .LBB206_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB206_5
> -; RV32IA-NEXT:  .LBB206_4:
> +; RV32IA-NEXT:    j .LBB206_4
> +; RV32IA-NEXT:  .LBB206_3: # in Loop: Header=BB206_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB206_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB206_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB206_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB206_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB206_1
> -; RV32IA-NEXT:  .LBB206_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB206_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB206_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 2
> +; RV32IA-NEXT:    addi a5, zero, 2
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB206_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15805,25 +15581,23 @@ define i64 @atomicrmw_min_i64_acquire(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB206_3
> ; RV64I-NEXT:  .LBB206_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bge s0, a2, .LBB206_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB206_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 2
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB206_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB206_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB206_1
> -; RV64I-NEXT:  .LBB206_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB206_1
> -; RV64I-NEXT:  .LBB206_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB206_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -15855,35 +15629,37 @@ define i64 @atomicrmw_min_i64_release(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB207_3
> -; RV32I-NEXT:    j .LBB207_4
> ; RV32I-NEXT:  .LBB207_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 3
> -; RV32I-NEXT:    mv a5, zero
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB207_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB207_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB207_4
> -; RV32I-NEXT:  .LBB207_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB207_5
> -; RV32I-NEXT:  .LBB207_4:
> +; RV32I-NEXT:    j .LBB207_4
> +; RV32I-NEXT:  .LBB207_3: # in Loop: Header=BB207_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB207_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB207_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB207_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB207_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB207_1
> -; RV32I-NEXT:  .LBB207_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB207_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 3
> +; RV32I-NEXT:    mv a5, zero
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB207_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -15907,35 +15683,37 @@ define i64 @atomicrmw_min_i64_release(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB207_3
> -; RV32IA-NEXT:    j .LBB207_4
> ; RV32IA-NEXT:  .LBB207_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 3
> -; RV32IA-NEXT:    mv a5, zero
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB207_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB207_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB207_4
> -; RV32IA-NEXT:  .LBB207_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB207_5
> -; RV32IA-NEXT:  .LBB207_4:
> +; RV32IA-NEXT:    j .LBB207_4
> +; RV32IA-NEXT:  .LBB207_3: # in Loop: Header=BB207_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB207_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB207_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB207_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB207_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB207_1
> -; RV32IA-NEXT:  .LBB207_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB207_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB207_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 3
> +; RV32IA-NEXT:    mv a5, zero
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB207_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -15956,25 +15734,23 @@ define i64 @atomicrmw_min_i64_release(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB207_3
> ; RV64I-NEXT:  .LBB207_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bge s0, a2, .LBB207_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB207_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 3
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB207_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB207_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB207_1
> -; RV64I-NEXT:  .LBB207_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB207_1
> -; RV64I-NEXT:  .LBB207_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB207_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16006,35 +15782,37 @@ define i64 @atomicrmw_min_i64_acq_rel(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB208_3
> -; RV32I-NEXT:    j .LBB208_4
> ; RV32I-NEXT:  .LBB208_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 4
> -; RV32I-NEXT:    addi a5, zero, 2
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB208_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB208_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB208_4
> -; RV32I-NEXT:  .LBB208_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB208_5
> -; RV32I-NEXT:  .LBB208_4:
> +; RV32I-NEXT:    j .LBB208_4
> +; RV32I-NEXT:  .LBB208_3: # in Loop: Header=BB208_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB208_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB208_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB208_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB208_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB208_1
> -; RV32I-NEXT:  .LBB208_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB208_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 4
> +; RV32I-NEXT:    addi a5, zero, 2
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB208_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16058,35 +15836,37 @@ define i64 @atomicrmw_min_i64_acq_rel(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB208_3
> -; RV32IA-NEXT:    j .LBB208_4
> ; RV32IA-NEXT:  .LBB208_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 4
> -; RV32IA-NEXT:    addi a5, zero, 2
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB208_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB208_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB208_4
> -; RV32IA-NEXT:  .LBB208_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB208_5
> -; RV32IA-NEXT:  .LBB208_4:
> +; RV32IA-NEXT:    j .LBB208_4
> +; RV32IA-NEXT:  .LBB208_3: # in Loop: Header=BB208_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB208_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB208_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB208_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB208_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB208_1
> -; RV32IA-NEXT:  .LBB208_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB208_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB208_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 4
> +; RV32IA-NEXT:    addi a5, zero, 2
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB208_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -16107,25 +15887,23 @@ define i64 @atomicrmw_min_i64_acq_rel(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB208_3
> ; RV64I-NEXT:  .LBB208_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bge s0, a2, .LBB208_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB208_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 4
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB208_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB208_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB208_1
> -; RV64I-NEXT:  .LBB208_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB208_1
> -; RV64I-NEXT:  .LBB208_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB208_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16157,35 +15935,37 @@ define i64 @atomicrmw_min_i64_seq_cst(i6
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB209_3
> -; RV32I-NEXT:    j .LBB209_4
> ; RV32I-NEXT:  .LBB209_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 5
> -; RV32I-NEXT:    addi a5, zero, 5
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB209_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB209_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB209_4
> -; RV32I-NEXT:  .LBB209_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV32I-NEXT:    slt a0, s0, a1
> -; RV32I-NEXT:    j .LBB209_5
> -; RV32I-NEXT:  .LBB209_4:
> +; RV32I-NEXT:    j .LBB209_4
> +; RV32I-NEXT:  .LBB209_3: # in Loop: Header=BB209_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB209_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB209_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB209_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB209_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB209_1
> -; RV32I-NEXT:  .LBB209_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB209_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 5
> +; RV32I-NEXT:    addi a5, zero, 5
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB209_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16209,35 +15989,37 @@ define i64 @atomicrmw_min_i64_seq_cst(i6
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB209_3
> -; RV32IA-NEXT:    j .LBB209_4
> ; RV32IA-NEXT:  .LBB209_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 5
> -; RV32IA-NEXT:    addi a5, zero, 5
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB209_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB209_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB209_4
> -; RV32IA-NEXT:  .LBB209_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV32IA-NEXT:    slt a0, s0, a1
> -; RV32IA-NEXT:    j .LBB209_5
> -; RV32IA-NEXT:  .LBB209_4:
> +; RV32IA-NEXT:    j .LBB209_4
> +; RV32IA-NEXT:  .LBB209_3: # in Loop: Header=BB209_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB209_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB209_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB209_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB209_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB209_1
> -; RV32IA-NEXT:  .LBB209_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB209_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB209_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 5
> +; RV32IA-NEXT:    addi a5, zero, 5
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB209_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -16258,25 +16040,23 @@ define i64 @atomicrmw_min_i64_seq_cst(i6
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    blt s0, a2, .LBB209_3
> ; RV64I-NEXT:  .LBB209_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bge s0, a2, .LBB209_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB209_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 5
> ; RV64I-NEXT:    addi a4, zero, 5
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB209_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB209_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bge s0, a2, .LBB209_1
> -; RV64I-NEXT:  .LBB209_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB209_1
> -; RV64I-NEXT:  .LBB209_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB209_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16308,9 +16088,26 @@ define i64 @atomicrmw_umax_i64_monotonic
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB210_3
> -; RV32I-NEXT:    j .LBB210_4
> ; RV32I-NEXT:  .LBB210_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB210_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s0, a1
> +; RV32I-NEXT:    j .LBB210_4
> +; RV32I-NEXT:  .LBB210_3: # in Loop: Header=BB210_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB210_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB210_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB210_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -16319,23 +16116,8 @@ define i64 @atomicrmw_umax_i64_monotonic
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB210_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB210_4
> -; RV32I-NEXT:  .LBB210_3: # %atomicrmw.start
> -; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB210_5
> -; RV32I-NEXT:  .LBB210_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB210_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB210_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB210_1
> -; RV32I-NEXT:  .LBB210_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB210_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16359,9 +16141,26 @@ define i64 @atomicrmw_umax_i64_monotonic
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB210_3
> -; RV32IA-NEXT:    j .LBB210_4
> ; RV32IA-NEXT:  .LBB210_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB210_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s0, a1
> +; RV32IA-NEXT:    j .LBB210_4
> +; RV32IA-NEXT:  .LBB210_3: # in Loop: Header=BB210_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB210_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB210_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB210_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB210_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -16370,23 +16169,8 @@ define i64 @atomicrmw_umax_i64_monotonic
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB210_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB210_4
> -; RV32IA-NEXT:  .LBB210_3: # %atomicrmw.start
> -; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB210_5
> -; RV32IA-NEXT:  .LBB210_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB210_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB210_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB210_1
> -; RV32IA-NEXT:  .LBB210_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB210_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -16407,25 +16191,23 @@ define i64 @atomicrmw_umax_i64_monotonic
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB210_3
> ; RV64I-NEXT:  .LBB210_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bltu s0, a2, .LBB210_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB210_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    mv a3, zero
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB210_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB210_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB210_1
> -; RV64I-NEXT:  .LBB210_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB210_1
> -; RV64I-NEXT:  .LBB210_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB210_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16457,9 +16239,26 @@ define i64 @atomicrmw_umax_i64_acquire(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB211_3
> -; RV32I-NEXT:    j .LBB211_4
> ; RV32I-NEXT:  .LBB211_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB211_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s0, a1
> +; RV32I-NEXT:    j .LBB211_4
> +; RV32I-NEXT:  .LBB211_3: # in Loop: Header=BB211_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB211_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB211_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB211_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -16468,23 +16267,8 @@ define i64 @atomicrmw_umax_i64_acquire(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB211_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB211_4
> -; RV32I-NEXT:  .LBB211_3: # %atomicrmw.start
> -; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB211_5
> -; RV32I-NEXT:  .LBB211_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB211_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB211_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB211_1
> -; RV32I-NEXT:  .LBB211_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB211_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16508,9 +16292,26 @@ define i64 @atomicrmw_umax_i64_acquire(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB211_3
> -; RV32IA-NEXT:    j .LBB211_4
> ; RV32IA-NEXT:  .LBB211_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB211_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s0, a1
> +; RV32IA-NEXT:    j .LBB211_4
> +; RV32IA-NEXT:  .LBB211_3: # in Loop: Header=BB211_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB211_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB211_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB211_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB211_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -16519,23 +16320,8 @@ define i64 @atomicrmw_umax_i64_acquire(i
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB211_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB211_4
> -; RV32IA-NEXT:  .LBB211_3: # %atomicrmw.start
> -; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB211_5
> -; RV32IA-NEXT:  .LBB211_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB211_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB211_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB211_1
> -; RV32IA-NEXT:  .LBB211_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB211_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -16556,25 +16342,23 @@ define i64 @atomicrmw_umax_i64_acquire(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB211_3
> ; RV64I-NEXT:  .LBB211_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bltu s0, a2, .LBB211_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB211_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 2
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB211_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB211_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB211_1
> -; RV64I-NEXT:  .LBB211_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB211_1
> -; RV64I-NEXT:  .LBB211_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB211_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16606,34 +16390,36 @@ define i64 @atomicrmw_umax_i64_release(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB212_3
> -; RV32I-NEXT:    j .LBB212_4
> ; RV32I-NEXT:  .LBB212_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 3
> -; RV32I-NEXT:    mv a5, zero
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB212_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB212_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB212_4
> -; RV32I-NEXT:  .LBB212_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB212_5
> -; RV32I-NEXT:  .LBB212_4:
> +; RV32I-NEXT:    j .LBB212_4
> +; RV32I-NEXT:  .LBB212_3: # in Loop: Header=BB212_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB212_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB212_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB212_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB212_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB212_1
> -; RV32I-NEXT:  .LBB212_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB212_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 3
> +; RV32I-NEXT:    mv a5, zero
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB212_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16657,9 +16443,26 @@ define i64 @atomicrmw_umax_i64_release(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB212_3
> -; RV32IA-NEXT:    j .LBB212_4
> ; RV32IA-NEXT:  .LBB212_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB212_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB212_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s0, a1
> +; RV32IA-NEXT:    j .LBB212_4
> +; RV32IA-NEXT:  .LBB212_3: # in Loop: Header=BB212_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB212_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB212_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB212_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB212_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB212_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB212_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -16668,23 +16471,8 @@ define i64 @atomicrmw_umax_i64_release(i
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB212_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB212_4
> -; RV32IA-NEXT:  .LBB212_3: # %atomicrmw.start
> -; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB212_5
> -; RV32IA-NEXT:  .LBB212_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB212_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB212_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB212_1
> -; RV32IA-NEXT:  .LBB212_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB212_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -16705,25 +16493,23 @@ define i64 @atomicrmw_umax_i64_release(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB212_3
> ; RV64I-NEXT:  .LBB212_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bltu s0, a2, .LBB212_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB212_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 3
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB212_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB212_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB212_1
> -; RV64I-NEXT:  .LBB212_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB212_1
> -; RV64I-NEXT:  .LBB212_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB212_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16755,9 +16541,26 @@ define i64 @atomicrmw_umax_i64_acq_rel(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB213_3
> -; RV32I-NEXT:    j .LBB213_4
> ; RV32I-NEXT:  .LBB213_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB213_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s0, a1
> +; RV32I-NEXT:    j .LBB213_4
> +; RV32I-NEXT:  .LBB213_3: # in Loop: Header=BB213_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB213_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB213_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB213_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -16766,23 +16569,8 @@ define i64 @atomicrmw_umax_i64_acq_rel(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB213_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB213_4
> -; RV32I-NEXT:  .LBB213_3: # %atomicrmw.start
> -; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB213_5
> -; RV32I-NEXT:  .LBB213_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB213_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB213_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB213_1
> -; RV32I-NEXT:  .LBB213_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB213_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16806,9 +16594,26 @@ define i64 @atomicrmw_umax_i64_acq_rel(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB213_3
> -; RV32IA-NEXT:    j .LBB213_4
> ; RV32IA-NEXT:  .LBB213_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB213_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s0, a1
> +; RV32IA-NEXT:    j .LBB213_4
> +; RV32IA-NEXT:  .LBB213_3: # in Loop: Header=BB213_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB213_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB213_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB213_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB213_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -16817,23 +16622,8 @@ define i64 @atomicrmw_umax_i64_acq_rel(i
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB213_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB213_4
> -; RV32IA-NEXT:  .LBB213_3: # %atomicrmw.start
> -; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB213_5
> -; RV32IA-NEXT:  .LBB213_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB213_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB213_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB213_1
> -; RV32IA-NEXT:  .LBB213_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB213_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -16854,25 +16644,23 @@ define i64 @atomicrmw_umax_i64_acq_rel(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB213_3
> ; RV64I-NEXT:  .LBB213_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bltu s0, a2, .LBB213_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB213_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 4
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB213_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB213_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB213_1
> -; RV64I-NEXT:  .LBB213_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB213_1
> -; RV64I-NEXT:  .LBB213_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB213_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -16904,9 +16692,26 @@ define i64 @atomicrmw_umax_i64_seq_cst(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB214_3
> -; RV32I-NEXT:    j .LBB214_4
> ; RV32I-NEXT:  .LBB214_1: # %atomicrmw.start
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB214_3
> +; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s0, a1
> +; RV32I-NEXT:    j .LBB214_4
> +; RV32I-NEXT:  .LBB214_3: # in Loop: Header=BB214_1 Depth=1
> +; RV32I-NEXT:    sltu a0, s2, a2
> +; RV32I-NEXT:  .LBB214_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV32I-NEXT:    sw a2, 0(sp)
> +; RV32I-NEXT:    mv a3, a1
> +; RV32I-NEXT:    bnez a0, .LBB214_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV32I-NEXT:    mv a2, s2
> +; RV32I-NEXT:    mv a3, s0
> +; RV32I-NEXT:  .LBB214_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> ; RV32I-NEXT:    sw a1, 4(sp)
> ; RV32I-NEXT:    mv a0, s1
> ; RV32I-NEXT:    mv a1, s3
> @@ -16915,23 +16720,8 @@ define i64 @atomicrmw_umax_i64_seq_cst(i
> ; RV32I-NEXT:    call __atomic_compare_exchange_8
> ; RV32I-NEXT:    lw a1, 4(sp)
> ; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB214_7
> -; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB214_4
> -; RV32I-NEXT:  .LBB214_3: # %atomicrmw.start
> -; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB214_5
> -; RV32I-NEXT:  .LBB214_4:
> -; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB214_5: # %atomicrmw.start
> -; RV32I-NEXT:    sw a2, 0(sp)
> -; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB214_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32I-NEXT:    mv a2, s2
> -; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB214_1
> -; RV32I-NEXT:  .LBB214_7: # %atomicrmw.end
> +; RV32I-NEXT:    beqz a0, .LBB214_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -16955,9 +16745,26 @@ define i64 @atomicrmw_umax_i64_seq_cst(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB214_3
> -; RV32IA-NEXT:    j .LBB214_4
> ; RV32IA-NEXT:  .LBB214_1: # %atomicrmw.start
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB214_3
> +; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s0, a1
> +; RV32IA-NEXT:    j .LBB214_4
> +; RV32IA-NEXT:  .LBB214_3: # in Loop: Header=BB214_1 Depth=1
> +; RV32IA-NEXT:    sltu a0, s2, a2
> +; RV32IA-NEXT:  .LBB214_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV32IA-NEXT:    sw a2, 0(sp)
> +; RV32IA-NEXT:    mv a3, a1
> +; RV32IA-NEXT:    bnez a0, .LBB214_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV32IA-NEXT:    mv a2, s2
> +; RV32IA-NEXT:    mv a3, s0
> +; RV32IA-NEXT:  .LBB214_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB214_1 Depth=1
> ; RV32IA-NEXT:    sw a1, 4(sp)
> ; RV32IA-NEXT:    mv a0, s1
> ; RV32IA-NEXT:    mv a1, s3
> @@ -16966,23 +16773,8 @@ define i64 @atomicrmw_umax_i64_seq_cst(i
> ; RV32IA-NEXT:    call __atomic_compare_exchange_8
> ; RV32IA-NEXT:    lw a1, 4(sp)
> ; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB214_7
> -; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB214_4
> -; RV32IA-NEXT:  .LBB214_3: # %atomicrmw.start
> -; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB214_5
> -; RV32IA-NEXT:  .LBB214_4:
> -; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB214_5: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a2, 0(sp)
> -; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB214_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> -; RV32IA-NEXT:    mv a2, s2
> -; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB214_1
> -; RV32IA-NEXT:  .LBB214_7: # %atomicrmw.end
> +; RV32IA-NEXT:    beqz a0, .LBB214_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -17003,25 +16795,23 @@ define i64 @atomicrmw_umax_i64_seq_cst(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB214_3
> ; RV64I-NEXT:  .LBB214_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bltu s0, a2, .LBB214_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB214_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 5
> ; RV64I-NEXT:    addi a4, zero, 5
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB214_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB214_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB214_1
> -; RV64I-NEXT:  .LBB214_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB214_1
> -; RV64I-NEXT:  .LBB214_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB214_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -17053,35 +16843,37 @@ define i64 @atomicrmw_umin_i64_monotonic
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB215_3
> -; RV32I-NEXT:    j .LBB215_4
> ; RV32I-NEXT:  .LBB215_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    mv a4, zero
> -; RV32I-NEXT:    mv a5, zero
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB215_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB215_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB215_4
> -; RV32I-NEXT:  .LBB215_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB215_5
> -; RV32I-NEXT:  .LBB215_4:
> +; RV32I-NEXT:    j .LBB215_4
> +; RV32I-NEXT:  .LBB215_3: # in Loop: Header=BB215_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB215_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB215_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB215_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB215_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB215_1
> -; RV32I-NEXT:  .LBB215_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB215_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    mv a4, zero
> +; RV32I-NEXT:    mv a5, zero
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB215_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -17105,35 +16897,37 @@ define i64 @atomicrmw_umin_i64_monotonic
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB215_3
> -; RV32IA-NEXT:    j .LBB215_4
> ; RV32IA-NEXT:  .LBB215_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    mv a4, zero
> -; RV32IA-NEXT:    mv a5, zero
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB215_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB215_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB215_4
> -; RV32IA-NEXT:  .LBB215_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB215_5
> -; RV32IA-NEXT:  .LBB215_4:
> +; RV32IA-NEXT:    j .LBB215_4
> +; RV32IA-NEXT:  .LBB215_3: # in Loop: Header=BB215_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB215_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB215_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB215_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB215_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB215_1
> -; RV32IA-NEXT:  .LBB215_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB215_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB215_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    mv a4, zero
> +; RV32IA-NEXT:    mv a5, zero
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB215_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -17154,25 +16948,23 @@ define i64 @atomicrmw_umin_i64_monotonic
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB215_3
> ; RV64I-NEXT:  .LBB215_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bgeu s0, a2, .LBB215_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB215_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    mv a3, zero
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB215_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB215_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB215_1
> -; RV64I-NEXT:  .LBB215_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB215_1
> -; RV64I-NEXT:  .LBB215_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB215_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -17204,35 +16996,37 @@ define i64 @atomicrmw_umin_i64_acquire(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB216_3
> -; RV32I-NEXT:    j .LBB216_4
> ; RV32I-NEXT:  .LBB216_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 2
> -; RV32I-NEXT:    addi a5, zero, 2
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB216_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB216_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB216_4
> -; RV32I-NEXT:  .LBB216_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB216_5
> -; RV32I-NEXT:  .LBB216_4:
> +; RV32I-NEXT:    j .LBB216_4
> +; RV32I-NEXT:  .LBB216_3: # in Loop: Header=BB216_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB216_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB216_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB216_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB216_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB216_1
> -; RV32I-NEXT:  .LBB216_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB216_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 2
> +; RV32I-NEXT:    addi a5, zero, 2
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB216_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -17256,35 +17050,37 @@ define i64 @atomicrmw_umin_i64_acquire(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB216_3
> -; RV32IA-NEXT:    j .LBB216_4
> ; RV32IA-NEXT:  .LBB216_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 2
> -; RV32IA-NEXT:    addi a5, zero, 2
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB216_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB216_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB216_4
> -; RV32IA-NEXT:  .LBB216_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB216_5
> -; RV32IA-NEXT:  .LBB216_4:
> +; RV32IA-NEXT:    j .LBB216_4
> +; RV32IA-NEXT:  .LBB216_3: # in Loop: Header=BB216_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB216_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB216_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB216_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB216_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB216_1
> -; RV32IA-NEXT:  .LBB216_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB216_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB216_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 2
> +; RV32IA-NEXT:    addi a5, zero, 2
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB216_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -17305,25 +17101,23 @@ define i64 @atomicrmw_umin_i64_acquire(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB216_3
> ; RV64I-NEXT:  .LBB216_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bgeu s0, a2, .LBB216_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB216_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 2
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB216_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB216_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB216_1
> -; RV64I-NEXT:  .LBB216_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB216_1
> -; RV64I-NEXT:  .LBB216_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB216_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -17355,35 +17149,37 @@ define i64 @atomicrmw_umin_i64_release(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB217_3
> -; RV32I-NEXT:    j .LBB217_4
> ; RV32I-NEXT:  .LBB217_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 3
> -; RV32I-NEXT:    mv a5, zero
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB217_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB217_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB217_4
> -; RV32I-NEXT:  .LBB217_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB217_5
> -; RV32I-NEXT:  .LBB217_4:
> +; RV32I-NEXT:    j .LBB217_4
> +; RV32I-NEXT:  .LBB217_3: # in Loop: Header=BB217_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB217_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB217_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB217_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB217_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB217_1
> -; RV32I-NEXT:  .LBB217_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB217_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 3
> +; RV32I-NEXT:    mv a5, zero
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB217_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -17407,35 +17203,37 @@ define i64 @atomicrmw_umin_i64_release(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB217_3
> -; RV32IA-NEXT:    j .LBB217_4
> ; RV32IA-NEXT:  .LBB217_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 3
> -; RV32IA-NEXT:    mv a5, zero
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB217_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB217_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB217_4
> -; RV32IA-NEXT:  .LBB217_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB217_5
> -; RV32IA-NEXT:  .LBB217_4:
> +; RV32IA-NEXT:    j .LBB217_4
> +; RV32IA-NEXT:  .LBB217_3: # in Loop: Header=BB217_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB217_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB217_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB217_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB217_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB217_1
> -; RV32IA-NEXT:  .LBB217_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB217_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB217_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 3
> +; RV32IA-NEXT:    mv a5, zero
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB217_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -17456,25 +17254,23 @@ define i64 @atomicrmw_umin_i64_release(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB217_3
> ; RV64I-NEXT:  .LBB217_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bgeu s0, a2, .LBB217_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB217_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 3
> ; RV64I-NEXT:    mv a4, zero
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB217_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB217_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB217_1
> -; RV64I-NEXT:  .LBB217_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB217_1
> -; RV64I-NEXT:  .LBB217_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB217_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -17506,35 +17302,37 @@ define i64 @atomicrmw_umin_i64_acq_rel(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB218_3
> -; RV32I-NEXT:    j .LBB218_4
> ; RV32I-NEXT:  .LBB218_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 4
> -; RV32I-NEXT:    addi a5, zero, 2
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB218_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB218_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB218_4
> -; RV32I-NEXT:  .LBB218_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB218_5
> -; RV32I-NEXT:  .LBB218_4:
> +; RV32I-NEXT:    j .LBB218_4
> +; RV32I-NEXT:  .LBB218_3: # in Loop: Header=BB218_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB218_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB218_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB218_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB218_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB218_1
> -; RV32I-NEXT:  .LBB218_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB218_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 4
> +; RV32I-NEXT:    addi a5, zero, 2
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB218_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -17558,35 +17356,37 @@ define i64 @atomicrmw_umin_i64_acq_rel(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB218_3
> -; RV32IA-NEXT:    j .LBB218_4
> ; RV32IA-NEXT:  .LBB218_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 4
> -; RV32IA-NEXT:    addi a5, zero, 2
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB218_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB218_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB218_4
> -; RV32IA-NEXT:  .LBB218_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB218_5
> -; RV32IA-NEXT:  .LBB218_4:
> +; RV32IA-NEXT:    j .LBB218_4
> +; RV32IA-NEXT:  .LBB218_3: # in Loop: Header=BB218_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB218_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB218_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB218_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB218_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB218_1
> -; RV32IA-NEXT:  .LBB218_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB218_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB218_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 4
> +; RV32IA-NEXT:    addi a5, zero, 2
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB218_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -17607,25 +17407,23 @@ define i64 @atomicrmw_umin_i64_acq_rel(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB218_3
> ; RV64I-NEXT:  .LBB218_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bgeu s0, a2, .LBB218_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB218_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 4
> ; RV64I-NEXT:    addi a4, zero, 2
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB218_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB218_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB218_1
> -; RV64I-NEXT:  .LBB218_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB218_1
> -; RV64I-NEXT:  .LBB218_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB218_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
> @@ -17657,35 +17455,37 @@ define i64 @atomicrmw_umin_i64_seq_cst(i
> ; RV32I-NEXT:    lw a1, 4(a0)
> ; RV32I-NEXT:    lw a2, 0(a0)
> ; RV32I-NEXT:    mv s3, sp
> -; RV32I-NEXT:    bne a1, s0, .LBB219_3
> -; RV32I-NEXT:    j .LBB219_4
> ; RV32I-NEXT:  .LBB219_1: # %atomicrmw.start
> -; RV32I-NEXT:    sw a1, 4(sp)
> -; RV32I-NEXT:    mv a0, s1
> -; RV32I-NEXT:    mv a1, s3
> -; RV32I-NEXT:    addi a4, zero, 5
> -; RV32I-NEXT:    addi a5, zero, 5
> -; RV32I-NEXT:    call __atomic_compare_exchange_8
> -; RV32I-NEXT:    lw a1, 4(sp)
> -; RV32I-NEXT:    lw a2, 0(sp)
> -; RV32I-NEXT:    bnez a0, .LBB219_7
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:    beq a1, s0, .LBB219_3
> ; RV32I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32I-NEXT:    beq a1, s0, .LBB219_4
> -; RV32I-NEXT:  .LBB219_3: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s0, a1
> -; RV32I-NEXT:    j .LBB219_5
> -; RV32I-NEXT:  .LBB219_4:
> +; RV32I-NEXT:    j .LBB219_4
> +; RV32I-NEXT:  .LBB219_3: # in Loop: Header=BB219_1 Depth=1
> ; RV32I-NEXT:    sltu a0, s2, a2
> -; RV32I-NEXT:  .LBB219_5: # %atomicrmw.start
> +; RV32I-NEXT:  .LBB219_4: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV32I-NEXT:    xori a0, a0, 1
> ; RV32I-NEXT:    sw a2, 0(sp)
> ; RV32I-NEXT:    mv a3, a1
> -; RV32I-NEXT:    bnez a0, .LBB219_1
> -; RV32I-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32I-NEXT:    bnez a0, .LBB219_6
> +; RV32I-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV32I-NEXT:    mv a2, s2
> ; RV32I-NEXT:    mv a3, s0
> -; RV32I-NEXT:    j .LBB219_1
> -; RV32I-NEXT:  .LBB219_7: # %atomicrmw.end
> +; RV32I-NEXT:  .LBB219_6: # %atomicrmw.start
> +; RV32I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> +; RV32I-NEXT:    sw a1, 4(sp)
> +; RV32I-NEXT:    mv a0, s1
> +; RV32I-NEXT:    mv a1, s3
> +; RV32I-NEXT:    addi a4, zero, 5
> +; RV32I-NEXT:    addi a5, zero, 5
> +; RV32I-NEXT:    call __atomic_compare_exchange_8
> +; RV32I-NEXT:    lw a1, 4(sp)
> +; RV32I-NEXT:    lw a2, 0(sp)
> +; RV32I-NEXT:    beqz a0, .LBB219_1
> +; RV32I-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32I-NEXT:    mv a0, a2
> ; RV32I-NEXT:    lw s3, 12(sp)
> ; RV32I-NEXT:    lw s2, 16(sp)
> @@ -17709,35 +17509,37 @@ define i64 @atomicrmw_umin_i64_seq_cst(i
> ; RV32IA-NEXT:    lw a1, 4(a0)
> ; RV32IA-NEXT:    lw a2, 0(a0)
> ; RV32IA-NEXT:    mv s3, sp
> -; RV32IA-NEXT:    bne a1, s0, .LBB219_3
> -; RV32IA-NEXT:    j .LBB219_4
> ; RV32IA-NEXT:  .LBB219_1: # %atomicrmw.start
> -; RV32IA-NEXT:    sw a1, 4(sp)
> -; RV32IA-NEXT:    mv a0, s1
> -; RV32IA-NEXT:    mv a1, s3
> -; RV32IA-NEXT:    addi a4, zero, 5
> -; RV32IA-NEXT:    addi a5, zero, 5
> -; RV32IA-NEXT:    call __atomic_compare_exchange_8
> -; RV32IA-NEXT:    lw a1, 4(sp)
> -; RV32IA-NEXT:    lw a2, 0(sp)
> -; RV32IA-NEXT:    bnez a0, .LBB219_7
> +; RV32IA-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32IA-NEXT:    beq a1, s0, .LBB219_3
> ; RV32IA-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV32IA-NEXT:    beq a1, s0, .LBB219_4
> -; RV32IA-NEXT:  .LBB219_3: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s0, a1
> -; RV32IA-NEXT:    j .LBB219_5
> -; RV32IA-NEXT:  .LBB219_4:
> +; RV32IA-NEXT:    j .LBB219_4
> +; RV32IA-NEXT:  .LBB219_3: # in Loop: Header=BB219_1 Depth=1
> ; RV32IA-NEXT:    sltu a0, s2, a2
> -; RV32IA-NEXT:  .LBB219_5: # %atomicrmw.start
> +; RV32IA-NEXT:  .LBB219_4: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV32IA-NEXT:    xori a0, a0, 1
> ; RV32IA-NEXT:    sw a2, 0(sp)
> ; RV32IA-NEXT:    mv a3, a1
> -; RV32IA-NEXT:    bnez a0, .LBB219_1
> -; RV32IA-NEXT:  # %bb.6: # %atomicrmw.start
> +; RV32IA-NEXT:    bnez a0, .LBB219_6
> +; RV32IA-NEXT:  # %bb.5: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV32IA-NEXT:    mv a2, s2
> ; RV32IA-NEXT:    mv a3, s0
> -; RV32IA-NEXT:    j .LBB219_1
> -; RV32IA-NEXT:  .LBB219_7: # %atomicrmw.end
> +; RV32IA-NEXT:  .LBB219_6: # %atomicrmw.start
> +; RV32IA-NEXT:    # in Loop: Header=BB219_1 Depth=1
> +; RV32IA-NEXT:    sw a1, 4(sp)
> +; RV32IA-NEXT:    mv a0, s1
> +; RV32IA-NEXT:    mv a1, s3
> +; RV32IA-NEXT:    addi a4, zero, 5
> +; RV32IA-NEXT:    addi a5, zero, 5
> +; RV32IA-NEXT:    call __atomic_compare_exchange_8
> +; RV32IA-NEXT:    lw a1, 4(sp)
> +; RV32IA-NEXT:    lw a2, 0(sp)
> +; RV32IA-NEXT:    beqz a0, .LBB219_1
> +; RV32IA-NEXT:  # %bb.7: # %atomicrmw.end
> ; RV32IA-NEXT:    mv a0, a2
> ; RV32IA-NEXT:    lw s3, 12(sp)
> ; RV32IA-NEXT:    lw s2, 16(sp)
> @@ -17758,25 +17560,23 @@ define i64 @atomicrmw_umin_i64_seq_cst(i
> ; RV64I-NEXT:    mv s1, a0
> ; RV64I-NEXT:    ld a2, 0(a0)
> ; RV64I-NEXT:    addi s2, sp, 8
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bltu s0, a2, .LBB219_3
> ; RV64I-NEXT:  .LBB219_1: # %atomicrmw.start
> ; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV64I-NEXT:    sd a2, 8(sp)
> +; RV64I-NEXT:    bgeu s0, a2, .LBB219_3
> +; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> +; RV64I-NEXT:    mv a2, s0
> +; RV64I-NEXT:  .LBB219_3: # %atomicrmw.start
> +; RV64I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> ; RV64I-NEXT:    mv a0, s1
> ; RV64I-NEXT:    mv a1, s2
> ; RV64I-NEXT:    addi a3, zero, 5
> ; RV64I-NEXT:    addi a4, zero, 5
> ; RV64I-NEXT:    call __atomic_compare_exchange_8
> ; RV64I-NEXT:    ld a2, 8(sp)
> -; RV64I-NEXT:    bnez a0, .LBB219_4
> -; RV64I-NEXT:  # %bb.2: # %atomicrmw.start
> -; RV64I-NEXT:    # in Loop: Header=BB219_1 Depth=1
> -; RV64I-NEXT:    sd a2, 8(sp)
> -; RV64I-NEXT:    bgeu s0, a2, .LBB219_1
> -; RV64I-NEXT:  .LBB219_3: # %atomicrmw.start
> -; RV64I-NEXT:    mv a2, s0
> -; RV64I-NEXT:    j .LBB219_1
> -; RV64I-NEXT:  .LBB219_4: # %atomicrmw.end
> +; RV64I-NEXT:    beqz a0, .LBB219_1
> +; RV64I-NEXT:  # %bb.4: # %atomicrmw.end
> ; RV64I-NEXT:    mv a0, a2
> ; RV64I-NEXT:    ld s2, 16(sp)
> ; RV64I-NEXT:    ld s1, 24(sp)
>
> Modified: llvm/trunk/test/CodeGen/RISCV/remat.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/RISCV/remat.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/RISCV/remat.ll (original)
> +++ llvm/trunk/test/CodeGen/RISCV/remat.ll Thu Aug 22 09:21:32 2019
> @@ -52,32 +52,24 @@ define i32 @test() nounwind {
> ; RV32I-NEXT:    lui s0, %hi(d)
> ; RV32I-NEXT:    lui s10, %hi(c)
> ; RV32I-NEXT:    lui s11, %hi(b)
> +; RV32I-NEXT:  .LBB0_2: # %for.body
> +; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> ; RV32I-NEXT:    lw a1, %lo(l)(s2)
> -; RV32I-NEXT:    bnez a1, .LBB0_4
> -; RV32I-NEXT:    j .LBB0_5
> -; RV32I-NEXT:  .LBB0_2: # %for.inc
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> -; RV32I-NEXT:    lw a0, %lo(a)(s9)
> -; RV32I-NEXT:    addi a0, a0, -1
> -; RV32I-NEXT:    sw a0, %lo(a)(s9)
> -; RV32I-NEXT:    beqz a0, .LBB0_11
> -; RV32I-NEXT:  # %bb.3: # %for.body
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> -; RV32I-NEXT:    lw a1, %lo(l)(s2)
> -; RV32I-NEXT:    beqz a1, .LBB0_5
> -; RV32I-NEXT:  .LBB0_4: # %if.then
> +; RV32I-NEXT:    beqz a1, .LBB0_4
> +; RV32I-NEXT:  # %bb.3: # %if.then
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a4, %lo(e)(s1)
> ; RV32I-NEXT:    lw a3, %lo(d)(s0)
> ; RV32I-NEXT:    lw a2, %lo(c)(s10)
> ; RV32I-NEXT:    lw a1, %lo(b)(s11)
> ; RV32I-NEXT:    addi a5, zero, 32
> ; RV32I-NEXT:    call foo
> -; RV32I-NEXT:  .LBB0_5: # %if.end
> -; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
> +; RV32I-NEXT:  .LBB0_4: # %if.end
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a0, %lo(k)(s3)
> -; RV32I-NEXT:    beqz a0, .LBB0_7
> -; RV32I-NEXT:  # %bb.6: # %if.then3
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> +; RV32I-NEXT:    beqz a0, .LBB0_6
> +; RV32I-NEXT:  # %bb.5: # %if.then3
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a4, %lo(f)(s8)
> ; RV32I-NEXT:    lw a3, %lo(e)(s1)
> ; RV32I-NEXT:    lw a2, %lo(d)(s0)
> @@ -85,12 +77,12 @@ define i32 @test() nounwind {
> ; RV32I-NEXT:    lw a0, %lo(b)(s11)
> ; RV32I-NEXT:    addi a5, zero, 64
> ; RV32I-NEXT:    call foo
> -; RV32I-NEXT:  .LBB0_7: # %if.end5
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> +; RV32I-NEXT:  .LBB0_6: # %if.end5
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a0, %lo(j)(s4)
> -; RV32I-NEXT:    beqz a0, .LBB0_9
> -; RV32I-NEXT:  # %bb.8: # %if.then7
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> +; RV32I-NEXT:    beqz a0, .LBB0_8
> +; RV32I-NEXT:  # %bb.7: # %if.then7
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a4, %lo(g)(s7)
> ; RV32I-NEXT:    lw a3, %lo(f)(s8)
> ; RV32I-NEXT:    lw a2, %lo(e)(s1)
> @@ -98,12 +90,12 @@ define i32 @test() nounwind {
> ; RV32I-NEXT:    lw a0, %lo(c)(s10)
> ; RV32I-NEXT:    addi a5, zero, 32
> ; RV32I-NEXT:    call foo
> -; RV32I-NEXT:  .LBB0_9: # %if.end9
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> +; RV32I-NEXT:  .LBB0_8: # %if.end9
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a0, %lo(i)(s6)
> -; RV32I-NEXT:    beqz a0, .LBB0_2
> -; RV32I-NEXT:  # %bb.10: # %if.then11
> -; RV32I-NEXT:    # in Loop: Header=BB0_5 Depth=1
> +; RV32I-NEXT:    beqz a0, .LBB0_10
> +; RV32I-NEXT:  # %bb.9: # %if.then11
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; RV32I-NEXT:    lw a4, %lo(h)(s5)
> ; RV32I-NEXT:    lw a3, %lo(g)(s7)
> ; RV32I-NEXT:    lw a2, %lo(f)(s8)
> @@ -111,7 +103,12 @@ define i32 @test() nounwind {
> ; RV32I-NEXT:    lw a0, %lo(d)(s0)
> ; RV32I-NEXT:    addi a5, zero, 32
> ; RV32I-NEXT:    call foo
> -; RV32I-NEXT:    j .LBB0_2
> +; RV32I-NEXT:  .LBB0_10: # %for.inc
> +; RV32I-NEXT:    # in Loop: Header=BB0_2 Depth=1
> +; RV32I-NEXT:    lw a0, %lo(a)(s9)
> +; RV32I-NEXT:    addi a0, a0, -1
> +; RV32I-NEXT:    sw a0, %lo(a)(s9)
> +; RV32I-NEXT:    bnez a0, .LBB0_2
> ; RV32I-NEXT:  .LBB0_11: # %for.end
> ; RV32I-NEXT:    addi a0, zero, 1
> ; RV32I-NEXT:    lw s11, 12(sp)
>
> Modified: llvm/trunk/test/CodeGen/Thumb/consthoist-physical-addr.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Thumb/consthoist-physical-addr.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/Thumb/consthoist-physical-addr.ll (original)
> +++ llvm/trunk/test/CodeGen/Thumb/consthoist-physical-addr.ll Thu Aug 22 09:21:32 2019
> @@ -10,9 +10,8 @@ define i32 @C(i32 %x, i32* nocapture %y)
> ; CHECK-NEXT:    push {r4, r5, r7, lr}
> ; CHECK-NEXT:    movs r2, #0
> ; CHECK-NEXT:    ldr r3, .LCPI0_0
> +; CHECK-NEXT:    b .LBB0_4
> ; CHECK-NEXT:  .LBB0_1:
> -; CHECK-NEXT:    cmp r2, #128
> -; CHECK-NEXT:    beq .LBB0_5
> ; CHECK-NEXT:    movs r4, #0
> ; CHECK-NEXT:    str r4, [r3, #8]
> ; CHECK-NEXT:    lsls r4, r2, #2
> @@ -21,15 +20,16 @@ define i32 @C(i32 %x, i32* nocapture %y)
> ; CHECK-NEXT:    movs r5, #1
> ; CHECK-NEXT:    str r5, [r3, #12]
> ; CHECK-NEXT:    isb sy
> -; CHECK-NEXT:  .LBB0_3:
> +; CHECK-NEXT:  .LBB0_2:
> ; CHECK-NEXT:    ldr r5, [r3, #12]
> ; CHECK-NEXT:    cmp r5, #0
> -; CHECK-NEXT:    bne .LBB0_3
> +; CHECK-NEXT:    bne .LBB0_2
> ; CHECK-NEXT:    ldr r5, [r3, #4]
> ; CHECK-NEXT:    str r5, [r1, r4]
> ; CHECK-NEXT:    adds r2, r2, #1
> -; CHECK-NEXT:    b .LBB0_1
> -; CHECK-NEXT:  .LBB0_5:
> +; CHECK-NEXT:  .LBB0_4:
> +; CHECK-NEXT:    cmp r2, #128
> +; CHECK-NEXT:    bne .LBB0_1
> ; CHECK-NEXT:    movs r0, #0
> ; CHECK-NEXT:    pop {r4, r5, r7, pc}
> ; CHECK-NEXT:    .p2align 2
>
> Modified: llvm/trunk/test/CodeGen/Thumb/pr42760.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Thumb/pr42760.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/Thumb/pr42760.ll (original)
> +++ llvm/trunk/test/CodeGen/Thumb/pr42760.ll Thu Aug 22 09:21:32 2019
> @@ -6,31 +6,27 @@ define hidden void @test() {
> ; CHECK:       @ %bb.0: @ %entry
> ; CHECK-NEXT:    movs r0, #1
> ; CHECK-NEXT:    lsls r1, r0, #2
> -; CHECK-NEXT:    b .LBB0_2
> -; CHECK-NEXT:  .LBB0_1: @ %bb2
> -; CHECK-NEXT:    @ in Loop: Header=BB0_2 Depth=1
> -; CHECK-NEXT:    cmp r0, #0
> -; CHECK-NEXT:    bne .LBB0_6
> -; CHECK-NEXT:  .LBB0_2: @ %switch
> +; CHECK-NEXT:  .LBB0_1: @ %switch
> ; CHECK-NEXT:    @ =>This Inner Loop Header: Depth=1
> ; CHECK-NEXT:    adr r2, .LJTI0_0
> ; CHECK-NEXT:    ldr r2, [r2, r1]
> ; CHECK-NEXT:    mov pc, r2
> -; CHECK-NEXT:  @ %bb.3:
> +; CHECK-NEXT:  @ %bb.2:
> ; CHECK-NEXT:    .p2align 2
> ; CHECK-NEXT:  .LJTI0_0:
> -; CHECK-NEXT:    .long .LBB0_6+1
> +; CHECK-NEXT:    .long .LBB0_5+1
> ; CHECK-NEXT:    .long .LBB0_4+1
> -; CHECK-NEXT:    .long .LBB0_6+1
> ; CHECK-NEXT:    .long .LBB0_5+1
> -; CHECK-NEXT:  .LBB0_4: @ %switch
> -; CHECK-NEXT:    @ in Loop: Header=BB0_2 Depth=1
> -; CHECK-NEXT:    b .LBB0_1
> -; CHECK-NEXT:  .LBB0_5: @ %bb
> -; CHECK-NEXT:    @ in Loop: Header=BB0_2 Depth=1
> +; CHECK-NEXT:    .long .LBB0_3+1
> +; CHECK-NEXT:  .LBB0_3: @ %bb
> +; CHECK-NEXT:    @ in Loop: Header=BB0_1 Depth=1
> +; CHECK-NEXT:    cmp r0, #0
> +; CHECK-NEXT:    bne .LBB0_5
> +; CHECK-NEXT:  .LBB0_4: @ %bb2
> +; CHECK-NEXT:    @ in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    cmp r0, #0
> ; CHECK-NEXT:    beq .LBB0_1
> -; CHECK-NEXT:  .LBB0_6: @ %dead
> +; CHECK-NEXT:  .LBB0_5: @ %dead
> entry:
>   br label %switch
>
>
> Modified: llvm/trunk/test/CodeGen/X86/block-placement.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/block-placement.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/block-placement.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/block-placement.ll Thu Aug 22 09:21:32 2019
> @@ -82,14 +82,14 @@ define i32 @test_loop_cold_blocks(i32 %i
> ; Check that we sink cold loop blocks after the hot loop body.
> ; CHECK-LABEL: test_loop_cold_blocks:
> ; CHECK: %entry
> -; CHECK: .p2align
> -; CHECK: %body1
> -; CHECK: %body2
> -; CHECK: %body3
> ; CHECK-NOT: .p2align
> ; CHECK: %unlikely1
> ; CHECK-NOT: .p2align
> ; CHECK: %unlikely2
> +; CHECK: .p2align
> +; CHECK: %body1
> +; CHECK: %body2
> +; CHECK: %body3
> ; CHECK: %exit
>
> entry:
> @@ -125,7 +125,7 @@ exit:
>   ret i32 %sum
> }
>
> -!0 = !{!"branch_weights", i32 1, i32 64}
> +!0 = !{!"branch_weights", i32 4, i32 64}
>
> define i32 @test_loop_early_exits(i32 %i, i32* %a) {
> ; Check that we sink early exit blocks out of loop bodies.
> @@ -189,8 +189,8 @@ define i32 @test_loop_rotate(i32 %i, i32
> ; loop, eliminating unconditional branches to the top.
> ; CHECK-LABEL: test_loop_rotate:
> ; CHECK: %entry
> -; CHECK: %body0
> ; CHECK: %body1
> +; CHECK: %body0
> ; CHECK: %exit
>
> entry:
> @@ -957,15 +957,16 @@ define void @benchmark_heapsort(i32 %n,
> ; CHECK: %if.else
> ; CHECK: %if.end10
> ; Second rotated loop top
> +; CHECK: .p2align
> +; CHECK: %if.then24
> ; CHECK: %while.cond.outer
> ; Third rotated loop top
> ; CHECK: .p2align
> -; CHECK: %if.end20
> ; CHECK: %while.cond
> ; CHECK: %while.body
> ; CHECK: %land.lhs.true
> ; CHECK: %if.then19
> -; CHECK: %if.then24
> +; CHECK: %if.end20
> ; CHECK: %if.then8
> ; CHECK: ret
>
> @@ -1545,8 +1546,8 @@ define i32 @not_rotate_if_extra_branch_r
> ; CHECK-LABEL: not_rotate_if_extra_branch_regression
> ; CHECK: %.entry
> ; CHECK: %.first_backedge
> -; CHECK: %.second_header
> ; CHECK: %.slow
> +; CHECK: %.second_header
> .entry:
>   %sum.0 = shl nsw i32 %count, 1
>   br label %.first_header
>
> Modified: llvm/trunk/test/CodeGen/X86/code_placement.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/code_placement.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/code_placement.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/code_placement.ll Thu Aug 22 09:21:32 2019
> @@ -4,11 +4,6 @@
> @Te1 = external global [256 x i32] ; <[256 x i32]*> [#uses=4]
> @Te3 = external global [256 x i32] ; <[256 x i32]*> [#uses=2]
>
> -; CHECK: %entry
> -; CHECK: %bb
> -; CHECK: %bb1
> -; CHECK: %bb2
> -
> define void @t(i8* nocapture %in, i8* nocapture %out, i32* nocapture %rk, i32 %r) nounwind ssp {
> entry:
> %0 = load i32, i32* %rk, align 4 ; <i32> [#uses=1]
> @@ -17,6 +12,8 @@ entry:
> %tmp15 = add i32 %r, -1 ; <i32> [#uses=1]
> %tmp.16 = zext i32 %tmp15 to i64 ; <i64> [#uses=2]
> br label %bb
> +; CHECK: jmp
> +; CHECK-NEXT: align
>
> bb: ; preds = %bb1, %entry
> %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %bb1 ] ; <i64> [#uses=3]
>
> Modified: llvm/trunk/test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll Thu Aug 22 09:21:32 2019
> @@ -1,12 +1,13 @@
> ; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s
>
> define void @foo() {
> -; After moving the latch to the top of loop, there is no fall through from the
> -; latch to outer loop.
> +; Test that when determining the edge probability from a node in an inner loop
> +; to a node in an outer loop, the weights on edges in the inner loop should be
> +; ignored if we are building the chain for the outer loop.
> ;
> ; CHECK-LABEL: foo:
> -; CHECK: callq b
> ; CHECK: callq c
> +; CHECK: callq b
>
> entry:
>   %call = call zeroext i1 @a()
>
> Modified: llvm/trunk/test/CodeGen/X86/code_placement_no_header_change.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/code_placement_no_header_change.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/code_placement_no_header_change.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/code_placement_no_header_change.ll Thu Aug 22 09:21:32 2019
> @@ -7,9 +7,9 @@ define i32 @bar(i32 %count) {
> ; Later backedge1 and backedge2 is rotated before loop header.
> ; CHECK-LABEL: bar
> ; CHECK: %.entry
> -; CHECK: %.header
> ; CHECK: %.backedge1
> ; CHECK: %.backedge2
> +; CHECK: %.header
> ; CHECK: %.exit
> .entry:
>   %c = shl nsw i32 %count, 2
>
> Modified: llvm/trunk/test/CodeGen/X86/conditional-tailcall.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/conditional-tailcall.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/conditional-tailcall.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/conditional-tailcall.ll Thu Aug 22 09:21:32 2019
> @@ -258,12 +258,9 @@ define zeroext i1 @pr31257(%"class.std::
> ; CHECK32-NEXT:    .cfi_adjust_cfa_offset -4
> ; CHECK32-NEXT:    xorl %edi, %edi # encoding: [0x31,0xff]
> ; CHECK32-NEXT:    incl %edi # encoding: [0x47]
> -; CHECK32-NEXT:  .LBB3_1: # %for.cond
> -; CHECK32-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK32-NEXT:    testl %edx, %edx # encoding: [0x85,0xd2]
> -; CHECK32-NEXT:    je .LBB3_13 # encoding: [0x74,A]
> -; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB3_13-1, kind: FK_PCRel_1
> -; CHECK32-NEXT:  # %bb.2: # %for.body
> +; CHECK32-NEXT:    jmp .LBB3_1 # encoding: [0xeb,A]
> +; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB3_1-1, kind: FK_PCRel_1
> +; CHECK32-NEXT:  .LBB3_2: # %for.body
> ; CHECK32-NEXT:    # in Loop: Header=BB3_1 Depth=1
> ; CHECK32-NEXT:    cmpl $2, %ebx # encoding: [0x83,0xfb,0x02]
> ; CHECK32-NEXT:    je .LBB3_11 # encoding: [0x74,A]
> @@ -317,9 +314,12 @@ define zeroext i1 @pr31257(%"class.std::
> ; CHECK32-NEXT:    # in Loop: Header=BB3_1 Depth=1
> ; CHECK32-NEXT:    incl %eax # encoding: [0x40]
> ; CHECK32-NEXT:    decl %edx # encoding: [0x4a]
> -; CHECK32-NEXT:    jmp .LBB3_1 # encoding: [0xeb,A]
> -; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB3_1-1, kind: FK_PCRel_1
> -; CHECK32-NEXT:  .LBB3_13:
> +; CHECK32-NEXT:  .LBB3_1: # %for.cond
> +; CHECK32-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK32-NEXT:    testl %edx, %edx # encoding: [0x85,0xd2]
> +; CHECK32-NEXT:    jne .LBB3_2 # encoding: [0x75,A]
> +; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB3_2-1, kind: FK_PCRel_1
> +; CHECK32-NEXT:  # %bb.13:
> ; CHECK32-NEXT:    cmpl $2, %ebx # encoding: [0x83,0xfb,0x02]
> ; CHECK32-NEXT:    sete %al # encoding: [0x0f,0x94,0xc0]
> ; CHECK32-NEXT:    jmp .LBB3_14 # encoding: [0xeb,A]
> @@ -369,59 +369,56 @@ define zeroext i1 @pr31257(%"class.std::
> ; CHECK64-NEXT:    .cfi_adjust_cfa_offset 8
> ; CHECK64-NEXT:    popq %r8 # encoding: [0x41,0x58]
> ; CHECK64-NEXT:    .cfi_adjust_cfa_offset -8
> -; CHECK64-NEXT:  .LBB3_1: # %for.cond
> -; CHECK64-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK64-NEXT:    testq %rax, %rax # encoding: [0x48,0x85,0xc0]
> -; CHECK64-NEXT:    je .LBB3_12 # encoding: [0x74,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_12-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  # %bb.2: # %for.body
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    jmp .LBB3_11 # encoding: [0xeb,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_11-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  .LBB3_1: # %for.body
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    cmpl $2, %ecx # encoding: [0x83,0xf9,0x02]
> -; CHECK64-NEXT:    je .LBB3_10 # encoding: [0x74,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  # %bb.3: # %for.body
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    je .LBB3_9 # encoding: [0x74,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  # %bb.2: # %for.body
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    cmpl $1, %ecx # encoding: [0x83,0xf9,0x01]
> -; CHECK64-NEXT:    je .LBB3_8 # encoding: [0x74,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_8-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  # %bb.4: # %for.body
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    je .LBB3_7 # encoding: [0x74,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_7-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  # %bb.3: # %for.body
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    testl %ecx, %ecx # encoding: [0x85,0xc9]
> -; CHECK64-NEXT:    jne .LBB3_11 # encoding: [0x75,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_11-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  # %bb.5: # %sw.bb
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    jne .LBB3_10 # encoding: [0x75,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  # %bb.4: # %sw.bb
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    movzbl (%rdi), %edx # encoding: [0x0f,0xb6,0x17]
> ; CHECK64-NEXT:    cmpl $43, %edx # encoding: [0x83,0xfa,0x2b]
> ; CHECK64-NEXT:    movl %r8d, %ecx # encoding: [0x44,0x89,0xc1]
> -; CHECK64-NEXT:    je .LBB3_11 # encoding: [0x74,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_11-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  # %bb.6: # %sw.bb
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    je .LBB3_10 # encoding: [0x74,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  # %bb.5: # %sw.bb
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    cmpb $45, %dl # encoding: [0x80,0xfa,0x2d]
> ; CHECK64-NEXT:    movl %r8d, %ecx # encoding: [0x44,0x89,0xc1]
> -; CHECK64-NEXT:    je .LBB3_11 # encoding: [0x74,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_11-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  # %bb.7: # %if.else
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    je .LBB3_10 # encoding: [0x74,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  # %bb.6: # %if.else
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    addl $-48, %edx # encoding: [0x83,0xc2,0xd0]
> ; CHECK64-NEXT:    cmpl $10, %edx # encoding: [0x83,0xfa,0x0a]
> -; CHECK64-NEXT:    jmp .LBB3_9 # encoding: [0xeb,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  .LBB3_8: # %sw.bb14
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:    jmp .LBB3_8 # encoding: [0xeb,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_8-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:  .LBB3_7: # %sw.bb14
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    movzbl (%rdi), %ecx # encoding: [0x0f,0xb6,0x0f]
> ; CHECK64-NEXT:    addl $-48, %ecx # encoding: [0x83,0xc1,0xd0]
> ; CHECK64-NEXT:    cmpl $10, %ecx # encoding: [0x83,0xf9,0x0a]
> -; CHECK64-NEXT:  .LBB3_9: # %if.else
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:  .LBB3_8: # %if.else
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    movl %r9d, %ecx # encoding: [0x44,0x89,0xc9]
> -; CHECK64-NEXT:    jb .LBB3_11 # encoding: [0x72,A]
> -; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_11-1, kind: FK_PCRel_1
> +; CHECK64-NEXT:    jb .LBB3_10 # encoding: [0x72,A]
> +; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> ; CHECK64-NEXT:    jmp .LBB3_13 # encoding: [0xeb,A]
> ; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_13-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  .LBB3_10: # %sw.bb22
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:  .LBB3_9: # %sw.bb22
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    movzbl (%rdi), %ecx # encoding: [0x0f,0xb6,0x0f]
> ; CHECK64-NEXT:    addl $-48, %ecx # encoding: [0x83,0xc1,0xd0]
> ; CHECK64-NEXT:    cmpl $10, %ecx # encoding: [0x83,0xf9,0x0a]
> @@ -429,13 +426,16 @@ define zeroext i1 @pr31257(%"class.std::
> ; CHECK64-NEXT:    jae _Z20isValidIntegerSuffixN9__gnu_cxx17__normal_iteratorIPKcSsEES3_ # TAILCALL
> ; CHECK64-NEXT:    # encoding: [0x73,A]
> ; CHECK64-NEXT:    # fixup A - offset: 1, value: _Z20isValidIntegerSuffixN9__gnu_cxx17__normal_iteratorIPKcSsEES3_-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  .LBB3_11: # %for.inc
> -; CHECK64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; CHECK64-NEXT:  .LBB3_10: # %for.inc
> +; CHECK64-NEXT:    # in Loop: Header=BB3_11 Depth=1
> ; CHECK64-NEXT:    incq %rdi # encoding: [0x48,0xff,0xc7]
> ; CHECK64-NEXT:    decq %rax # encoding: [0x48,0xff,0xc8]
> -; CHECK64-NEXT:    jmp .LBB3_1 # encoding: [0xeb,A]
> +; CHECK64-NEXT:  .LBB3_11: # %for.cond
> +; CHECK64-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK64-NEXT:    testq %rax, %rax # encoding: [0x48,0x85,0xc0]
> +; CHECK64-NEXT:    jne .LBB3_1 # encoding: [0x75,A]
> ; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB3_1-1, kind: FK_PCRel_1
> -; CHECK64-NEXT:  .LBB3_12:
> +; CHECK64-NEXT:  # %bb.12:
> ; CHECK64-NEXT:    cmpl $2, %ecx # encoding: [0x83,0xf9,0x02]
> ; CHECK64-NEXT:    sete %al # encoding: [0x0f,0x94,0xc0]
> ; CHECK64-NEXT:    # kill: def $al killed $al killed $eax
> @@ -451,54 +451,51 @@ define zeroext i1 @pr31257(%"class.std::
> ; WIN64-NEXT:    movq -24(%rcx), %r8 # encoding: [0x4c,0x8b,0x41,0xe8]
> ; WIN64-NEXT:    leaq (%rcx,%r8), %rdx # encoding: [0x4a,0x8d,0x14,0x01]
> ; WIN64-NEXT:    xorl %eax, %eax # encoding: [0x31,0xc0]
> -; WIN64-NEXT:  .LBB3_1: # %for.cond
> -; WIN64-NEXT:    # =>This Inner Loop Header: Depth=1
> -; WIN64-NEXT:    testq %r8, %r8 # encoding: [0x4d,0x85,0xc0]
> -; WIN64-NEXT:    je .LBB3_11 # encoding: [0x74,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_11-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  # %bb.2: # %for.body
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:    jmp .LBB3_10 # encoding: [0xeb,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> +; WIN64-NEXT:  .LBB3_1: # %for.body
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    cmpl $2, %eax # encoding: [0x83,0xf8,0x02]
> -; WIN64-NEXT:    je .LBB3_9 # encoding: [0x74,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  # %bb.3: # %for.body
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:    je .LBB3_8 # encoding: [0x74,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_8-1, kind: FK_PCRel_1
> +; WIN64-NEXT:  # %bb.2: # %for.body
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    cmpl $1, %eax # encoding: [0x83,0xf8,0x01]
> -; WIN64-NEXT:    je .LBB3_7 # encoding: [0x74,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_7-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  # %bb.4: # %for.body
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:    je .LBB3_6 # encoding: [0x74,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_6-1, kind: FK_PCRel_1
> +; WIN64-NEXT:  # %bb.3: # %for.body
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    testl %eax, %eax # encoding: [0x85,0xc0]
> -; WIN64-NEXT:    jne .LBB3_10 # encoding: [0x75,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  # %bb.5: # %sw.bb
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:    jne .LBB3_9 # encoding: [0x75,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> +; WIN64-NEXT:  # %bb.4: # %sw.bb
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    movzbl (%rcx), %r9d # encoding: [0x44,0x0f,0xb6,0x09]
> ; WIN64-NEXT:    cmpl $43, %r9d # encoding: [0x41,0x83,0xf9,0x2b]
> ; WIN64-NEXT:    movl $1, %eax # encoding: [0xb8,0x01,0x00,0x00,0x00]
> -; WIN64-NEXT:    je .LBB3_10 # encoding: [0x74,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  # %bb.6: # %sw.bb
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:    je .LBB3_9 # encoding: [0x74,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> +; WIN64-NEXT:  # %bb.5: # %sw.bb
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    cmpb $45, %r9b # encoding: [0x41,0x80,0xf9,0x2d]
> -; WIN64-NEXT:    je .LBB3_10 # encoding: [0x74,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> -; WIN64-NEXT:    jmp .LBB3_8 # encoding: [0xeb,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_8-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  .LBB3_7: # %sw.bb14
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:    je .LBB3_9 # encoding: [0x74,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> +; WIN64-NEXT:    jmp .LBB3_7 # encoding: [0xeb,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_7-1, kind: FK_PCRel_1
> +; WIN64-NEXT:  .LBB3_6: # %sw.bb14
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    movzbl (%rcx), %r9d # encoding: [0x44,0x0f,0xb6,0x09]
> -; WIN64-NEXT:  .LBB3_8: # %if.else
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:  .LBB3_7: # %if.else
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    addl $-48, %r9d # encoding: [0x41,0x83,0xc1,0xd0]
> ; WIN64-NEXT:    movl $2, %eax # encoding: [0xb8,0x02,0x00,0x00,0x00]
> ; WIN64-NEXT:    cmpl $10, %r9d # encoding: [0x41,0x83,0xf9,0x0a]
> -; WIN64-NEXT:    jb .LBB3_10 # encoding: [0x72,A]
> -; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_10-1, kind: FK_PCRel_1
> +; WIN64-NEXT:    jb .LBB3_9 # encoding: [0x72,A]
> +; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_9-1, kind: FK_PCRel_1
> ; WIN64-NEXT:    jmp .LBB3_12 # encoding: [0xeb,A]
> ; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_12-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  .LBB3_9: # %sw.bb22
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:  .LBB3_8: # %sw.bb22
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    movzbl (%rcx), %r9d # encoding: [0x44,0x0f,0xb6,0x09]
> ; WIN64-NEXT:    addl $-48, %r9d # encoding: [0x41,0x83,0xc1,0xd0]
> ; WIN64-NEXT:    movl $2, %eax # encoding: [0xb8,0x02,0x00,0x00,0x00]
> @@ -506,13 +503,16 @@ define zeroext i1 @pr31257(%"class.std::
> ; WIN64-NEXT:    jae _Z20isValidIntegerSuffixN9__gnu_cxx17__normal_iteratorIPKcSsEES3_ # TAILCALL
> ; WIN64-NEXT:    # encoding: [0x73,A]
> ; WIN64-NEXT:    # fixup A - offset: 1, value: _Z20isValidIntegerSuffixN9__gnu_cxx17__normal_iteratorIPKcSsEES3_-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  .LBB3_10: # %for.inc
> -; WIN64-NEXT:    # in Loop: Header=BB3_1 Depth=1
> +; WIN64-NEXT:  .LBB3_9: # %for.inc
> +; WIN64-NEXT:    # in Loop: Header=BB3_10 Depth=1
> ; WIN64-NEXT:    incq %rcx # encoding: [0x48,0xff,0xc1]
> ; WIN64-NEXT:    decq %r8 # encoding: [0x49,0xff,0xc8]
> -; WIN64-NEXT:    jmp .LBB3_1 # encoding: [0xeb,A]
> +; WIN64-NEXT:  .LBB3_10: # %for.cond
> +; WIN64-NEXT:    # =>This Inner Loop Header: Depth=1
> +; WIN64-NEXT:    testq %r8, %r8 # encoding: [0x4d,0x85,0xc0]
> +; WIN64-NEXT:    jne .LBB3_1 # encoding: [0x75,A]
> ; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB3_1-1, kind: FK_PCRel_1
> -; WIN64-NEXT:  .LBB3_11:
> +; WIN64-NEXT:  # %bb.11:
> ; WIN64-NEXT:    cmpl $2, %eax # encoding: [0x83,0xf8,0x02]
> ; WIN64-NEXT:    sete %al # encoding: [0x0f,0x94,0xc0]
> ; WIN64-NEXT:    # kill: def $al killed $al killed $eax
>
> Modified: llvm/trunk/test/CodeGen/X86/loop-blocks.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/loop-blocks.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/loop-blocks.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/loop-blocks.ll Thu Aug 22 09:21:32 2019
> @@ -7,14 +7,12 @@
> ; order to avoid a branch within the loop.
>
> ; CHECK-LABEL: simple:
> -;      CHECK:   align
> +;      CHECK:   jmp   .LBB0_1
> +; CHECK-NEXT:   align
> +; CHECK-NEXT: .LBB0_2:
> +; CHECK-NEXT:   callq loop_latch
> ; CHECK-NEXT: .LBB0_1:
> ; CHECK-NEXT:   callq loop_header
> -;      CHECK:   js .LBB0_3
> -; CHECK-NEXT:   callq loop_latch
> -; CHECK-NEXT:   jmp .LBB0_1
> -; CHECK-NEXT: .LBB0_3:
> -; CHECK-NEXT:   callq exit
>
> define void @simple() nounwind {
> entry:
> @@ -77,21 +75,17 @@ exit:
> ; CHECK-LABEL: yet_more_involved:
> ;      CHECK:   jmp .LBB2_1
> ; CHECK-NEXT:   align
> -
> -;      CHECK: .LBB2_1:
> +; CHECK-NEXT: .LBB2_5:
> +; CHECK-NEXT:   callq block_a_true_func
> +; CHECK-NEXT:   callq block_a_merge_func
> +; CHECK-NEXT: .LBB2_1:
> ; CHECK-NEXT:   callq body
> -; CHECK-NEXT:   callq get
> -; CHECK-NEXT:   cmpl $2, %eax
> -; CHECK-NEXT:   jge .LBB2_2
> -; CHECK-NEXT:   callq bar99
> +;
> +; LBB2_4
> +;      CHECK:   callq bar99
> ; CHECK-NEXT:   callq get
> ; CHECK-NEXT:   cmpl $2999, %eax
> -; CHECK-NEXT:   jg .LBB2_6
> -; CHECK-NEXT:   callq block_a_true_func
> -; CHECK-NEXT:   callq block_a_merge_func
> -; CHECK-NEXT:   jmp .LBB2_1
> -; CHECK-NEXT:   align
> -; CHECK-NEXT: .LBB2_6:
> +; CHECK-NEXT:   jle .LBB2_5
> ; CHECK-NEXT:   callq block_a_false_func
> ; CHECK-NEXT:   callq block_a_merge_func
> ; CHECK-NEXT:   jmp .LBB2_1
> @@ -207,12 +201,12 @@ block102:
> }
>
> ; CHECK-LABEL: check_minsize:
> +;      CHECK:   jmp   .LBB4_1
> ; CHECK-NOT:   align
> -; CHECK:      .LBB4_1:
> +; CHECK-NEXT: .LBB4_2:
> +; CHECK-NEXT:   callq loop_latch
> +; CHECK-NEXT: .LBB4_1:
> ; CHECK-NEXT:   callq loop_header
> -; CHECK:        callq loop_latch
> -; CHECK:      .LBB4_3:
> -; CHECK:        callq exit
>
>
> define void @check_minsize() minsize nounwind {
>
> Removed: llvm/trunk/test/CodeGen/X86/loop-rotate.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/loop-rotate.ll?rev=369663&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/loop-rotate.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/loop-rotate.ll (removed)
> @@ -1,120 +0,0 @@
> -; RUN: llc -mtriple=i686-linux < %s | FileCheck %s
> -
> -; Don't rotate the loop if the number of fall through to exit is not larger
> -; than the number of fall through to header.
> -define void @no_rotate() {
> -; CHECK-LABEL: no_rotate
> -; CHECK: %entry
> -; CHECK: %header
> -; CHECK: %middle
> -; CHECK: %latch1
> -; CHECK: %latch2
> -; CHECK: %end
> -entry:
> -  br label %header
> -
> -header:
> -  %val1 = call i1 @foo()
> -  br i1 %val1, label %middle, label %end
> -
> -middle:
> -  %val2 = call i1 @foo()
> -  br i1 %val2, label %latch1, label %end
> -
> -latch1:
> -  %val3 = call i1 @foo()
> -  br i1 %val3, label %latch2, label %header
> -
> -latch2:
> -  %val4 = call i1 @foo()
> -  br label %header
> -
> -end:
> -  ret void
> -}
> -
> -define void @do_rotate() {
> -; CHECK-LABEL: do_rotate
> -; CHECK: %entry
> -; CHECK: %then
> -; CHECK: %else
> -; CHECK: %latch1
> -; CHECK: %latch2
> -; CHECK: %header
> -; CHECK: %end
> -entry:
> -  %val0 = call i1 @foo()
> -  br i1 %val0, label %then, label %else
> -
> -then:
> -  call void @a()
> -  br label %header
> -
> -else:
> -  call void @b()
> -  br label %header
> -
> -header:
> -  %val1 = call i1 @foo()
> -  br i1 %val1, label %latch1, label %end
> -
> -latch1:
> -  %val3 = call i1 @foo()
> -  br i1 %val3, label %latch2, label %header
> -
> -latch2:
> -  %val4 = call i1 @foo()
> -  br label %header
> -
> -end:
> -  ret void
> -}
> -
> -; The loop structure is same as in @no_rotate, but the loop header's predecessor
> -; doesn't fall through to it, so it should be rotated to get exit fall through.
> -define void @do_rotate2() {
> -; CHECK-LABEL: do_rotate2
> -; CHECK: %entry
> -; CHECK: %then
> -; CHECK: %middle
> -; CHECK: %latch1
> -; CHECK: %latch2
> -; CHECK: %header
> -; CHECK: %exit
> -entry:
> -  %val0 = call i1 @foo()
> -  br i1 %val0, label %then, label %header, !prof !1
> -
> -then:
> -  call void @a()
> -  br label %end
> -
> -header:
> -  %val1 = call i1 @foo()
> -  br i1 %val1, label %middle, label %exit
> -
> -middle:
> -  %val2 = call i1 @foo()
> -  br i1 %val2, label %latch1, label %exit
> -
> -latch1:
> -  %val3 = call i1 @foo()
> -  br i1 %val3, label %latch2, label %header
> -
> -latch2:
> -  %val4 = call i1 @foo()
> -  br label %header
> -
> -exit:
> -  call void @b()
> -  br label %end
> -
> -end:
> -  ret void
> -}
> -
> -declare i1 @foo()
> -declare void @a()
> -declare void @b()
> -
> -!1 = !{!"branch_weights", i32 10, i32 1}
>
> Modified: llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll Thu Aug 22 09:21:32 2019
> @@ -21,7 +21,22 @@ define void @t(i8* nocapture %in, i8* no
> ; GENERIC-NEXT:    movq _Te1@{{.*}}(%rip), %r8
> ; GENERIC-NEXT:    movq _Te3@{{.*}}(%rip), %r10
> ; GENERIC-NEXT:    movq %rcx, %r11
> +; GENERIC-NEXT:    jmp LBB0_1
> ; GENERIC-NEXT:    .p2align 4, 0x90
> +; GENERIC-NEXT:  LBB0_2: ## %bb1
> +; GENERIC-NEXT:    ## in Loop: Header=BB0_1 Depth=1
> +; GENERIC-NEXT:    movl %edi, %ebx
> +; GENERIC-NEXT:    shrl $16, %ebx
> +; GENERIC-NEXT:    movzbl %bl, %ebx
> +; GENERIC-NEXT:    xorl (%r8,%rbx,4), %eax
> +; GENERIC-NEXT:    xorl -4(%r14), %eax
> +; GENERIC-NEXT:    shrl $24, %edi
> +; GENERIC-NEXT:    movzbl %bpl, %ebx
> +; GENERIC-NEXT:    movl (%r10,%rbx,4), %ebx
> +; GENERIC-NEXT:    xorl (%r9,%rdi,4), %ebx
> +; GENERIC-NEXT:    xorl (%r14), %ebx
> +; GENERIC-NEXT:    decq %r11
> +; GENERIC-NEXT:    addq $16, %r14
> ; GENERIC-NEXT:  LBB0_1: ## %bb
> ; GENERIC-NEXT:    ## =>This Inner Loop Header: Depth=1
> ; GENERIC-NEXT:    movzbl %al, %edi
> @@ -41,23 +56,8 @@ define void @t(i8* nocapture %in, i8* no
> ; GENERIC-NEXT:    shrl $24, %eax
> ; GENERIC-NEXT:    movl (%r9,%rax,4), %eax
> ; GENERIC-NEXT:    testq %r11, %r11
> -; GENERIC-NEXT:    je LBB0_3
> -; GENERIC-NEXT:  ## %bb.2: ## %bb1
> -; GENERIC-NEXT:    ## in Loop: Header=BB0_1 Depth=1
> -; GENERIC-NEXT:    movl %edi, %ebx
> -; GENERIC-NEXT:    shrl $16, %ebx
> -; GENERIC-NEXT:    movzbl %bl, %ebx
> -; GENERIC-NEXT:    xorl (%r8,%rbx,4), %eax
> -; GENERIC-NEXT:    xorl -4(%r14), %eax
> -; GENERIC-NEXT:    shrl $24, %edi
> -; GENERIC-NEXT:    movzbl %bpl, %ebx
> -; GENERIC-NEXT:    movl (%r10,%rbx,4), %ebx
> -; GENERIC-NEXT:    xorl (%r9,%rdi,4), %ebx
> -; GENERIC-NEXT:    xorl (%r14), %ebx
> -; GENERIC-NEXT:    decq %r11
> -; GENERIC-NEXT:    addq $16, %r14
> -; GENERIC-NEXT:    jmp LBB0_1
> -; GENERIC-NEXT:  LBB0_3: ## %bb2
> +; GENERIC-NEXT:    jne LBB0_2
> +; GENERIC-NEXT:  ## %bb.3: ## %bb2
> ; GENERIC-NEXT:    shlq $4, %rcx
> ; GENERIC-NEXT:    andl $-16777216, %eax ## imm = 0xFF000000
> ; GENERIC-NEXT:    movl %edi, %ebx
> @@ -105,7 +105,21 @@ define void @t(i8* nocapture %in, i8* no
> ; ATOM-NEXT:    movq _Te3@{{.*}}(%rip), %r10
> ; ATOM-NEXT:    decl %ecx
> ; ATOM-NEXT:    movq %rcx, %r11
> +; ATOM-NEXT:    jmp LBB0_1
> ; ATOM-NEXT:    .p2align 4, 0x90
> +; ATOM-NEXT:  LBB0_2: ## %bb1
> +; ATOM-NEXT:    ## in Loop: Header=BB0_1 Depth=1
> +; ATOM-NEXT:    shrl $16, %eax
> +; ATOM-NEXT:    shrl $24, %edi
> +; ATOM-NEXT:    decq %r11
> +; ATOM-NEXT:    movzbl %al, %ebp
> +; ATOM-NEXT:    movzbl %bl, %eax
> +; ATOM-NEXT:    movl (%r10,%rax,4), %eax
> +; ATOM-NEXT:    xorl (%r8,%rbp,4), %r15d
> +; ATOM-NEXT:    xorl (%r9,%rdi,4), %eax
> +; ATOM-NEXT:    xorl -4(%r14), %r15d
> +; ATOM-NEXT:    xorl (%r14), %eax
> +; ATOM-NEXT:    addq $16, %r14
> ; ATOM-NEXT:  LBB0_1: ## %bb
> ; ATOM-NEXT:    ## =>This Inner Loop Header: Depth=1
> ; ATOM-NEXT:    movl %eax, %edi
> @@ -126,22 +140,8 @@ define void @t(i8* nocapture %in, i8* no
> ; ATOM-NEXT:    movl (%r9,%rax,4), %r15d
> ; ATOM-NEXT:    testq %r11, %r11
> ; ATOM-NEXT:    movl %edi, %eax
> -; ATOM-NEXT:    je LBB0_3
> -; ATOM-NEXT:  ## %bb.2: ## %bb1
> -; ATOM-NEXT:    ## in Loop: Header=BB0_1 Depth=1
> -; ATOM-NEXT:    shrl $16, %eax
> -; ATOM-NEXT:    shrl $24, %edi
> -; ATOM-NEXT:    decq %r11
> -; ATOM-NEXT:    movzbl %al, %ebp
> -; ATOM-NEXT:    movzbl %bl, %eax
> -; ATOM-NEXT:    movl (%r10,%rax,4), %eax
> -; ATOM-NEXT:    xorl (%r8,%rbp,4), %r15d
> -; ATOM-NEXT:    xorl (%r9,%rdi,4), %eax
> -; ATOM-NEXT:    xorl -4(%r14), %r15d
> -; ATOM-NEXT:    xorl (%r14), %eax
> -; ATOM-NEXT:    addq $16, %r14
> -; ATOM-NEXT:    jmp LBB0_1
> -; ATOM-NEXT:  LBB0_3: ## %bb2
> +; ATOM-NEXT:    jne LBB0_2
> +; ATOM-NEXT:  ## %bb.3: ## %bb2
> ; ATOM-NEXT:    shrl $16, %eax
> ; ATOM-NEXT:    shrl $8, %edi
> ; ATOM-NEXT:    movzbl %bl, %ebp
>
> Modified: llvm/trunk/test/CodeGen/X86/move_latch_to_loop_top.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/move_latch_to_loop_top.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/move_latch_to_loop_top.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/move_latch_to_loop_top.ll Thu Aug 22 09:21:32 2019
> @@ -1,11 +1,11 @@
> -; RUN: llc  -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s
> +; RUN: llc  -mcpu=corei7 -mtriple=x86_64-linux --force-precise-rotation-cost < %s | FileCheck %s
>
> ; The block latch should be moved before header.
> ;CHECK-LABEL: test1:
> ;CHECK:       %latch
> ;CHECK:       %header
> ;CHECK:       %false
> -define i32 @test1(i32* %p) {
> +define i32 @test1(i32* %p) !prof !0 {
> entry:
>   br label %header
>
> @@ -39,7 +39,7 @@ exit:
> ;CHECK:       %latch
> ;CHECK:       %header
> ;CHECK:       %false
> -define i32 @test2(i32* %p) {
> +define i32 @test2(i32* %p) !prof !0 {
> entry:
>   br label %header
>
> @@ -107,7 +107,7 @@ exit:
> ;CHECK:       %latch
> ;CHECK:       %header
> ;CHECK:       %false
> -define i32 @test3(i32* %p) {
> +define i32 @test3(i32* %p) !prof !0 {
> entry:
>   br label %header
>
> @@ -173,9 +173,9 @@ exit:
> ;CHECK:       %header
> ;CHECK:       %true
> ;CHECK:       %latch
> -;CHECK:       %false
> ;CHECK:       %exit
> -define i32 @test4(i32 %t, i32* %p) {
> +;CHECK:       %false
> +define i32 @test4(i32 %t, i32* %p) !prof !0 {
> entry:
>   br label %header
>
> @@ -207,6 +207,7 @@ exit:
>   ret i32 %count4
> }
>
> +!0 = !{!"function_entry_count", i32 1000}
> !1 = !{!"branch_weights", i32 100, i32 1}
> !2 = !{!"branch_weights", i32 16, i32 16}
> !3 = !{!"branch_weights", i32 51, i32 49}
> @@ -216,7 +217,7 @@ exit:
> ;CHECK:       %entry
> ;CHECK:       %header
> ;CHECK:       %latch
> -define void @test5(i32* %p) {
> +define void @test5(i32* %p) !prof !0 {
> entry:
>   br label %header
>
> @@ -236,4 +237,3 @@ latch:
> exit:
>   ret void
> }
> -
>
> Modified: llvm/trunk/test/CodeGen/X86/pr38185.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr38185.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/pr38185.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/pr38185.ll Thu Aug 22 09:21:32 2019
> @@ -5,13 +5,9 @@ define void @foo(i32* %a, i32* %b, i32*
> ; CHECK-LABEL: foo:
> ; CHECK:       # %bb.0:
> ; CHECK-NEXT:    movq $0, -{{[0-9]+}}(%rsp)
> +; CHECK-NEXT:    jmp .LBB0_1
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_1: # %loop
> -; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK-NEXT:    movq -{{[0-9]+}}(%rsp), %r9
> -; CHECK-NEXT:    cmpq %rcx, %r9
> -; CHECK-NEXT:    je .LBB0_3
> -; CHECK-NEXT:  # %bb.2: # %body
> +; CHECK-NEXT:  .LBB0_2: # %body
> ; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movl $1, (%rdx,%r9,4)
> ; CHECK-NEXT:    movzbl (%rdi,%r9,4), %r8d
> @@ -21,8 +17,12 @@ define void @foo(i32* %a, i32* %b, i32*
> ; CHECK-NEXT:    movl %eax, (%rdi,%r9,4)
> ; CHECK-NEXT:    incq %r9
> ; CHECK-NEXT:    movq %r9, -{{[0-9]+}}(%rsp)
> -; CHECK-NEXT:    jmp .LBB0_1
> -; CHECK-NEXT:  .LBB0_3: # %endloop
> +; CHECK-NEXT:  .LBB0_1: # %loop
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK-NEXT:    movq -{{[0-9]+}}(%rsp), %r9
> +; CHECK-NEXT:    cmpq %rcx, %r9
> +; CHECK-NEXT:    jne .LBB0_2
> +; CHECK-NEXT:  # %bb.3: # %endloop
> ; CHECK-NEXT:    retq
> %i = alloca i64
> store i64 0, i64* %i
>
> Modified: llvm/trunk/test/CodeGen/X86/ragreedy-hoist-spill.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/ragreedy-hoist-spill.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/ragreedy-hoist-spill.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/ragreedy-hoist-spill.ll Thu Aug 22 09:21:32 2019
> @@ -103,34 +103,6 @@ define i8* @SyFgets(i8* %line, i64 %leng
> ; CHECK-NEXT:    xorl %r13d, %r13d
> ; CHECK-NEXT:    jmp LBB0_13
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  LBB0_20: ## %sw.bb256
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    movl %r14d, %r13d
> -; CHECK-NEXT:  LBB0_21: ## %while.cond197.backedge
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    decl %r15d
> -; CHECK-NEXT:    testl %r15d, %r15d
> -; CHECK-NEXT:    movl %r13d, %r14d
> -; CHECK-NEXT:    jle LBB0_22
> -; CHECK-NEXT:  LBB0_13: ## %while.body200
> -; CHECK-NEXT:    ## =>This Loop Header: Depth=1
> -; CHECK-NEXT:    ## Child Loop BB0_30 Depth 2
> -; CHECK-NEXT:    ## Child Loop BB0_38 Depth 2
> -; CHECK-NEXT:    leal -268(%r14), %eax
> -; CHECK-NEXT:    cmpl $105, %eax
> -; CHECK-NEXT:    ja LBB0_14
> -; CHECK-NEXT:  ## %bb.56: ## %while.body200
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    movslq (%rdi,%rax,4), %rax
> -; CHECK-NEXT:    addq %rdi, %rax
> -; CHECK-NEXT:    jmpq *%rax
> -; CHECK-NEXT:  LBB0_44: ## %while.cond1037.preheader
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    testb %dl, %dl
> -; CHECK-NEXT:    movl %r14d, %r13d
> -; CHECK-NEXT:    jne LBB0_21
> -; CHECK-NEXT:    jmp LBB0_55
> -; CHECK-NEXT:    .p2align 4, 0x90
> ; CHECK-NEXT:  LBB0_14: ## %while.body200
> ; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> ; CHECK-NEXT:    leal 1(%r14), %eax
> @@ -146,6 +118,12 @@ define i8* @SyFgets(i8* %line, i64 %leng
> ; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> ; CHECK-NEXT:    movl $1, %r13d
> ; CHECK-NEXT:    jmp LBB0_21
> +; CHECK-NEXT:  LBB0_44: ## %while.cond1037.preheader
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    testb %dl, %dl
> +; CHECK-NEXT:    movl %r14d, %r13d
> +; CHECK-NEXT:    jne LBB0_21
> +; CHECK-NEXT:    jmp LBB0_55
> ; CHECK-NEXT:  LBB0_26: ## %sw.bb474
> ; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> ; CHECK-NEXT:    testb %dl, %dl
> @@ -159,52 +137,30 @@ define i8* @SyFgets(i8* %line, i64 %leng
> ; CHECK-NEXT:  ## %bb.28: ## %land.rhs485.preheader
> ; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> ; CHECK-NEXT:    ## implicit-def: $rax
> -; CHECK-NEXT:    testb %al, %al
> -; CHECK-NEXT:    jns LBB0_30
> -; CHECK-NEXT:    jmp LBB0_55
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  LBB0_32: ## %do.body479.backedge
> -; CHECK-NEXT:    ## in Loop: Header=BB0_30 Depth=2
> -; CHECK-NEXT:    leaq 1(%r12), %rax
> -; CHECK-NEXT:    testb %dl, %dl
> -; CHECK-NEXT:    je LBB0_33
> -; CHECK-NEXT:  ## %bb.29: ## %land.rhs485
> -; CHECK-NEXT:    ## in Loop: Header=BB0_30 Depth=2
> -; CHECK-NEXT:    testb %al, %al
> -; CHECK-NEXT:    js LBB0_55
> -; CHECK-NEXT:  LBB0_30: ## %cond.true.i.i2780
> +; CHECK-NEXT:  LBB0_29: ## %land.rhs485
> ; CHECK-NEXT:    ## Parent Loop BB0_13 Depth=1
> ; CHECK-NEXT:    ## => This Inner Loop Header: Depth=2
> +; CHECK-NEXT:    testb %al, %al
> +; CHECK-NEXT:    js LBB0_55
> +; CHECK-NEXT:  ## %bb.30: ## %cond.true.i.i2780
> +; CHECK-NEXT:    ## in Loop: Header=BB0_29 Depth=2
> ; CHECK-NEXT:    movq %rax, %r12
> ; CHECK-NEXT:    testb %dl, %dl
> ; CHECK-NEXT:    jne LBB0_32
> ; CHECK-NEXT:  ## %bb.31: ## %lor.rhs500
> -; CHECK-NEXT:    ## in Loop: Header=BB0_30 Depth=2
> +; CHECK-NEXT:    ## in Loop: Header=BB0_29 Depth=2
> ; CHECK-NEXT:    movl $256, %esi ## imm = 0x100
> ; CHECK-NEXT:    callq ___maskrune
> ; CHECK-NEXT:    xorl %edx, %edx
> ; CHECK-NEXT:    testb %dl, %dl
> -; CHECK-NEXT:    jne LBB0_32
> -; CHECK-NEXT:    jmp LBB0_34
> -; CHECK-NEXT:  LBB0_45: ## %sw.bb1134
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rax
> -; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rcx
> -; CHECK-NEXT:    cmpq %rax, %rcx
> -; CHECK-NEXT:    jb LBB0_55
> -; CHECK-NEXT:  ## %bb.46: ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    xorl %ebp, %ebp
> -; CHECK-NEXT:    movl $268, %r13d ## imm = 0x10C
> -; CHECK-NEXT:    jmp LBB0_21
> -; CHECK-NEXT:  LBB0_19: ## %sw.bb243
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    movl $2, %r13d
> -; CHECK-NEXT:    jmp LBB0_21
> -; CHECK-NEXT:  LBB0_40: ## %sw.bb566
> -; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> -; CHECK-NEXT:    movl $20, %r13d
> -; CHECK-NEXT:    jmp LBB0_21
> -; CHECK-NEXT:  LBB0_33: ## %if.end517.loopexitsplit
> +; CHECK-NEXT:    je LBB0_34
> +; CHECK-NEXT:  LBB0_32: ## %do.body479.backedge
> +; CHECK-NEXT:    ## in Loop: Header=BB0_29 Depth=2
> +; CHECK-NEXT:    leaq 1(%r12), %rax
> +; CHECK-NEXT:    testb %dl, %dl
> +; CHECK-NEXT:    jne LBB0_29
> +; CHECK-NEXT:  ## %bb.33: ## %if.end517.loopexitsplit
> ; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> ; CHECK-NEXT:    incq %r12
> ; CHECK-NEXT:  LBB0_34: ## %if.end517
> @@ -243,6 +199,47 @@ define i8* @SyFgets(i8* %line, i64 %leng
> ; CHECK-NEXT:    leaq {{.*}}(%rip), %rsi
> ; CHECK-NEXT:    leaq {{.*}}(%rip), %rdi
> ; CHECK-NEXT:    jmp LBB0_21
> +; CHECK-NEXT:  LBB0_45: ## %sw.bb1134
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rax
> +; CHECK-NEXT:    leaq {{[0-9]+}}(%rsp), %rcx
> +; CHECK-NEXT:    cmpq %rax, %rcx
> +; CHECK-NEXT:    jb LBB0_55
> +; CHECK-NEXT:  ## %bb.46: ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    xorl %ebp, %ebp
> +; CHECK-NEXT:    movl $268, %r13d ## imm = 0x10C
> +; CHECK-NEXT:    jmp LBB0_21
> +; CHECK-NEXT:  LBB0_19: ## %sw.bb243
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    movl $2, %r13d
> +; CHECK-NEXT:    jmp LBB0_21
> +; CHECK-NEXT:  LBB0_40: ## %sw.bb566
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    movl $20, %r13d
> +; CHECK-NEXT:    jmp LBB0_21
> +; CHECK-NEXT:    .p2align 4, 0x90
> +; CHECK-NEXT:  LBB0_13: ## %while.body200
> +; CHECK-NEXT:    ## =>This Loop Header: Depth=1
> +; CHECK-NEXT:    ## Child Loop BB0_29 Depth 2
> +; CHECK-NEXT:    ## Child Loop BB0_38 Depth 2
> +; CHECK-NEXT:    leal -268(%r14), %eax
> +; CHECK-NEXT:    cmpl $105, %eax
> +; CHECK-NEXT:    ja LBB0_14
> +; CHECK-NEXT:  ## %bb.56: ## %while.body200
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    movslq (%rdi,%rax,4), %rax
> +; CHECK-NEXT:    addq %rdi, %rax
> +; CHECK-NEXT:    jmpq *%rax
> +; CHECK-NEXT:  LBB0_20: ## %sw.bb256
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    movl %r14d, %r13d
> +; CHECK-NEXT:  LBB0_21: ## %while.cond197.backedge
> +; CHECK-NEXT:    ## in Loop: Header=BB0_13 Depth=1
> +; CHECK-NEXT:    decl %r15d
> +; CHECK-NEXT:    testl %r15d, %r15d
> +; CHECK-NEXT:    movl %r13d, %r14d
> +; CHECK-NEXT:    jg LBB0_13
> +; CHECK-NEXT:    jmp LBB0_22
> ; CHECK-NEXT:    .p2align 4, 0x90
> ; CHECK-NEXT:  LBB0_42: ## %while.cond864
> ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
>
> Modified: llvm/trunk/test/CodeGen/X86/reverse_branches.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/reverse_branches.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/reverse_branches.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/reverse_branches.ll Thu Aug 22 09:21:32 2019
> @@ -85,36 +85,25 @@ define i32 @test_branches_order() uwtabl
> ; CHECK-NEXT:    jg LBB0_16
> ; CHECK-NEXT:  LBB0_9: ## %for.cond18.preheader
> ; CHECK-NEXT:    ## =>This Loop Header: Depth=1
> -; CHECK-NEXT:    ## Child Loop BB0_11 Depth 2
> +; CHECK-NEXT:    ## Child Loop BB0_10 Depth 2
> ; CHECK-NEXT:    ## Child Loop BB0_12 Depth 3
> ; CHECK-NEXT:    movq %rcx, %rdx
> ; CHECK-NEXT:    xorl %esi, %esi
> ; CHECK-NEXT:    xorl %edi, %edi
> -; CHECK-NEXT:    cmpl $999, %edi ## imm = 0x3E7
> -; CHECK-NEXT:    jle LBB0_11
> -; CHECK-NEXT:    jmp LBB0_15
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  LBB0_14: ## %exit
> -; CHECK-NEXT:    ## in Loop: Header=BB0_11 Depth=2
> -; CHECK-NEXT:    addq %rsi, %rbp
> -; CHECK-NEXT:    incq %rdi
> -; CHECK-NEXT:    decq %rsi
> -; CHECK-NEXT:    addq $1001, %rdx ## imm = 0x3E9
> -; CHECK-NEXT:    cmpq $-1000, %rbp ## imm = 0xFC18
> -; CHECK-NEXT:    jne LBB0_5
> -; CHECK-NEXT:  ## %bb.10: ## %for.cond18
> -; CHECK-NEXT:    ## in Loop: Header=BB0_11 Depth=2
> -; CHECK-NEXT:    cmpl $999, %edi ## imm = 0x3E7
> -; CHECK-NEXT:    jg LBB0_15
> -; CHECK-NEXT:  LBB0_11: ## %for.body20
> +; CHECK-NEXT:  LBB0_10: ## %for.cond18
> ; CHECK-NEXT:    ## Parent Loop BB0_9 Depth=1
> ; CHECK-NEXT:    ## => This Loop Header: Depth=2
> ; CHECK-NEXT:    ## Child Loop BB0_12 Depth 3
> +; CHECK-NEXT:    cmpl $999, %edi ## imm = 0x3E7
> +; CHECK-NEXT:    jg LBB0_15
> +; CHECK-NEXT:  ## %bb.11: ## %for.body20
> +; CHECK-NEXT:    ## in Loop: Header=BB0_10 Depth=2
> ; CHECK-NEXT:    movq $-1000, %rbp ## imm = 0xFC18
> ; CHECK-NEXT:    .p2align 4, 0x90
> ; CHECK-NEXT:  LBB0_12: ## %do.body.i
> ; CHECK-NEXT:    ## Parent Loop BB0_9 Depth=1
> -; CHECK-NEXT:    ## Parent Loop BB0_11 Depth=2
> +; CHECK-NEXT:    ## Parent Loop BB0_10 Depth=2
> ; CHECK-NEXT:    ## => This Inner Loop Header: Depth=3
> ; CHECK-NEXT:    cmpb $120, 1000(%rdx,%rbp)
> ; CHECK-NEXT:    je LBB0_14
> @@ -122,6 +111,16 @@ define i32 @test_branches_order() uwtabl
> ; CHECK-NEXT:    ## in Loop: Header=BB0_12 Depth=3
> ; CHECK-NEXT:    incq %rbp
> ; CHECK-NEXT:    jne LBB0_12
> +; CHECK-NEXT:    jmp LBB0_5
> +; CHECK-NEXT:    .p2align 4, 0x90
> +; CHECK-NEXT:  LBB0_14: ## %exit
> +; CHECK-NEXT:    ## in Loop: Header=BB0_10 Depth=2
> +; CHECK-NEXT:    addq %rsi, %rbp
> +; CHECK-NEXT:    incq %rdi
> +; CHECK-NEXT:    decq %rsi
> +; CHECK-NEXT:    addq $1001, %rdx ## imm = 0x3E9
> +; CHECK-NEXT:    cmpq $-1000, %rbp ## imm = 0xFC18
> +; CHECK-NEXT:    je LBB0_10
> ; CHECK-NEXT:  LBB0_5: ## %if.then
> ; CHECK-NEXT:    leaq {{.*}}(%rip), %rdi
> ; CHECK-NEXT:    callq _puts
>
> Modified: llvm/trunk/test/CodeGen/X86/speculative-load-hardening.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/speculative-load-hardening.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/speculative-load-hardening.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/speculative-load-hardening.ll Thu Aug 22 09:21:32 2019
> @@ -215,7 +215,10 @@ define void @test_basic_loop(i32 %a, i32
> ; X64-NEXT:    movl %esi, %ebp
> ; X64-NEXT:    cmovneq %r15, %rax
> ; X64-NEXT:    xorl %ebx, %ebx
> +; X64-NEXT:    jmp .LBB2_3
> ; X64-NEXT:    .p2align 4, 0x90
> +; X64-NEXT:  .LBB2_6: # in Loop: Header=BB2_3 Depth=1
> +; X64-NEXT:    cmovgeq %r15, %rax
> ; X64-NEXT:  .LBB2_3: # %l.header
> ; X64-NEXT:    # =>This Inner Loop Header: Depth=1
> ; X64-NEXT:    movslq (%r12), %rcx
> @@ -234,11 +237,8 @@ define void @test_basic_loop(i32 %a, i32
> ; X64-NEXT:    cmovneq %r15, %rax
> ; X64-NEXT:    incl %ebx
> ; X64-NEXT:    cmpl %ebp, %ebx
> -; X64-NEXT:    jge .LBB2_4
> -; X64-NEXT:  # %bb.6: # in Loop: Header=BB2_3 Depth=1
> -; X64-NEXT:    cmovgeq %r15, %rax
> -; X64-NEXT:    jmp .LBB2_3
> -; X64-NEXT:  .LBB2_4:
> +; X64-NEXT:    jl .LBB2_6
> +; X64-NEXT:  # %bb.4:
> ; X64-NEXT:    cmovlq %r15, %rax
> ; X64-NEXT:  .LBB2_5: # %exit
> ; X64-NEXT:    shlq $47, %rax
> @@ -328,12 +328,20 @@ define void @test_basic_nested_loop(i32
> ; X64-NEXT:    xorl %r13d, %r13d
> ; X64-NEXT:    movl %esi, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
> ; X64-NEXT:    testl %r15d, %r15d
> -; X64-NEXT:    jle .LBB3_4
> +; X64-NEXT:    jg .LBB3_5
> +; X64-NEXT:    jmp .LBB3_4
> ; X64-NEXT:    .p2align 4, 0x90
> +; X64-NEXT:  .LBB3_12:
> +; X64-NEXT:    cmovgeq %rbp, %rax
> +; X64-NEXT:    testl %r15d, %r15d
> +; X64-NEXT:    jle .LBB3_4
> ; X64-NEXT:  .LBB3_5: # %l2.header.preheader
> ; X64-NEXT:    cmovleq %rbp, %rax
> ; X64-NEXT:    xorl %r15d, %r15d
> +; X64-NEXT:    jmp .LBB3_6
> ; X64-NEXT:    .p2align 4, 0x90
> +; X64-NEXT:  .LBB3_11: # in Loop: Header=BB3_6 Depth=1
> +; X64-NEXT:    cmovgeq %rbp, %rax
> ; X64-NEXT:  .LBB3_6: # %l2.header
> ; X64-NEXT:    # =>This Inner Loop Header: Depth=1
> ; X64-NEXT:    movslq (%rbx), %rcx
> @@ -352,12 +360,8 @@ define void @test_basic_nested_loop(i32
> ; X64-NEXT:    cmovneq %rbp, %rax
> ; X64-NEXT:    incl %r15d
> ; X64-NEXT:    cmpl %r12d, %r15d
> -; X64-NEXT:    jge .LBB3_7
> -; X64-NEXT:  # %bb.11: # in Loop: Header=BB3_6 Depth=1
> -; X64-NEXT:    cmovgeq %rbp, %rax
> -; X64-NEXT:    jmp .LBB3_6
> -; X64-NEXT:    .p2align 4, 0x90
> -; X64-NEXT:  .LBB3_7:
> +; X64-NEXT:    jl .LBB3_11
> +; X64-NEXT:  # %bb.7:
> ; X64-NEXT:    cmovlq %rbp, %rax
> ; X64-NEXT:    movl {{[-0-9]+}}(%r{{[sb]}}p), %r15d # 4-byte Reload
> ; X64-NEXT:    jmp .LBB3_8
> @@ -381,13 +385,8 @@ define void @test_basic_nested_loop(i32
> ; X64-NEXT:    cmovneq %rbp, %rax
> ; X64-NEXT:    incl %r13d
> ; X64-NEXT:    cmpl %r15d, %r13d
> -; X64-NEXT:    jge .LBB3_9
> -; X64-NEXT:  # %bb.12:
> -; X64-NEXT:    cmovgeq %rbp, %rax
> -; X64-NEXT:    testl %r15d, %r15d
> -; X64-NEXT:    jg .LBB3_5
> -; X64-NEXT:    jmp .LBB3_4
> -; X64-NEXT:  .LBB3_9:
> +; X64-NEXT:    jl .LBB3_12
> +; X64-NEXT:  # %bb.9:
> ; X64-NEXT:    cmovlq %rbp, %rax
> ; X64-NEXT:  .LBB3_10: # %exit
> ; X64-NEXT:    shlq $47, %rax
> @@ -419,17 +418,7 @@ define void @test_basic_nested_loop(i32
> ; X64-LFENCE-NEXT:    movl %esi, %r15d
> ; X64-LFENCE-NEXT:    lfence
> ; X64-LFENCE-NEXT:    xorl %r12d, %r12d
> -; X64-LFENCE-NEXT:    jmp .LBB3_2
> ; X64-LFENCE-NEXT:    .p2align 4, 0x90
> -; X64-LFENCE-NEXT:  .LBB3_5: # %l1.latch
> -; X64-LFENCE-NEXT:    # in Loop: Header=BB3_2 Depth=1
> -; X64-LFENCE-NEXT:    lfence
> -; X64-LFENCE-NEXT:    movslq (%rbx), %rax
> -; X64-LFENCE-NEXT:    movl (%r14,%rax,4), %edi
> -; X64-LFENCE-NEXT:    callq sink
> -; X64-LFENCE-NEXT:    incl %r12d
> -; X64-LFENCE-NEXT:    cmpl %r15d, %r12d
> -; X64-LFENCE-NEXT:    jge .LBB3_6
> ; X64-LFENCE-NEXT:  .LBB3_2: # %l1.header
> ; X64-LFENCE-NEXT:    # =>This Loop Header: Depth=1
> ; X64-LFENCE-NEXT:    # Child Loop BB3_4 Depth 2
> @@ -451,7 +440,15 @@ define void @test_basic_nested_loop(i32
> ; X64-LFENCE-NEXT:    incl %ebp
> ; X64-LFENCE-NEXT:    cmpl %r13d, %ebp
> ; X64-LFENCE-NEXT:    jl .LBB3_4
> -; X64-LFENCE-NEXT:    jmp .LBB3_5
> +; X64-LFENCE-NEXT:  .LBB3_5: # %l1.latch
> +; X64-LFENCE-NEXT:    # in Loop: Header=BB3_2 Depth=1
> +; X64-LFENCE-NEXT:    lfence
> +; X64-LFENCE-NEXT:    movslq (%rbx), %rax
> +; X64-LFENCE-NEXT:    movl (%r14,%rax,4), %edi
> +; X64-LFENCE-NEXT:    callq sink
> +; X64-LFENCE-NEXT:    incl %r12d
> +; X64-LFENCE-NEXT:    cmpl %r15d, %r12d
> +; X64-LFENCE-NEXT:    jl .LBB3_2
> ; X64-LFENCE-NEXT:  .LBB3_6: # %exit
> ; X64-LFENCE-NEXT:    lfence
> ; X64-LFENCE-NEXT:    addq $8, %rsp
>
> Modified: llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll Thu Aug 22 09:21:32 2019
> @@ -12,17 +12,14 @@ define void @tail_dup_merge_loops(i32 %a
> ; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; CHECK-NEXT:    incq %rsi
> ; CHECK-NEXT:    testl %edi, %edi
> -; CHECK-NEXT:    je .LBB0_5
> +; CHECK-NEXT:    jne .LBB0_2
> +; CHECK-NEXT:    jmp .LBB0_5
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_2: # %inner_loop_top
> -; CHECK-NEXT:    # =>This Loop Header: Depth=1
> -; CHECK-NEXT:    # Child Loop BB0_4 Depth 2
> -; CHECK-NEXT:    cmpb $0, (%rsi)
> -; CHECK-NEXT:    js .LBB0_3
> ; CHECK-NEXT:  .LBB0_4: # %inner_loop_latch
> -; CHECK-NEXT:    # Parent Loop BB0_2 Depth=1
> -; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
> +; CHECK-NEXT:    # in Loop: Header=BB0_2 Depth=1
> ; CHECK-NEXT:    addq $2, %rsi
> +; CHECK-NEXT:  .LBB0_2: # %inner_loop_top
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> ; CHECK-NEXT:    cmpb $0, (%rsi)
> ; CHECK-NEXT:    jns .LBB0_4
> ; CHECK-NEXT:    jmp .LBB0_3
> @@ -133,58 +130,58 @@ define i32 @loop_shared_header(i8* %exe,
> ; CHECK-NEXT:    testl %ebp, %ebp
> ; CHECK-NEXT:    je .LBB1_18
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB1_9: # %shared_loop_header
> +; CHECK-NEXT:  .LBB1_8: # %shared_loop_header
> ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> ; CHECK-NEXT:    testq %rbx, %rbx
> ; CHECK-NEXT:    jne .LBB1_27
> -; CHECK-NEXT:  # %bb.10: # %inner_loop_body
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:  # %bb.9: # %inner_loop_body
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    testl %eax, %eax
> -; CHECK-NEXT:    jns .LBB1_9
> -; CHECK-NEXT:  # %bb.11: # %if.end96.i
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:    jns .LBB1_8
> +; CHECK-NEXT:  # %bb.10: # %if.end96.i
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    cmpl $3, %ebp
> ; CHECK-NEXT:    jae .LBB1_22
> -; CHECK-NEXT:  # %bb.12: # %if.end287.i
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:  # %bb.11: # %if.end287.i
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    xorl %esi, %esi
> ; CHECK-NEXT:    cmpl $1, %ebp
> ; CHECK-NEXT:    setne %dl
> ; CHECK-NEXT:    testb %al, %al
> -; CHECK-NEXT:    jne .LBB1_16
> -; CHECK-NEXT:  # %bb.13: # %if.end308.i
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:    jne .LBB1_15
> +; CHECK-NEXT:  # %bb.12: # %if.end308.i
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    testb %al, %al
> -; CHECK-NEXT:    je .LBB1_7
> -; CHECK-NEXT:  # %bb.14: # %if.end335.i
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:    je .LBB1_17
> +; CHECK-NEXT:  # %bb.13: # %if.end335.i
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    xorl %edx, %edx
> ; CHECK-NEXT:    testb %dl, %dl
> ; CHECK-NEXT:    movl $0, %esi
> -; CHECK-NEXT:    jne .LBB1_8
> -; CHECK-NEXT:  # %bb.15: # %merge_other
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:    jne .LBB1_7
> +; CHECK-NEXT:  # %bb.14: # %merge_other
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    xorl %esi, %esi
> -; CHECK-NEXT:    jmp .LBB1_17
> -; CHECK-NEXT:  .LBB1_16: # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:    jmp .LBB1_16
> +; CHECK-NEXT:  .LBB1_15: # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    movb %dl, %sil
> ; CHECK-NEXT:    addl $3, %esi
> -; CHECK-NEXT:  .LBB1_17: # %outer_loop_latch
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:  .LBB1_16: # %outer_loop_latch
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    # implicit-def: $dl
> -; CHECK-NEXT:    jmp .LBB1_8
> -; CHECK-NEXT:  .LBB1_7: # %merge_predecessor_split
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:    jmp .LBB1_7
> +; CHECK-NEXT:  .LBB1_17: # %merge_predecessor_split
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    movb $32, %dl
> ; CHECK-NEXT:    xorl %esi, %esi
> -; CHECK-NEXT:  .LBB1_8: # %outer_loop_latch
> -; CHECK-NEXT:    # in Loop: Header=BB1_9 Depth=1
> +; CHECK-NEXT:  .LBB1_7: # %outer_loop_latch
> +; CHECK-NEXT:    # in Loop: Header=BB1_8 Depth=1
> ; CHECK-NEXT:    movzwl %si, %esi
> ; CHECK-NEXT:    decl %esi
> ; CHECK-NEXT:    movzwl %si, %esi
> ; CHECK-NEXT:    leaq 1(%rcx,%rsi), %rcx
> ; CHECK-NEXT:    testl %ebp, %ebp
> -; CHECK-NEXT:    jne .LBB1_9
> +; CHECK-NEXT:    jne .LBB1_8
> ; CHECK-NEXT:  .LBB1_18: # %while.cond.us1412.i
> ; CHECK-NEXT:    xorl %eax, %eax
> ; CHECK-NEXT:    testb %al, %al
>
> Modified: llvm/trunk/test/CodeGen/X86/tail-dup-repeat.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/tail-dup-repeat.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/tail-dup-repeat.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/tail-dup-repeat.ll Thu Aug 22 09:21:32 2019
> @@ -10,30 +10,35 @@
> define void @repeated_tail_dup(i1 %a1, i1 %a2, i32* %a4, i32* %a5, i8* %a6, i32 %a7) #0 align 2 {
> ; CHECK-LABEL: repeated_tail_dup:
> ; CHECK:       # %bb.0: # %entry
> -; CHECK-NEXT:    testb $1, %dil
> -; CHECK-NEXT:    je .LBB0_3
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_2: # %land.lhs.true
> -; CHECK-NEXT:    movl $10, (%rdx)
> -; CHECK-NEXT:  .LBB0_6: # %dup2
> -; CHECK-NEXT:    movl $2, (%rcx)
> -; CHECK-NEXT:    testl %r9d, %r9d
> -; CHECK-NEXT:    jne .LBB0_8
> ; CHECK-NEXT:  .LBB0_1: # %for.cond
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> ; CHECK-NEXT:    testb $1, %dil
> -; CHECK-NEXT:    jne .LBB0_2
> +; CHECK-NEXT:    je .LBB0_3
> +; CHECK-NEXT:  # %bb.2: # %land.lhs.true
> +; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> +; CHECK-NEXT:    movl $10, (%rdx)
> +; CHECK-NEXT:    jmp .LBB0_6
> +; CHECK-NEXT:    .p2align 4, 0x90
> ; CHECK-NEXT:  .LBB0_3: # %if.end56
> +; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    testb $1, %sil
> ; CHECK-NEXT:    je .LBB0_5
> ; CHECK-NEXT:  # %bb.4: # %if.then64
> +; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movb $1, (%r8)
> ; CHECK-NEXT:    testl %r9d, %r9d
> ; CHECK-NEXT:    je .LBB0_1
> ; CHECK-NEXT:    jmp .LBB0_8
> ; CHECK-NEXT:    .p2align 4, 0x90
> ; CHECK-NEXT:  .LBB0_5: # %if.end70
> +; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movl $12, (%rdx)
> -; CHECK-NEXT:    jmp .LBB0_6
> +; CHECK-NEXT:  .LBB0_6: # %dup2
> +; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> +; CHECK-NEXT:    movl $2, (%rcx)
> +; CHECK-NEXT:    testl %r9d, %r9d
> +; CHECK-NEXT:    je .LBB0_1
> ; CHECK-NEXT:  .LBB0_8: # %for.end
> ; CHECK-NEXT:    retq
> entry:
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll Thu Aug 22 09:21:32 2019
> @@ -115,17 +115,8 @@ define void @vector_variable_shift_left_
> ; SSE-NEXT:    jne .LBB0_4
> ; SSE-NEXT:  # %bb.5: # %middle.block
> ; SSE-NEXT:    cmpq %rax, %rdx
> -; SSE-NEXT:    jne .LBB0_6
> -; SSE-NEXT:  .LBB0_9: # %for.cond.cleanup
> -; SSE-NEXT:    retq
> -; SSE-NEXT:    .p2align 4, 0x90
> -; SSE-NEXT:  .LBB0_8: # %for.body
> -; SSE-NEXT:    # in Loop: Header=BB0_6 Depth=1
> -; SSE-NEXT:    # kill: def $cl killed $cl killed $ecx
> -; SSE-NEXT:    shll %cl, (%rdi,%rdx,4)
> -; SSE-NEXT:    incq %rdx
> -; SSE-NEXT:    cmpq %rdx, %rax
> ; SSE-NEXT:    je .LBB0_9
> +; SSE-NEXT:    .p2align 4, 0x90
> ; SSE-NEXT:  .LBB0_6: # %for.body
> ; SSE-NEXT:    # =>This Inner Loop Header: Depth=1
> ; SSE-NEXT:    cmpb $0, (%rsi,%rdx)
> @@ -134,7 +125,15 @@ define void @vector_variable_shift_left_
> ; SSE-NEXT:  # %bb.7: # %for.body
> ; SSE-NEXT:    # in Loop: Header=BB0_6 Depth=1
> ; SSE-NEXT:    movl %r8d, %ecx
> -; SSE-NEXT:    jmp .LBB0_8
> +; SSE-NEXT:  .LBB0_8: # %for.body
> +; SSE-NEXT:    # in Loop: Header=BB0_6 Depth=1
> +; SSE-NEXT:    # kill: def $cl killed $cl killed $ecx
> +; SSE-NEXT:    shll %cl, (%rdi,%rdx,4)
> +; SSE-NEXT:    incq %rdx
> +; SSE-NEXT:    cmpq %rdx, %rax
> +; SSE-NEXT:    jne .LBB0_6
> +; SSE-NEXT:  .LBB0_9: # %for.cond.cleanup
> +; SSE-NEXT:    retq
> ;
> ; AVX1-LABEL: vector_variable_shift_left_loop:
> ; AVX1:       # %bb.0: # %entry
> @@ -242,19 +241,8 @@ define void @vector_variable_shift_left_
> ; AVX1-NEXT:    jne .LBB0_4
> ; AVX1-NEXT:  # %bb.5: # %middle.block
> ; AVX1-NEXT:    cmpq %rax, %rdx
> -; AVX1-NEXT:    jne .LBB0_6
> -; AVX1-NEXT:  .LBB0_9: # %for.cond.cleanup
> -; AVX1-NEXT:    addq $24, %rsp
> -; AVX1-NEXT:    vzeroupper
> -; AVX1-NEXT:    retq
> -; AVX1-NEXT:    .p2align 4, 0x90
> -; AVX1-NEXT:  .LBB0_8: # %for.body
> -; AVX1-NEXT:    # in Loop: Header=BB0_6 Depth=1
> -; AVX1-NEXT:    # kill: def $cl killed $cl killed $ecx
> -; AVX1-NEXT:    shll %cl, (%rdi,%rdx,4)
> -; AVX1-NEXT:    incq %rdx
> -; AVX1-NEXT:    cmpq %rdx, %rax
> ; AVX1-NEXT:    je .LBB0_9
> +; AVX1-NEXT:    .p2align 4, 0x90
> ; AVX1-NEXT:  .LBB0_6: # %for.body
> ; AVX1-NEXT:    # =>This Inner Loop Header: Depth=1
> ; AVX1-NEXT:    cmpb $0, (%rsi,%rdx)
> @@ -263,7 +251,17 @@ define void @vector_variable_shift_left_
> ; AVX1-NEXT:  # %bb.7: # %for.body
> ; AVX1-NEXT:    # in Loop: Header=BB0_6 Depth=1
> ; AVX1-NEXT:    movl %r8d, %ecx
> -; AVX1-NEXT:    jmp .LBB0_8
> +; AVX1-NEXT:  .LBB0_8: # %for.body
> +; AVX1-NEXT:    # in Loop: Header=BB0_6 Depth=1
> +; AVX1-NEXT:    # kill: def $cl killed $cl killed $ecx
> +; AVX1-NEXT:    shll %cl, (%rdi,%rdx,4)
> +; AVX1-NEXT:    incq %rdx
> +; AVX1-NEXT:    cmpq %rdx, %rax
> +; AVX1-NEXT:    jne .LBB0_6
> +; AVX1-NEXT:  .LBB0_9: # %for.cond.cleanup
> +; AVX1-NEXT:    addq $24, %rsp
> +; AVX1-NEXT:    vzeroupper
> +; AVX1-NEXT:    retq
> ;
> ; AVX2-LABEL: vector_variable_shift_left_loop:
> ; AVX2:       # %bb.0: # %entry
> @@ -318,18 +316,8 @@ define void @vector_variable_shift_left_
> ; AVX2-NEXT:    jne .LBB0_4
> ; AVX2-NEXT:  # %bb.5: # %middle.block
> ; AVX2-NEXT:    cmpq %rax, %rdx
> -; AVX2-NEXT:    jne .LBB0_6
> -; AVX2-NEXT:  .LBB0_9: # %for.cond.cleanup
> -; AVX2-NEXT:    vzeroupper
> -; AVX2-NEXT:    retq
> -; AVX2-NEXT:    .p2align 4, 0x90
> -; AVX2-NEXT:  .LBB0_8: # %for.body
> -; AVX2-NEXT:    # in Loop: Header=BB0_6 Depth=1
> -; AVX2-NEXT:    # kill: def $cl killed $cl killed $ecx
> -; AVX2-NEXT:    shll %cl, (%rdi,%rdx,4)
> -; AVX2-NEXT:    incq %rdx
> -; AVX2-NEXT:    cmpq %rdx, %rax
> ; AVX2-NEXT:    je .LBB0_9
> +; AVX2-NEXT:    .p2align 4, 0x90
> ; AVX2-NEXT:  .LBB0_6: # %for.body
> ; AVX2-NEXT:    # =>This Inner Loop Header: Depth=1
> ; AVX2-NEXT:    cmpb $0, (%rsi,%rdx)
> @@ -338,7 +326,16 @@ define void @vector_variable_shift_left_
> ; AVX2-NEXT:  # %bb.7: # %for.body
> ; AVX2-NEXT:    # in Loop: Header=BB0_6 Depth=1
> ; AVX2-NEXT:    movl %r8d, %ecx
> -; AVX2-NEXT:    jmp .LBB0_8
> +; AVX2-NEXT:  .LBB0_8: # %for.body
> +; AVX2-NEXT:    # in Loop: Header=BB0_6 Depth=1
> +; AVX2-NEXT:    # kill: def $cl killed $cl killed $ecx
> +; AVX2-NEXT:    shll %cl, (%rdi,%rdx,4)
> +; AVX2-NEXT:    incq %rdx
> +; AVX2-NEXT:    cmpq %rdx, %rax
> +; AVX2-NEXT:    jne .LBB0_6
> +; AVX2-NEXT:  .LBB0_9: # %for.cond.cleanup
> +; AVX2-NEXT:    vzeroupper
> +; AVX2-NEXT:    retq
> entry:
>   %cmp12 = icmp sgt i32 %count, 0
>   br i1 %cmp12, label %for.body.preheader, label %for.cond.cleanup
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_arith-1.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_arith-1.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_arith-1.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_arith-1.ll Thu Aug 22 09:21:32 2019
> @@ -7,13 +7,9 @@ define void @update(<3 x i8>* %dst, <3 x
> ; CHECK-NEXT:    pushl %eax
> ; CHECK-NEXT:    movl $0, (%esp)
> ; CHECK-NEXT:    pcmpeqd %xmm0, %xmm0
> +; CHECK-NEXT:    jmp .LBB0_1
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_1: # %forcond
> -; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK-NEXT:    movl (%esp), %eax
> -; CHECK-NEXT:    cmpl {{[0-9]+}}(%esp), %eax
> -; CHECK-NEXT:    jge .LBB0_3
> -; CHECK-NEXT:  # %bb.2: # %forbody
> +; CHECK-NEXT:  .LBB0_2: # %forbody
> ; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movl (%esp), %eax
> ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
> @@ -23,8 +19,12 @@ define void @update(<3 x i8>* %dst, <3 x
> ; CHECK-NEXT:    pextrb $2, %xmm1, 2(%ecx,%eax,4)
> ; CHECK-NEXT:    pextrw $0, %xmm1, (%ecx,%eax,4)
> ; CHECK-NEXT:    incl (%esp)
> -; CHECK-NEXT:    jmp .LBB0_1
> -; CHECK-NEXT:  .LBB0_3: # %afterfor
> +; CHECK-NEXT:  .LBB0_1: # %forcond
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK-NEXT:    movl (%esp), %eax
> +; CHECK-NEXT:    cmpl {{[0-9]+}}(%esp), %eax
> +; CHECK-NEXT:    jl .LBB0_2
> +; CHECK-NEXT:  # %bb.3: # %afterfor
> ; CHECK-NEXT:    popl %eax
> ; CHECK-NEXT:    retl
> entry:
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_arith-2.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_arith-2.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_arith-2.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_arith-2.ll Thu Aug 22 09:21:32 2019
> @@ -10,13 +10,9 @@ define void @update(i64* %dst_i, i64* %s
> ; CHECK-NEXT:    movl $0, (%esp)
> ; CHECK-NEXT:    pcmpeqd %xmm0, %xmm0
> ; CHECK-NEXT:    movdqa {{.*#+}} xmm1 = <4,4,4,4,4,4,4,4,u,u,u,u,u,u,u,u>
> +; CHECK-NEXT:    jmp .LBB0_1
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_1: # %forcond
> -; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK-NEXT:    movl (%esp), %eax
> -; CHECK-NEXT:    cmpl {{[0-9]+}}(%esp), %eax
> -; CHECK-NEXT:    jge .LBB0_3
> -; CHECK-NEXT:  # %bb.2: # %forbody
> +; CHECK-NEXT:  .LBB0_2: # %forbody
> ; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movl (%esp), %eax
> ; CHECK-NEXT:    leal (,%eax,8), %ecx
> @@ -30,8 +26,12 @@ define void @update(i64* %dst_i, i64* %s
> ; CHECK-NEXT:    pand %xmm1, %xmm2
> ; CHECK-NEXT:    movq %xmm2, (%edx,%eax,8)
> ; CHECK-NEXT:    incl (%esp)
> -; CHECK-NEXT:    jmp .LBB0_1
> -; CHECK-NEXT:  .LBB0_3: # %afterfor
> +; CHECK-NEXT:  .LBB0_1: # %forcond
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK-NEXT:    movl (%esp), %eax
> +; CHECK-NEXT:    cmpl {{[0-9]+}}(%esp), %eax
> +; CHECK-NEXT:    jl .LBB0_2
> +; CHECK-NEXT:  # %bb.3: # %afterfor
> ; CHECK-NEXT:    addl $12, %esp
> ; CHECK-NEXT:    retl
> entry:
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_arith-3.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_arith-3.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_arith-3.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_arith-3.ll Thu Aug 22 09:21:32 2019
> @@ -17,13 +17,9 @@ define void @update(<3 x i16>* %dst, <3
> ; CHECK-NEXT:    movw $1, {{[0-9]+}}(%esp)
> ; CHECK-NEXT:    movl $65537, {{[0-9]+}}(%esp) # imm = 0x10001
> ; CHECK-NEXT:    movl $0, {{[0-9]+}}(%esp)
> +; CHECK-NEXT:    jmp .LBB0_1
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_1: # %forcond
> -; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> -; CHECK-NEXT:    cmpl 16(%ebp), %eax
> -; CHECK-NEXT:    jge .LBB0_3
> -; CHECK-NEXT:  # %bb.2: # %forbody
> +; CHECK-NEXT:  .LBB0_2: # %forbody
> ; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> ; CHECK-NEXT:    movl 12(%ebp), %edx
> @@ -34,8 +30,12 @@ define void @update(<3 x i16>* %dst, <3
> ; CHECK-NEXT:    pextrw $2, %xmm1, 4(%ecx,%eax,8)
> ; CHECK-NEXT:    movd %xmm1, (%ecx,%eax,8)
> ; CHECK-NEXT:    incl {{[0-9]+}}(%esp)
> -; CHECK-NEXT:    jmp .LBB0_1
> -; CHECK-NEXT:  .LBB0_3: # %afterfor
> +; CHECK-NEXT:  .LBB0_1: # %forcond
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> +; CHECK-NEXT:    cmpl 16(%ebp), %eax
> +; CHECK-NEXT:    jl .LBB0_2
> +; CHECK-NEXT:  # %bb.3: # %afterfor
> ; CHECK-NEXT:    movl %ebp, %esp
> ; CHECK-NEXT:    popl %ebp
> ; CHECK-NEXT:    retl
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_arith-4.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_arith-4.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_arith-4.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_arith-4.ll Thu Aug 22 09:21:32 2019
> @@ -16,13 +16,9 @@ define void @update(<5 x i16>* %dst, <5
> ; SSE2-NEXT:    movl $0, -{{[0-9]+}}(%rsp)
> ; SSE2-NEXT:    movdqa {{.*#+}} xmm0 = <271,271,271,271,271,u,u,u>
> ; SSE2-NEXT:    movdqa {{.*#+}} xmm1 = <2,4,2,2,2,u,u,u>
> +; SSE2-NEXT:    jmp .LBB0_1
> ; SSE2-NEXT:    .p2align 4, 0x90
> -; SSE2-NEXT:  .LBB0_1: # %forcond
> -; SSE2-NEXT:    # =>This Inner Loop Header: Depth=1
> -; SSE2-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
> -; SSE2-NEXT:    cmpl -{{[0-9]+}}(%rsp), %eax
> -; SSE2-NEXT:    jge .LBB0_3
> -; SSE2-NEXT:  # %bb.2: # %forbody
> +; SSE2-NEXT:  .LBB0_2: # %forbody
> ; SSE2-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; SSE2-NEXT:    movslq -{{[0-9]+}}(%rsp), %rax
> ; SSE2-NEXT:    movq -{{[0-9]+}}(%rsp), %rcx
> @@ -35,8 +31,12 @@ define void @update(<5 x i16>* %dst, <5
> ; SSE2-NEXT:    pextrw $4, %xmm2, %edx
> ; SSE2-NEXT:    movw %dx, 8(%rcx,%rax)
> ; SSE2-NEXT:    incl -{{[0-9]+}}(%rsp)
> -; SSE2-NEXT:    jmp .LBB0_1
> -; SSE2-NEXT:  .LBB0_3: # %afterfor
> +; SSE2-NEXT:  .LBB0_1: # %forcond
> +; SSE2-NEXT:    # =>This Inner Loop Header: Depth=1
> +; SSE2-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
> +; SSE2-NEXT:    cmpl -{{[0-9]+}}(%rsp), %eax
> +; SSE2-NEXT:    jl .LBB0_2
> +; SSE2-NEXT:  # %bb.3: # %afterfor
> ; SSE2-NEXT:    retq
> ;
> ; SSE41-LABEL: update:
> @@ -49,13 +49,9 @@ define void @update(<5 x i16>* %dst, <5
> ; SSE41-NEXT:    movw $0, -{{[0-9]+}}(%rsp)
> ; SSE41-NEXT:    movl $0, -{{[0-9]+}}(%rsp)
> ; SSE41-NEXT:    movdqa {{.*#+}} xmm0 = <271,271,271,271,271,u,u,u>
> +; SSE41-NEXT:    jmp .LBB0_1
> ; SSE41-NEXT:    .p2align 4, 0x90
> -; SSE41-NEXT:  .LBB0_1: # %forcond
> -; SSE41-NEXT:    # =>This Inner Loop Header: Depth=1
> -; SSE41-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
> -; SSE41-NEXT:    cmpl -{{[0-9]+}}(%rsp), %eax
> -; SSE41-NEXT:    jge .LBB0_3
> -; SSE41-NEXT:  # %bb.2: # %forbody
> +; SSE41-NEXT:  .LBB0_2: # %forbody
> ; SSE41-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; SSE41-NEXT:    movslq -{{[0-9]+}}(%rsp), %rax
> ; SSE41-NEXT:    movq -{{[0-9]+}}(%rsp), %rcx
> @@ -70,8 +66,12 @@ define void @update(<5 x i16>* %dst, <5
> ; SSE41-NEXT:    pextrw $4, %xmm1, 8(%rcx,%rax)
> ; SSE41-NEXT:    movq %xmm2, (%rcx,%rax)
> ; SSE41-NEXT:    incl -{{[0-9]+}}(%rsp)
> -; SSE41-NEXT:    jmp .LBB0_1
> -; SSE41-NEXT:  .LBB0_3: # %afterfor
> +; SSE41-NEXT:  .LBB0_1: # %forcond
> +; SSE41-NEXT:    # =>This Inner Loop Header: Depth=1
> +; SSE41-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
> +; SSE41-NEXT:    cmpl -{{[0-9]+}}(%rsp), %eax
> +; SSE41-NEXT:    jl .LBB0_2
> +; SSE41-NEXT:  # %bb.3: # %afterfor
> ; SSE41-NEXT:    retq
> entry:
> %dst.addr = alloca <5 x i16>*
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_arith-5.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_arith-5.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_arith-5.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_arith-5.ll Thu Aug 22 09:21:32 2019
> @@ -14,13 +14,9 @@ define void @update(<3 x i32>* %dst, <3
> ; CHECK-NEXT:    movl $1, -{{[0-9]+}}(%rsp)
> ; CHECK-NEXT:    movl $0, -{{[0-9]+}}(%rsp)
> ; CHECK-NEXT:    movdqa {{.*#+}} xmm0 = <3,3,3,u>
> +; CHECK-NEXT:    jmp .LBB0_1
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_1: # %forcond
> -; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
> -; CHECK-NEXT:    cmpl -{{[0-9]+}}(%rsp), %eax
> -; CHECK-NEXT:    jge .LBB0_3
> -; CHECK-NEXT:  # %bb.2: # %forbody
> +; CHECK-NEXT:  .LBB0_2: # %forbody
> ; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movslq -{{[0-9]+}}(%rsp), %rax
> ; CHECK-NEXT:    movq -{{[0-9]+}}(%rsp), %rcx
> @@ -32,8 +28,12 @@ define void @update(<3 x i32>* %dst, <3
> ; CHECK-NEXT:    pextrd $2, %xmm1, 8(%rcx,%rax)
> ; CHECK-NEXT:    movq %xmm1, (%rcx,%rax)
> ; CHECK-NEXT:    incl -{{[0-9]+}}(%rsp)
> -; CHECK-NEXT:    jmp .LBB0_1
> -; CHECK-NEXT:  .LBB0_3: # %afterfor
> +; CHECK-NEXT:  .LBB0_1: # %forcond
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
> +; CHECK-NEXT:    cmpl -{{[0-9]+}}(%rsp), %eax
> +; CHECK-NEXT:    jl .LBB0_2
> +; CHECK-NEXT:  # %bb.3: # %afterfor
> ; CHECK-NEXT:    retq
> entry:
> %dst.addr = alloca <3 x i32>*
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_arith-6.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_arith-6.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_arith-6.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_arith-6.ll Thu Aug 22 09:21:32 2019
> @@ -15,13 +15,9 @@ define void @update(<3 x float>* %dst, <
> ; CHECK-NEXT:    movl $1065353216, {{[0-9]+}}(%esp) # imm = 0x3F800000
> ; CHECK-NEXT:    movl $0, {{[0-9]+}}(%esp)
> ; CHECK-NEXT:    movaps {{.*#+}} xmm0 = <1.97604004E+3,1.97604004E+3,1.97604004E+3,u>
> +; CHECK-NEXT:    jmp .LBB0_1
> ; CHECK-NEXT:    .p2align 4, 0x90
> -; CHECK-NEXT:  .LBB0_1: # %forcond
> -; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> -; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> -; CHECK-NEXT:    cmpl 16(%ebp), %eax
> -; CHECK-NEXT:    jge .LBB0_3
> -; CHECK-NEXT:  # %bb.2: # %forbody
> +; CHECK-NEXT:  .LBB0_2: # %forbody
> ; CHECK-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> ; CHECK-NEXT:    movl 8(%ebp), %ecx
> @@ -34,8 +30,12 @@ define void @update(<3 x float>* %dst, <
> ; CHECK-NEXT:    extractps $1, %xmm1, 4(%ecx,%eax)
> ; CHECK-NEXT:    movss %xmm1, (%ecx,%eax)
> ; CHECK-NEXT:    incl {{[0-9]+}}(%esp)
> -; CHECK-NEXT:    jmp .LBB0_1
> -; CHECK-NEXT:  .LBB0_3: # %afterfor
> +; CHECK-NEXT:  .LBB0_1: # %forcond
> +; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
> +; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> +; CHECK-NEXT:    cmpl 16(%ebp), %eax
> +; CHECK-NEXT:    jl .LBB0_2
> +; CHECK-NEXT:  # %bb.3: # %afterfor
> ; CHECK-NEXT:    movl %ebp, %esp
> ; CHECK-NEXT:    popl %ebp
> ; CHECK-NEXT:    retl
>
> Modified: llvm/trunk/test/CodeGen/X86/widen_cast-4.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widen_cast-4.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/widen_cast-4.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/widen_cast-4.ll Thu Aug 22 09:21:32 2019
> @@ -11,13 +11,9 @@ define void @update(i64* %dst_i, i64* %s
> ; WIDE-NEXT:    pcmpeqd %xmm0, %xmm0
> ; WIDE-NEXT:    movdqa {{.*#+}} xmm1 = [63,63,63,63,63,63,63,63,63,63,63,63,63,63,63,63]
> ; WIDE-NEXT:    movdqa {{.*#+}} xmm2 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]
> +; WIDE-NEXT:    jmp .LBB0_1
> ; WIDE-NEXT:    .p2align 4, 0x90
> -; WIDE-NEXT:  .LBB0_1: # %forcond
> -; WIDE-NEXT:    # =>This Inner Loop Header: Depth=1
> -; WIDE-NEXT:    movl (%esp), %eax
> -; WIDE-NEXT:    cmpl {{[0-9]+}}(%esp), %eax
> -; WIDE-NEXT:    jge .LBB0_3
> -; WIDE-NEXT:  # %bb.2: # %forbody
> +; WIDE-NEXT:  .LBB0_2: # %forbody
> ; WIDE-NEXT:    # in Loop: Header=BB0_1 Depth=1
> ; WIDE-NEXT:    movl (%esp), %eax
> ; WIDE-NEXT:    leal (,%eax,8), %ecx
> @@ -34,8 +30,12 @@ define void @update(i64* %dst_i, i64* %s
> ; WIDE-NEXT:    psubb %xmm2, %xmm3
> ; WIDE-NEXT:    movq %xmm3, (%edx,%eax,8)
> ; WIDE-NEXT:    incl (%esp)
> -; WIDE-NEXT:    jmp .LBB0_1
> -; WIDE-NEXT:  .LBB0_3: # %afterfor
> +; WIDE-NEXT:  .LBB0_1: # %forcond
> +; WIDE-NEXT:    # =>This Inner Loop Header: Depth=1
> +; WIDE-NEXT:    movl (%esp), %eax
> +; WIDE-NEXT:    cmpl {{[0-9]+}}(%esp), %eax
> +; WIDE-NEXT:    jl .LBB0_2
> +; WIDE-NEXT:  # %bb.3: # %afterfor
> ; WIDE-NEXT:    addl $12, %esp
> ; WIDE-NEXT:    retl
> entry:
>
> Modified: llvm/trunk/test/DebugInfo/X86/PR37234.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/DebugInfo/X86/PR37234.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/DebugInfo/X86/PR37234.ll (original)
> +++ llvm/trunk/test/DebugInfo/X86/PR37234.ll Thu Aug 22 09:21:32 2019
> @@ -21,18 +21,18 @@
> ; CHECK-LABEL: # %bb.{{.*}}:
> ; CHECK:        #DEBUG_VALUE: main:aa <- 0
> ; CHECK: #DEBUG_VALUE: main:aa <- $[[REG:[0-9a-z]+]]
> -; CHECK: .LBB0_1:
> -; CHECK:        #DEBUG_VALUE: main:aa <- $[[REG]]
> -; CHECK:        je      .LBB0_4
> -; CHECK: # %bb.{{.*}}:
> +; CHECK: jmp .LBB0_1
> +; CHECK: .LBB0_2:
> ; CHECK:        #DEBUG_VALUE: main:aa <- $[[REG]]
> ; CHECK:        jne     .LBB0_1
> ; CHECK: # %bb.{{.*}}:
> ; CHECK:        #DEBUG_VALUE: main:aa <- $[[REG]]
> ; CHECK:        incl    %[[REG]]
> ; CHECK:        #DEBUG_VALUE: main:aa <- $[[REG]]
> -; CHECK:        jmp     .LBB0_1
> -; CHECK: .LBB0_4:
> +; CHECK: .LBB0_1:
> +; CHECK: #DEBUG_VALUE: main:aa <- $[[REG]]
> +; CHECK:        jne     .LBB0_2
> +; CHECK: # %bb.{{.*}}:
> ; CHECK: #DEBUG_VALUE: main:aa <- $[[REG]]
> ; CHECK: retq
>
>
> Modified: llvm/trunk/test/DebugInfo/X86/dbg-value-transfer-order.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/DebugInfo/X86/dbg-value-transfer-order.ll?rev=369664&r1=369663&r2=369664&view=diff
> ==============================================================================
> --- llvm/trunk/test/DebugInfo/X86/dbg-value-transfer-order.ll (original)
> +++ llvm/trunk/test/DebugInfo/X86/dbg-value-transfer-order.ll Thu Aug 22 09:21:32 2019
> @@ -24,12 +24,6 @@
> ; with the Orders insertion point vector.
>
> ; CHECK-LABEL: f: # @f
> -; CHECK: .LBB0_4:
> -;        Check that this DEBUG_VALUE comes before the left shift.
> -; CHECK:         #DEBUG_VALUE: bit_offset <- $ecx
> -; CHECK:         .cv_loc 0 1 8 28                # t.c:8:28
> -; CHECK:         movl    $1, %[[reg:[^ ]*]]
> -; CHECK:         shll    %cl, %[[reg]]
> ; CHECK: .LBB0_2:                                # %while.body
> ; CHECK:         movl    $32, %ecx
> ; CHECK:         testl   {{.*}}
> @@ -37,7 +31,12 @@
> ; CHECK: # %bb.3:                                 # %if.then
> ; CHECK:         callq   if_then
> ; CHECK:         movl    %eax, %ecx
> -; CHECK:         jmp     .LBB0_4
> +; CHECK: .LBB0_4:                                # %if.end
> +;        Check that this DEBUG_VALUE comes before the left shift.
> +; CHECK:         #DEBUG_VALUE: bit_offset <- $ecx
> +; CHECK:         .cv_loc 0 1 8 28                # t.c:8:28
> +; CHECK:         movl    $1, %[[reg:[^ ]*]]
> +; CHECK:         shll    %cl, %[[reg]]
>
> ; ModuleID = 't.c'
> source_filename = "t.c"
>
>
>
>

