<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 22 February 2018 at 17:38, Richard Smith <span dir="ltr"><<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 22 February 2018 at 17:19, Richard Smith <span dir="ltr"><<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On 14 February 2018 at 06:58, Lama Saba via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Author: lsaba<br>
Date: Wed Feb 14 06:58:53 2018<br>
New Revision: 325128<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=325128&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject?rev=325128&view=rev</a><br>
Log:<br>
[X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346<br>
<br>
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.<br>
A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchitecture is that a small store cannot be forwarded to a large load.<br>
The estimated penalty for a store forward block is ~13 cycles.<br>
<br>
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence<br>
of a load and a store.<br>
<br>
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.<br>
Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.<br>
<br>
Change-Id: Ic41aa9ade6512e0478db66e07e2fd<wbr>e41b4fb35f9<br>
<br>
Added:<br>
llvm/trunk/lib/Target/X86/X86F<wbr>ixupSFB.cpp<br>
llvm/trunk/test/CodeGen/X86/fi<wbr>xup-sfb-32.ll<br>
llvm/trunk/test/CodeGen/X86/fi<wbr>xup-sfb.ll<br>
Modified:<br>
llvm/trunk/lib/Target/X86/CMak<wbr>eLists.txt<br>
llvm/trunk/lib/Target/X86/X86.<wbr>h<br>
llvm/trunk/lib/Target/X86/X86T<wbr>argetMachine.cpp<br>
<br>
Modified: llvm/trunk/lib/Target/X86/CMak<wbr>eLists.txt<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/CMakeLists.txt?rev=325128&r1=325127&r2=325128&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/CMakeLists.txt?rev=325128&r1<wbr>=325127&r2=325128&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/CMak<wbr>eLists.txt (original)<br>
+++ llvm/trunk/lib/Target/X86/CMak<wbr>eLists.txt Wed Feb 14 06:58:53 2018<br>
@@ -31,6 +31,7 @@ set(sources<br>
X86FastISel.cpp<br>
X86FixupBWInsts.cpp<br>
X86FixupLEAs.cpp<br>
+ X86FixupSFB.cpp<br>
X86FixupSetCC.cpp<br>
X86FloatingPoint.cpp<br>
X86FrameLowering.cpp<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86.<wbr>h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.h?rev=325128&r1=325127&r2=325128&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/X86.h?rev=325128&r1=325127&r<wbr>2=325128&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/X86.<wbr>h (original)<br>
+++ llvm/trunk/lib/Target/X86/X86.<wbr>h Wed Feb 14 06:58:53 2018<br>
@@ -70,6 +70,9 @@ FunctionPass *createX86OptimizeLEAs();<br>
/// Return a pass that transforms setcc + movzx pairs into xor + setcc.<br>
FunctionPass *createX86FixupSetCC();<br>
<br>
+/// Return a pass that avoids creating store forward block issues in the hardware.<br>
+FunctionPass *createX86FixupSFB();<br>
+<br>
/// Return a pass that expands WinAlloca pseudo-instructions.<br>
FunctionPass *createX86WinAllocaExpander();<br>
<br>
<br>
Added: llvm/trunk/lib/Target/X86/X86F<wbr>ixupSFB.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86FixupSFB.cpp?rev=325128&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/X86FixupSFB.cpp?rev=325128&v<wbr>iew=auto</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/X86F<wbr>ixupSFB.cpp (added)<br>
+++ llvm/trunk/lib/Target/X86/X86F<wbr>ixupSFB.cpp Wed Feb 14 06:58:53 2018<br>
@@ -0,0 +1,580 @@<br>
+//===- X86FixupSFB.cpp - Avoid HW Store Forward Block issues -----------===//<br>
+//<br>
+// The LLVM Compiler Infrastructure<br>
+//<br>
+// This file is distributed under the University of Illinois Open Source<br>
+// License. See LICENSE.TXT for details.<br>
+//<br>
+//===------------------------<wbr>------------------------------<wbr>----------------===//<br>
+//<br>
+// If a load follows a store and reloads data that the store has written to<br>
+// memory, Intel microarchitectures can in many cases forward the data directly<br>
+// from the store to the load. This "store forwarding" saves cycles by enabling<br>
+// the load to directly obtain the data instead of accessing the data from<br>
+// cache or memory.<br>
+// A "store forward block" occurs in cases where a store cannot be forwarded to<br>
+// the load. The most typical case of store forward block on Intel Core<br>
+// microarchitecture is that a small store cannot be forwarded to a large load.<br>
+// The estimated penalty for a store forward block is ~13 cycles.<br>
+//<br>
+// This pass tries to recognize and handle cases where "store forward block"<br>
+// is created by the compiler when lowering memcpy calls to a sequence<br>
+// of a load and a store.<br>
+//<br>
+// The pass currently only handles cases where memcpy is lowered to<br>
+// XMM/YMM registers; it tries to break the memcpy into smaller copies.<br>
+// Breaking the memcpy should be possible since there is no atomicity<br>
+// guarantee for loads and stores to XMM/YMM.<br>
+//<br>
+// It could be better for performance to solve the problem by loading<br>
+// to XMM/YMM then inserting the partial store before storing back from XMM/YMM<br>
+// to memory, but this will result in a more conservative optimization since it<br>
+// requires we prove that all memory accesses between the blocking store and the<br>
+// load must alias/don't alias before we can move the store, whereas the<br>
+// transformation done here is correct regardless of other memory accesses.<br>
+//===------------------------<wbr>------------------------------<wbr>----------------===//<br>
+<br>
+#include "X86InstrInfo.h"<br>
+#include "X86Subtarget.h"<br>
+#include "llvm/CodeGen/MachineBasicBloc<wbr>k.h"<br>
+#include "llvm/CodeGen/MachineFunction.<wbr>h"<br>
+#include "llvm/CodeGen/MachineFunctionP<wbr>ass.h"<br>
+#include "llvm/CodeGen/MachineInstr.h"<br>
+#include "llvm/CodeGen/MachineInstrBuil<wbr>der.h"<br>
+#include "llvm/CodeGen/MachineOperand.h<wbr>"<br>
+#include "llvm/CodeGen/MachineRegisterI<wbr>nfo.h"<br>
+#include "llvm/IR/DebugInfoMetadata.h"<br>
+#include "llvm/IR/DebugLoc.h"<br>
+#include "llvm/IR/Function.h"<br>
+#include "llvm/MC/MCInstrDesc.h"<br>
+<br>
+using namespace llvm;<br>
+<br>
+#define DEBUG_TYPE "x86-fixup-SFB"<br>
+<br>
+static cl::opt<bool> DisableX86FixupSFB("disable-fi<wbr>xup-SFB", cl::Hidden,<br>
+ cl::desc("X86: Disable SFB fixup."),<br>
+ cl::init(false));<br>
+namespace {<br>
+<br>
+class FixupSFBPass : public MachineFunctionPass {<br>
+public:<br>
+ FixupSFBPass() : MachineFunctionPass(ID) {}<br>
+<br>
+ StringRef getPassName() const override {<br>
+ return "X86 Fixup Store Forward Block";<br>
+ }<br>
+<br>
+ bool runOnMachineFunction(MachineFu<wbr>nction &MF) override;<br>
+<br>
+private:<br>
+ MachineRegisterInfo *MRI;<br>
+ const X86InstrInfo *TII;<br>
+ const X86RegisterInfo *TRI;<br>
+ SmallVector<std::pair<MachineI<wbr>nstr *, MachineInstr *>, 2> BlockedLoadsStores;<br>
+ SmallVector<MachineInstr *, 2> ForRemoval;<br>
+ bool Is64Bit;<br>
+<br>
+ /// \brief Returns couples of Load then Store to memory which look<br>
+ /// like a memcpy.<br>
+ void findPotentiallylBlockedCopies(<wbr>MachineFunction &MF);<br></blockquote></div></div></div></blockquote><div><br></div><div>There's a typo in this function name.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ /// \brief Break the memcpy's load and store into smaller copies<br>
+ /// such that each memory load that was blocked by a smaller store<br>
+ /// would now be copied separately.<br>
+ void<br>
+ breakBlockedCopies(MachineInst<wbr>r *LoadInst, MachineInstr *StoreInst,<br>
+ const std::map<int64_t, unsigned> &BlockingStoresDisp);<br>
+ /// \brief Break a copy of size Size to smaller copies.<br>
+ void buildCopies(int Size, MachineInstr *LoadInst, int64_t LdDispImm,<br>
+ MachineInstr *StoreInst, int64_t StDispImm,<br>
+ int64_t LMMOffset, int64_t SMMOffset);<br>
+<br>
+ void buildCopy(MachineInstr *LoadInst, unsigned NLoadOpcode, int64_t LoadDisp,<br>
+ MachineInstr *StoreInst, unsigned NStoreOpcode,<br>
+ int64_t StoreDisp, unsigned Size, int64_t LMMOffset,<br>
+ int64_t SMMOffset);<br>
+<br>
+ unsigned getRegSizeInBytes(MachineInstr *Inst);<br>
+ static char ID;<br>
+};<br>
+<br>
+} // end anonymous namespace<br>
+<br>
+char FixupSFBPass::ID = 0;<br>
+<br>
+FunctionPass *llvm::createX86FixupSFB() { return new FixupSFBPass(); }<br>
+<br>
+static bool isXMMLoadOpcode(unsigned Opcode) {<br>
+ return Opcode == X86::MOVUPSrm || Opcode == X86::MOVAPSrm ||<br>
+ Opcode == X86::VMOVUPSrm || Opcode == X86::VMOVAPSrm ||<br>
+ Opcode == X86::VMOVUPDrm || Opcode == X86::VMOVAPDrm ||<br>
+ Opcode == X86::VMOVDQUrm || Opcode == X86::VMOVDQArm ||<br>
+ Opcode == X86::VMOVUPSZ128rm || Opcode == X86::VMOVAPSZ128rm ||<br>
+ Opcode == X86::VMOVUPDZ128rm || Opcode == X86::VMOVAPDZ128rm ||<br>
+ Opcode == X86::VMOVDQU64Z128rm || Opcode == X86::VMOVDQA64Z128rm ||<br>
+ Opcode == X86::VMOVDQU32Z128rm || Opcode == X86::VMOVDQA32Z128rm;<br>
+}<br>
+static bool isYMMLoadOpcode(unsigned Opcode) {<br>
+ return Opcode == X86::VMOVUPSYrm || Opcode == X86::VMOVAPSYrm ||<br>
+ Opcode == X86::VMOVUPDYrm || Opcode == X86::VMOVAPDYrm ||<br>
+ Opcode == X86::VMOVDQUYrm || Opcode == X86::VMOVDQAYrm ||<br>
+ Opcode == X86::VMOVUPSZ256rm || Opcode == X86::VMOVAPSZ256rm ||<br>
+ Opcode == X86::VMOVUPDZ256rm || Opcode == X86::VMOVAPDZ256rm ||<br>
+ Opcode == X86::VMOVDQU64Z256rm || Opcode == X86::VMOVDQA64Z256rm ||<br>
+ Opcode == X86::VMOVDQU32Z256rm || Opcode == X86::VMOVDQA32Z256rm;<br>
+}<br>
+<br>
+static bool isPotentialBlockedMemCpyLd(uns<wbr>igned Opcode) {<br>
+ return isXMMLoadOpcode(Opcode) || isYMMLoadOpcode(Opcode);<br>
+}<br>
+<br>
+std::map<unsigned, std::pair<unsigned, unsigned>> PotentialBlockedMemCpy{<br></blockquote><div><br></div><div>This map and the others below need to be marked 'static' and should also be marked 'const'.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ {X86::MOVUPSrm, {X86::MOVUPSmr, X86::MOVAPSmr}},<br>
+ {X86::MOVAPSrm, {X86::MOVUPSmr, X86::MOVAPSmr}},<br>
+ {X86::VMOVUPSrm, {X86::VMOVUPSmr, X86::VMOVAPSmr}},<br>
+ {X86::VMOVAPSrm, {X86::VMOVUPSmr, X86::VMOVAPSmr}},<br>
+ {X86::VMOVUPDrm, {X86::VMOVUPDmr, X86::VMOVAPDmr}},<br>
+ {X86::VMOVAPDrm, {X86::VMOVUPDmr, X86::VMOVAPDmr}},<br>
+ {X86::VMOVDQUrm, {X86::VMOVDQUmr, X86::VMOVDQAmr}},<br>
+ {X86::VMOVDQArm, {X86::VMOVDQUmr, X86::VMOVDQAmr}},<br>
+ {X86::VMOVUPSZ128rm, {X86::VMOVUPSZ128mr, X86::VMOVAPSZ128mr}},<br>
+ {X86::VMOVAPSZ128rm, {X86::VMOVUPSZ128mr, X86::VMOVAPSZ128mr}},<br>
+ {X86::VMOVUPDZ128rm, {X86::VMOVUPDZ128mr, X86::VMOVAPDZ128mr}},<br>
+ {X86::VMOVAPDZ128rm, {X86::VMOVUPDZ128mr, X86::VMOVAPDZ128mr}},<br>
+ {X86::VMOVUPSYrm, {X86::VMOVUPSYmr, X86::VMOVAPSYmr}},<br>
+ {X86::VMOVAPSYrm, {X86::VMOVUPSYmr, X86::VMOVAPSYmr}},<br>
+ {X86::VMOVUPDYrm, {X86::VMOVUPDYmr, X86::VMOVAPDYmr}},<br>
+ {X86::VMOVAPDYrm, {X86::VMOVUPDYmr, X86::VMOVAPDYmr}},<br>
+ {X86::VMOVDQUYrm, {X86::VMOVDQUYmr, X86::VMOVDQAYmr}},<br>
+ {X86::VMOVDQAYrm, {X86::VMOVDQUYmr, X86::VMOVDQAYmr}},<br>
+ {X86::VMOVUPSZ256rm, {X86::VMOVUPSZ256mr, X86::VMOVAPSZ256mr}},<br>
+ {X86::VMOVAPSZ256rm, {X86::VMOVUPSZ256mr, X86::VMOVAPSZ256mr}},<br>
+ {X86::VMOVUPDZ256rm, {X86::VMOVUPDZ256mr, X86::VMOVAPDZ256mr}},<br>
+ {X86::VMOVAPDZ256rm, {X86::VMOVUPDZ256mr, X86::VMOVAPDZ256mr}},<br>
+ {X86::VMOVDQU64Z128rm, {X86::VMOVDQU64Z128mr, X86::VMOVDQA64Z128mr}},<br>
+ {X86::VMOVDQA64Z128rm, {X86::VMOVDQU64Z128mr, X86::VMOVDQA64Z128mr}},<br>
+ {X86::VMOVDQU32Z128rm, {X86::VMOVDQU32Z128mr, X86::VMOVDQA32Z128mr}},<br>
+ {X86::VMOVDQA32Z128rm, {X86::VMOVDQU32Z128mr, X86::VMOVDQA32Z128mr}},<br>
+ {X86::VMOVDQU64Z256rm, {X86::VMOVDQU64Z256mr, X86::VMOVDQA64Z256mr}},<br>
+ {X86::VMOVDQA64Z256rm, {X86::VMOVDQU64Z256mr, X86::VMOVDQA64Z256mr}},<br>
+ {X86::VMOVDQU32Z256rm, {X86::VMOVDQU32Z256mr, X86::VMOVDQA32Z256mr}},<br>
+ {X86::VMOVDQA32Z256rm, {X86::VMOVDQU32Z256mr, X86::VMOVDQA32Z256mr}},<br>
+};<br>
+<br>
+static bool isPotentialBlockedMemCpyPair(u<wbr>nsigned LdOpcode, unsigned StOpcode) {<br>
+ auto PotentialStores = PotentialBlockedMemCpy.at(LdOp<wbr>code);<br>
+ return PotentialStores.first == StOpcode ||<br>
+ PotentialStores.second == StOpcode;<br>
+}<br>
+<br>
+static bool isPotentialBlockingStoreInst(i<wbr>nt Opcode, int LoadOpcode) {<br>
+ bool PBlock = false;<br>
+ PBlock |= Opcode == X86::MOV64mr || Opcode == X86::MOV64mi32 ||<br>
+ Opcode == X86::MOV32mr || Opcode == X86::MOV32mi ||<br>
+ Opcode == X86::MOV16mr || Opcode == X86::MOV16mi ||<br>
+ Opcode == X86::MOV8mr || Opcode == X86::MOV8mi;<br>
+ if (isYMMLoadOpcode(LoadOpcode))<br>
+ PBlock |= Opcode == X86::VMOVUPSmr || Opcode == X86::VMOVAPSmr ||<br>
+ Opcode == X86::VMOVUPDmr || Opcode == X86::VMOVAPDmr ||<br>
+ Opcode == X86::VMOVDQUmr || Opcode == X86::VMOVDQAmr ||<br>
+ Opcode == X86::VMOVUPSZ128mr || Opcode == X86::VMOVAPSZ128mr ||<br>
+ Opcode == X86::VMOVUPDZ128mr || Opcode == X86::VMOVAPDZ128mr ||<br>
+ Opcode == X86::VMOVDQU64Z128mr ||<br>
+ Opcode == X86::VMOVDQA64Z128mr ||<br>
+ Opcode == X86::VMOVDQU32Z128mr || Opcode == X86::VMOVDQA32Z128mr;<br>
+ return PBlock;<br>
+}<br>
+<br>
+static const int MOV128SZ = 16;<br>
+static const int MOV64SZ = 8;<br>
+static const int MOV32SZ = 4;<br>
+static const int MOV16SZ = 2;<br>
+static const int MOV8SZ = 1;<br>
+<br>
+std::map<unsigned, unsigned> YMMtoXMMLoadMap = {<br>
+ {X86::VMOVUPSYrm, X86::VMOVUPSrm},<br>
+ {X86::VMOVAPSYrm, X86::VMOVUPSrm},<br>
+ {X86::VMOVUPDYrm, X86::VMOVUPDrm},<br>
+ {X86::VMOVAPDYrm, X86::VMOVUPDrm},<br>
+ {X86::VMOVDQUYrm, X86::VMOVDQUrm},<br>
+ {X86::VMOVDQAYrm, X86::VMOVDQUrm},<br>
+ {X86::VMOVUPSZ256rm, X86::VMOVUPSZ128rm},<br>
+ {X86::VMOVAPSZ256rm, X86::VMOVUPSZ128rm},<br>
+ {X86::VMOVUPDZ256rm, X86::VMOVUPDZ128rm},<br>
+ {X86::VMOVAPDZ256rm, X86::VMOVUPDZ128rm},<br>
+ {X86::VMOVDQU64Z256rm, X86::VMOVDQU64Z128rm},<br>
+ {X86::VMOVDQA64Z256rm, X86::VMOVDQU64Z128rm},<br>
+ {X86::VMOVDQU32Z256rm, X86::VMOVDQU32Z128rm},<br>
+ {X86::VMOVDQA32Z256rm, X86::VMOVDQU32Z128rm},<br>
+};<br>
+<br>
+std::map<unsigned, unsigned> YMMtoXMMStoreMap = {<br>
+ {X86::VMOVUPSYmr, X86::VMOVUPSmr},<br>
+ {X86::VMOVAPSYmr, X86::VMOVUPSmr},<br>
+ {X86::VMOVUPDYmr, X86::VMOVUPDmr},<br>
+ {X86::VMOVAPDYmr, X86::VMOVUPDmr},<br>
+ {X86::VMOVDQUYmr, X86::VMOVDQUmr},<br>
+ {X86::VMOVDQAYmr, X86::VMOVDQUmr},<br>
+ {X86::VMOVUPSZ256mr, X86::VMOVUPSZ128mr},<br>
+ {X86::VMOVAPSZ256mr, X86::VMOVUPSZ128mr},<br>
+ {X86::VMOVUPDZ256mr, X86::VMOVUPDZ128mr},<br>
+ {X86::VMOVAPDZ256mr, X86::VMOVUPDZ128mr},<br>
+ {X86::VMOVDQU64Z256mr, X86::VMOVDQU64Z128mr},<br>
+ {X86::VMOVDQA64Z256mr, X86::VMOVDQU64Z128mr},<br>
+ {X86::VMOVDQU32Z256mr, X86::VMOVDQU32Z128mr},<br>
+ {X86::VMOVDQA32Z256mr, X86::VMOVDQU32Z128mr},<br>
+};<br>
+<br>
+static int getAddrOffset(MachineInstr *MI) {<br>
+ const MCInstrDesc &Descl = MI->getDesc();<br>
+ int AddrOffset = X86II::getMemoryOperandNo(Desc<wbr>l.TSFlags);<br>
+ assert(AddrOffset != -1 && "Expected Memory Operand");<br>
+ AddrOffset += X86II::getOperandBias(Descl);<br>
+ return AddrOffset;<br>
+}<br>
+<br>
+static MachineOperand &getBaseOperand(MachineInstr *MI) {<br>
+ int AddrOffset = getAddrOffset(MI);<br>
+ return MI->getOperand(AddrOffset + X86::AddrBaseReg);<br>
+}<br>
+<br>
+static MachineOperand &getDispOperand(MachineInstr *MI) {<br>
+ int AddrOffset = getAddrOffset(MI);<br>
+ return MI->getOperand(AddrOffset + X86::AddrDisp);<br>
+}<br>
+<br>
+// Relevant addressing modes contain only base register and immediate<br>
+// displacement or frameindex and immediate displacement.<br>
+// TODO: Consider expanding to other addressing modes in the future<br>
+static bool isRelevantAddressingMode(Machi<wbr>neInstr *MI) {<br>
+ int AddrOffset = getAddrOffset(MI);<br>
+ MachineOperand &Base = MI->getOperand(AddrOffset + X86::AddrBaseReg);<br>
+ MachineOperand &Disp = MI->getOperand(AddrOffset + X86::AddrDisp);<br>
+ MachineOperand &Scale = MI->getOperand(AddrOffset + X86::AddrScaleAmt);<br>
+ MachineOperand &Index = MI->getOperand(AddrOffset + X86::AddrIndexReg);<br>
+ MachineOperand &Segment = MI->getOperand(AddrOffset + X86::AddrSegmentReg);<br>
+<br>
+ if (!((Base.isReg() && Base.getReg() != X86::NoRegister) || Base.isFI()))<br>
+ return false;<br>
+ if (!Disp.isImm())<br>
+ return false;<br>
+ if (Scale.getImm() != 1)<br>
+ return false;<br>
+ if (!(Index.isReg() && Index.getReg() == X86::NoRegister))<br>
+ return false;<br>
+ if (!(Segment.isReg() && Segment.getReg() == X86::NoRegister))<br>
+ return false;<br>
+ return true;<br>
+}<br>
+<br>
+// Collect potentially blocking stores.<br>
+// Limit the number of instructions backwards we want to inspect<br>
+// since the effect of store block won't be visible if the store<br>
+// and load instructions have enough instructions in between to<br>
+// keep the core busy.<br>
+static const unsigned LIMIT = 20;<br></blockquote><div><br></div><div>Please follow the LLVM naming convention (<a href="https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly" target="_blank">https://llvm.org/docs/CodingS<wbr>tandards.html#name-types-<wbr>functions-variables-and-enumer<wbr>ators-properly</a>). Also, move this limit into the findPotentialBlockers function rather than giving it file scope.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+static SmallVector<MachineInstr *, 2><br>
+findPotentialBlockers(Machine<wbr>Instr *LoadInst) {<br>
+ SmallVector<MachineInstr *, 2> PotentialBlockers;<br>
+ unsigned BlockLimit = 0;<br>
+ for (MachineBasicBlock::iterator LI = LoadInst,<br>
+ BB = LoadInst->getParent()->begin()<wbr>;<br>
+ LI != BB; --LI) {<br>
+ BlockLimit++;<br>
+ if (BlockLimit >= LIMIT)<br>
+ break;<br>
+ MachineInstr &MI = *LI;<br>
+ if (MI.getDesc().isCall())<br>
+ break;<br>
+ PotentialBlockers.push_back(&M<wbr>I);<br>
+ }<br>
+ // If we didn't reach the instruction limit, try the predecessor blocks.<br>
+ // Ideally we should traverse the predecessor blocks in depth with some<br>
+ // coloring algorithm, but for now let's just look at the first order<br>
+ // predecessors.<br>
+ if (BlockLimit < LIMIT) {<br>
+ MachineBasicBlock *MBB = LoadInst->getParent();<br>
+ int LimitLeft = LIMIT - BlockLimit;<br>
+ for (MachineBasicBlock::pred_itera<wbr>tor PB = MBB->pred_begin(),<br>
+ PE = MBB->pred_end();<br>
+ PB != PE; ++PB) {<br>
+ MachineBasicBlock *PMBB = *PB;<br>
+ int PredLimit = 0;<br>
+ for (MachineBasicBlock::reverse_it<wbr>erator PMI = PMBB->rbegin(),<br>
+ PME = PMBB->rend();<br>
+ PMI != PME; ++PMI) {<br>
+ PredLimit++;<br>
+ if (PredLimit >= LimitLeft)<br>
+ break;<br>
+ if (PMI->getDesc().isCall())<br>
+ break;<br>
+ PotentialBlockers.push_back(&*<wbr>PMI);<br>
+ }<br>
+ }<br>
+ }<br>
+ return PotentialBlockers;<br>
+}<br>
+<br>
+void FixupSFBPass::buildCopy(Machin<wbr>eInstr *LoadInst, unsigned NLoadOpcode,<br>
+ int64_t LoadDisp, MachineInstr *StoreInst,<br>
+ unsigned NStoreOpcode, int64_t StoreDisp,<br>
+ unsigned Size, int64_t LMMOffset,<br>
+ int64_t SMMOffset) {<br>
+ MachineOperand &LoadBase = getBaseOperand(LoadInst);<br>
+ MachineOperand &StoreBase = getBaseOperand(StoreInst);<br>
+ MachineBasicBlock *MBB = LoadInst->getParent();<br>
+ MachineMemOperand *LMMO = *LoadInst->memoperands_begin()<wbr>;<br>
+ MachineMemOperand *SMMO = *StoreInst->memoperands_begin(<wbr>);<br>
+<br>
+ unsigned Reg1 = MRI->createVirtualRegister(<br>
+ TII->getRegClass(TII->get(NLoa<wbr>dOpcode), 0, TRI, *(MBB->getParent())));<br>
+ BuildMI(*MBB, LoadInst, LoadInst->getDebugLoc(), TII->get(NLoadOpcode), Reg1)<br>
+ .add(LoadBase)<br>
+ .addImm(1)<br>
+ .addReg(X86::NoRegister)<br>
+ .addImm(LoadDisp)<br>
+ .addReg(X86::NoRegister)<br>
+ .addMemOperand(<br>
+ MBB->getParent()->getMachineMe<wbr>mOperand(LMMO, LMMOffset, Size));<br>
+ DEBUG(LoadInst->getPrevNode()-<wbr>>dump());<br>
+ // If the load and store are consecutive, use the loadInst location to<br>
+ // reduce register pressure.<br>
+ MachineInstr *StInst = StoreInst;<br>
+ if (StoreInst->getPrevNode() == LoadInst)<br>
+ StInst = LoadInst;<br>
+ BuildMI(*MBB, StInst, StInst->getDebugLoc(), TII->get(NStoreOpcode))<br>
+ .add(StoreBase)<br>
+ .addImm(1)<br>
+ .addReg(X86::NoRegister)<br>
+ .addImm(StoreDisp)<br>
+ .addReg(X86::NoRegister)<br>
+ .addReg(Reg1)<br>
+ .addMemOperand(<br>
+ MBB->getParent()->getMachineMe<wbr>mOperand(SMMO, SMMOffset, Size));<br>
+ DEBUG(StInst->getPrevNode()->d<wbr>ump());<br>
+}<br>
+<br>
+void FixupSFBPass::buildCopies(int Size, MachineInstr *LoadInst,<br>
+ int64_t LdDispImm, MachineInstr *StoreInst,<br>
+ int64_t StDispImm, int64_t LMMOffset,<br>
+ int64_t SMMOffset) {<br>
+ int LdDisp = LdDispImm;<br>
+ int StDisp = StDispImm;<br>
+ while (Size > 0) {<br>
+ if ((Size - MOV128SZ >= 0) && isYMMLoadOpcode(LoadInst->getO<wbr>pcode())) {<br>
+ Size = Size - MOV128SZ;<br>
+ buildCopy(LoadInst, YMMtoXMMLoadMap.at(LoadInst->g<wbr>etOpcode()), LdDisp,<br>
+ StoreInst, YMMtoXMMStoreMap.at(StoreInst-<wbr>>getOpcode()), StDisp,<br>
+ MOV128SZ, LMMOffset, SMMOffset);<br>
+ LdDisp += MOV128SZ;<br>
+ StDisp += MOV128SZ;<br>
+ LMMOffset += MOV128SZ;<br>
+ SMMOffset += MOV128SZ;<br>
+ continue;<br>
+ }<br>
+ if (Size - MOV64SZ >= 0 && Is64Bit) {<br>
+ Size = Size - MOV64SZ;<br>
+ buildCopy(LoadInst, X86::MOV64rm, LdDisp, StoreInst, X86::MOV64mr, StDisp,<br>
+ MOV64SZ, LMMOffset, SMMOffset);<br>
+ LdDisp += MOV64SZ;<br>
+ StDisp += MOV64SZ;<br>
+ LMMOffset += MOV64SZ;<br>
+ SMMOffset += MOV64SZ;<br>
+ continue;<br>
+ }<br>
+ if (Size - MOV32SZ >= 0) {<br>
+ Size = Size - MOV32SZ;<br>
+ buildCopy(LoadInst, X86::MOV32rm, LdDisp, StoreInst, X86::MOV32mr, StDisp,<br>
+ MOV32SZ, LMMOffset, SMMOffset);<br>
+ LdDisp += MOV32SZ;<br>
+ StDisp += MOV32SZ;<br>
+ LMMOffset += MOV32SZ;<br>
+ SMMOffset += MOV32SZ;<br>
+ continue;<br>
+ }<br>
+ if (Size - MOV16SZ >= 0) {<br>
+ Size = Size - MOV16SZ;<br>
+ buildCopy(LoadInst, X86::MOV16rm, LdDisp, StoreInst, X86::MOV16mr, StDisp,<br>
+ MOV16SZ, LMMOffset, SMMOffset);<br>
+ LdDisp += MOV16SZ;<br>
+ StDisp += MOV16SZ;<br>
+ LMMOffset += MOV16SZ;<br>
+ SMMOffset += MOV16SZ;<br>
+ continue;<br>
+ }<br>
+ if (Size - MOV8SZ >= 0) {<br>
+ Size = Size - MOV8SZ;<br>
+ buildCopy(LoadInst, X86::MOV8rm, LdDisp, StoreInst, X86::MOV8mr, StDisp,<br>
+ MOV8SZ, LMMOffset, SMMOffset);<br>
+ LdDisp += MOV8SZ;<br>
+ StDisp += MOV8SZ;<br>
+ LMMOffset += MOV8SZ;<br>
+ SMMOffset += MOV8SZ;<br>
+ continue;<br>
+ }<br>
+ }<br>
+ assert(Size == 0 && "Wrong size division");<br>
+}<br>
+<br>
+static void updateKillStatus(MachineInstr *LoadInst, MachineInstr *StoreInst) {<br>
+ MachineOperand &LoadBase = getBaseOperand(LoadInst);<br>
+ MachineOperand &StoreBase = getBaseOperand(StoreInst);<br>
+ if (LoadBase.isReg()) {<br>
+ MachineInstr *LastLoad = LoadInst->getPrevNode();<br>
+ // If the original load and store to xmm/ymm were consecutive<br>
+ // then the partial copies were also created in<br>
+ // a consecutive order to reduce register pressure,<br>
+ // and the location of the last load is before the last store.<br>
+ if (StoreInst->getPrevNode() == LoadInst)<br>
+ LastLoad = LoadInst->getPrevNode()->getPr<wbr>evNode();<br>
+ getBaseOperand(LastLoad).setIs<wbr>Kill(LoadBase.isKill());<br>
+ }<br>
+ if (StoreBase.isReg()) {<br>
+ MachineInstr *StInst = StoreInst;<br>
+ if (StoreInst->getPrevNode() == LoadInst)<br>
+ StInst = LoadInst;<br>
+ getBaseOperand(StInst->getPrev<wbr>Node()).setIsKill(StoreBase.is<wbr>Kill());<br>
+ }<br>
+}<br>
+<br>
+void FixupSFBPass::findPotentiallyl<wbr>BlockedCopies(MachineFunction &MF) {<br>
+ for (auto &MBB : MF)<br>
+ for (auto &MI : MBB)<br>
+ if (isPotentialBlockedMemCpyLd(MI<wbr>.getOpcode())) {<br>
+ int DefVR = MI.getOperand(0).getReg();<br>
+ if (MRI->hasOneUse(DefVR))<br>
+ for (auto UI = MRI->use_nodbg_begin(DefVR), UE = MRI->use_nodbg_end();<br>
+ UI != UE;) {<br>
+ MachineOperand &StoreMO = *UI++;<br>
+ MachineInstr &StoreMI = *StoreMO.getParent();<br>
+ if (isPotentialBlockedMemCpyPair(<wbr>MI.getOpcode(),<br>
+ StoreMI.getOpcode()) &&<br>
+ (StoreMI.getParent() == MI.getParent()))<br>
+ if (isRelevantAddressingMode(&MI) &&<br>
+ isRelevantAddressingMode(&Stor<wbr>eMI))<br>
+ BlockedLoadsStores.push_back(<br>
+ std::pair<MachineInstr *, MachineInstr *>(&MI, &StoreMI));<br>
+ }<br>
+ }<br></blockquote></div></div></div></blockquote><div><br></div><div>The LLVM coding style generally braces if and for loop bodies containing multiple statements (recursively).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+}<br>
+unsigned FixupSFBPass::getRegSizeInByte<wbr>s(MachineInstr *LoadInst) {<br>
+ auto TRC = TII->getRegClass(TII->get(Load<wbr>Inst->getOpcode()), 0, TRI,<br>
+ *LoadInst->getParent()->getPar<wbr>ent());<br>
+ return TRI->getRegSizeInBits(*TRC) / 8;<br>
+}<br>
+<br>
+void FixupSFBPass::breakBlockedCopi<wbr>es(<br>
+ MachineInstr *LoadInst, MachineInstr *StoreInst,<br>
+ const std::map<int64_t, unsigned> &BlockingStoresDisp) {<br>
+ int64_t LdDispImm = getDispOperand(LoadInst).getIm<wbr>m();<br>
+ int64_t StDispImm = getDispOperand(StoreInst).getI<wbr>mm();<br>
+ int64_t LMMOffset = (*LoadInst->memoperands_begin(<wbr>))->getOffset();<br>
+ int64_t SMMOffset = (*StoreInst->memoperands_begin<wbr>())->getOffset();<br>
+<br>
+ int64_t LdDisp1 = LdDispImm;<br>
+ int64_t LdDisp2 = 0;<br>
+ int64_t StDisp1 = StDispImm;<br>
+ int64_t StDisp2 = 0;<br>
+ unsigned Size1 = 0;<br>
+ unsigned Size2 = 0;<br>
+ int64_t LdStDelta = StDispImm - LdDispImm;<br>
+ for (auto inst : BlockingStoresDisp) {<br></blockquote><div><br></div><div>"inst" here should be capitalized.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ LdDisp2 = inst.first;<br>
+ StDisp2 = inst.first + LdStDelta;<br>
+ Size1 = std::abs(std::abs(LdDisp2) - std::abs(LdDisp1));<br>
+ Size2 = inst.second;<br>
+ buildCopies(Size1, LoadInst, LdDisp1, StoreInst, StDisp1, LMMOffset,<br>
+ SMMOffset);<br>
+ buildCopies(Size2, LoadInst, LdDisp2, StoreInst, StDisp2, LMMOffset + Size1,<br>
+ SMMOffset + Size1);<br>
+ LdDisp1 = LdDisp2 + Size2;<br>
+ StDisp1 = StDisp2 + Size2;<br>
+ LMMOffset += Size1 + Size2;<br>
+ SMMOffset += Size1 + Size2;<br>
+ }<br>
+ unsigned Size3 = (LdDispImm + getRegSizeInBytes(LoadInst)) - LdDisp1;<br>
+ buildCopies(Size3, LoadInst, LdDisp1, StoreInst, StDisp1, LMMOffset,<br>
+ LMMOffset);<br></blockquote><div><br></div><div>I'm going to add some extra debug info to the above to try to narrow down the miscompile.</div></div></div></div></blockquote><div><br></div><div>Here you go:</div><div><br></div><div><div>Blocked load and store instructions: </div><div> %8:vr128 = MOVUPSrm %stack.0, 1, $noreg, 16, $noreg; mem:LD16[%8](align=8)(<wbr>dereferenceable)</div><div> MOVUPSmr %4:gr64, 1, $noreg, 16, $noreg, killed %8:vr128; mem:ST16[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)](align=8)</div><div>Replaced with:</div><div>breakBlockedCopies iter LdStDelta 0 LdDisp1 16 StDisp1 16 inst.first 16 inst.second 8 LdDisp2 16 StDisp2 16 Size1 0 Size2 8 LMMOffset 0 SMMOffset 0</div><div> %17:gr64 = MOV64rm %stack.0, 1, $noreg, 16, $noreg; mem:LD8[%8](dereferenceable)</div><div> MOV64mr %4:gr64, 1, $noreg, 16, $noreg, %17:gr64; mem:ST8[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)]</div><div>breakBlockedCopies iter LdStDelta 0 LdDisp1 24 StDisp1 24 inst.first 23 inst.second 8 LdDisp2 23 StDisp2 23 Size1 1 Size2 8 LMMOffset 8 SMMOffset 8</div><div> %18:gr8 = MOV8rm %stack.0, 1, $noreg, 24, $noreg; mem:LD1[%8+8](align=8)(<wbr>dereferenceable)</div><div> MOV8mr %4:gr64, 1, $noreg, 24, $noreg, %18:gr8; mem:ST1[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)+8](align=8)</div><div> %19:gr64 = MOV64rm %stack.0, 1, $noreg, 23, $noreg; mem:LD8[%8(align=8)+9](align=<wbr>1)(dereferenceable)</div><div> MOV64mr %4:gr64, 1, $noreg, 23, $noreg, %19:gr64; mem:ST8[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+9](align=1)</div><div>breakBlockedCopies iter LdStDelta 0 LdDisp1 31 StDisp1 31 inst.first 24 inst.second 4 LdDisp2 24 StDisp2 24 Size1 7 Size2 4 LMMOffset 17 SMMOffset 17</div></div></div></div></div></blockquote><div><br></div><div>In the case where unrelated parts of the input are perturbed and the 
miscompile disappears, the above line of debug and corresponding output below is also gone.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div> %20:gr32 = MOV32rm %stack.0, 1, $noreg, 31, $noreg; mem:LD4[%8(align=8)+17](align=<wbr>1)(dereferenceable)</div><div> MOV32mr %4:gr64, 1, $noreg, 31, $noreg, %20:gr32; mem:ST4[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+17](align=1)</div><div> %21:gr16 = MOV16rm %stack.0, 1, $noreg, 35, $noreg; mem:LD2[%8(align=8)+21](align=<wbr>1)(dereferenceable)</div><div> MOV16mr %4:gr64, 1, $noreg, 35, $noreg, %21:gr16; mem:ST2[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+21](align=1)</div><div> %22:gr8 = MOV8rm %stack.0, 1, $noreg, 37, $noreg; mem:LD1[%8(align=8)+23](align=<wbr>1)(dereferenceable)</div><div> MOV8mr %4:gr64, 1, $noreg, 37, $noreg, %22:gr8; mem:ST1[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+23](align=1)</div><div> %23:gr32 = MOV32rm %stack.0, 1, $noreg, 24, $noreg; mem:LD4[%8+24](align=8)(<wbr>dereferenceable)</div><div> MOV32mr %4:gr64, 1, $noreg, 24, $noreg, %23:gr32; mem:ST4[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)+24](align=8)</div><div>breakBlockedCopies iter LdStDelta 0 LdDisp1 28 StDisp1 28 inst.first 31 inst.second 1 LdDisp2 31 StDisp2 31 Size1 3 Size2 1 LMMOffset 28 SMMOffset 28</div><div> %24:gr16 = MOV16rm %stack.0, 1, $noreg, 28, $noreg; mem:LD2[%8(align=8)+28](align=<wbr>4)(dereferenceable)</div><div> MOV16mr %4:gr64, 1, $noreg, 28, $noreg, %24:gr16; mem:ST2[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+28](align=4) </div><div> %25:gr8 = MOV8rm %stack.0, 1, $noreg, 30, $noreg; 
mem:LD1[%8(align=8)+30](align=<wbr>2)(dereferenceable)</div><div> MOV8mr %4:gr64, 1, $noreg, 30, $noreg, %25:gr8; mem:ST1[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+30](align=2) </div><div> %26:gr8 = MOV8rm %stack.0, 1, $noreg, 31, $noreg; mem:LD1[%8(align=8)+31](align=<wbr>1)(dereferenceable)</div><div> MOV8mr %4:gr64, 1, $noreg, 31, $noreg, %26:gr8; mem:ST1[getelementptr inbounds (%widget, %widget* @other.widget, i64 0, i32 0, i32 1, i64 8)(align=8)+31](align=1) </div><div>cleanup LdDisp1 32 StDisp1 32 LdDispImm 16 Size3 0 LMMOffset 32 SMMOffset 32</div><div>End X86FixupSFB</div></div><div><br></div><div>(The above values are dumped before the buildCopies calls in the loop and before the buildCopies call after the loop.) I'm going to go ahead and revert this. If the above isn't enough for you to figure out what's wrong, let me know; I'll keep around the testcase and I'm happy to do more digging.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+}<br>
+<br>
+bool FixupSFBPass::runOnMachineFunction(MachineFunction &MF) {<br>
+ bool Changed = false;<br>
+<br>
+ if (DisableX86FixupSFB || skipFunction(MF.getFunction()))<br>
+ return false;<br>
+<br>
+ MRI = &MF.getRegInfo();<br>
+ assert(MRI->isSSA() && "Expected MIR to be in SSA form");<br>
+ TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();<br>
+ TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();<br>
+ Is64Bit = MF.getSubtarget<X86Subtarget>().is64Bit();<br>
+ DEBUG(dbgs() << "Start X86FixupSFB\n";);<br>
+ // Look for a load then a store to XMM/YMM which look like a memcpy<br>
+ findPotentiallylBlockedCopies(MF);<br>
+<br>
+ for (auto LoadStoreInst : BlockedLoadsStores) {<br>
+ MachineInstr *LoadInst = LoadStoreInst.first;<br>
+ SmallVector<MachineInstr *, 2> PotentialBlockers =<br>
+ findPotentialBlockers(LoadInst);<br>
+<br>
+ MachineOperand &LoadBase = getBaseOperand(LoadInst);<br>
+ int64_t LdDispImm = getDispOperand(LoadInst).getImm();<br>
+ std::map<int64_t, unsigned> BlockingStoresDisp;<br>
+ int LdBaseReg = LoadBase.isReg() ? LoadBase.getReg() : LoadBase.getIndex();<br>
+<br>
+ for (auto PBInst : PotentialBlockers) {<br>
+ if (isPotentialBlockingStoreInst(PBInst->getOpcode(),<br>
+ LoadInst->getOpcode())) {<br>
+ if (!isRelevantAddressingMode(PBInst))<br>
+ continue;<br>
+ MachineOperand &PBstoreBase = getBaseOperand(PBInst);<br>
+ int64_t PBstDispImm = getDispOperand(PBInst).getImm();<br>
+ assert(PBInst->hasOneMemOperand() && "Expected One Memory Operand");<br>
+ unsigned PBstSize = (*PBInst->memoperands_begin())->getSize();<br>
+ int PBstBaseReg =<br>
+ PBstoreBase.isReg() ? PBstoreBase.getReg() : PBstoreBase.getIndex();<br>
+ // This check doesn't cover all cases, but it will suffice for now.<br>
+ // TODO: take branch probability into consideration, if the blocking<br>
+ // store is in an unreached block, breaking the memcopy could lose<br>
+ // performance.<br>
+ if (((LoadBase.isReg() && PBstoreBase.isReg()) ||<br>
+ (LoadBase.isFI() && PBstoreBase.isFI())) &&<br>
+ LdBaseReg == PBstBaseReg &&<br>
+ ((PBstDispImm >= LdDispImm) &&<br>
+ (PBstDispImm <=<br>
+ LdDispImm + (getRegSizeInBytes(LoadInst) - PBstSize)))) {<br>
+ if (BlockingStoresDisp.count(PBstDispImm)) {<br>
+ if (BlockingStoresDisp[PBstDispImm] > PBstSize)<br>
+ BlockingStoresDisp[PBstDispImm] = PBstSize;<br>
+<br>
+ } else<br>
+ BlockingStoresDisp[PBstDispImm] = PBstSize;<br>
+ }<br>
+ }<br>
+ }<br>
+<br>
+ if (BlockingStoresDisp.size() == 0)<br>
+ continue;<br>
+<br>
+ // We found a store forward block, break the memcpy's load and store<br>
+ // into smaller copies such that each smaller store that was causing<br>
+ // a store block would now be copied separately.<br>
+ MachineInstr *StoreInst = LoadStoreInst.second;<br>
+ DEBUG(dbgs() << "Blocked load and store instructions: \n");<br>
+ DEBUG(LoadInst->dump());<br>
+ DEBUG(StoreInst->dump());<br>
+ DEBUG(dbgs() << "Replaced with:\n");<br>
+ breakBlockedCopies(LoadInst, StoreInst, BlockingStoresDisp);<br>
+ updateKillStatus(LoadInst, StoreInst);<br>
+ ForRemoval.push_back(LoadInst);<br>
+ ForRemoval.push_back(StoreInst);<br>
+ }<br>
+ for (auto RemovedInst : ForRemoval) {<br>
+ RemovedInst->eraseFromParent();<br>
+ }<br>
+ ForRemoval.clear();<br>
+ BlockedLoadsStores.clear();<br>
+ DEBUG(dbgs() << "End X86FixupSFB\n";);<br>
+<br>
+ return Changed;<br>
+}<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86T<wbr>argetMachine.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=325128&r1=325127&r2=325128&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Target/X8<wbr>6/X86TargetMachine.cpp?rev=325<wbr>128&r1=325127&r2=325128&view=d<wbr>iff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/X86T<wbr>argetMachine.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86T<wbr>argetMachine.cpp Wed Feb 14 06:58:53 2018<br>
@@ -449,6 +449,7 @@ void X86PassConfig::addPreRegAlloc(<wbr>) {<br>
addPass(createX86FixupSetCC()<wbr>);<br>
addPass(createX86OptimizeLEAs<wbr>());<br>
addPass(createX86CallFrameOpt<wbr>imization());<br>
+ addPass(createX86FixupSFB());<br>
}<br>
<br>
addPass(createX86WinAllocaExp<wbr>ander());<br>
<br>
Added: llvm/trunk/test/CodeGen/X86/fi<wbr>xup-sfb-32.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fixup-sfb-32.ll?rev=325128&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/CodeGen/<wbr>X86/fixup-sfb-32.ll?rev=325128<wbr>&view=auto</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/fi<wbr>xup-sfb-32.ll (added)<br>
+++ llvm/trunk/test/CodeGen/X86/fi<wbr>xup-sfb-32.ll Wed Feb 14 06:58:53 2018<br>
@@ -0,0 +1,1926 @@<br>
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py<br>
+; RUN: llc < %s -mtriple=i686-linux | FileCheck %s -check-prefix=CHECK<br>
+; RUN: llc < %s -mtriple=i686-linux --disable-fixup-SFB | FileCheck %s --check-prefix=DISABLED<br>
+; RUN: llc < %s -mtriple=i686-linux -mattr +sse4.1 | FileCheck %s -check-prefix=CHECK-AVX2<br>
+; RUN: llc < %s -mtriple=i686-linux -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq | FileCheck %s -check-prefix=CHECK-AVX512<br>
+<br>
+%struct.S = type { i32, i32, i32, i32 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_conditional_block(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_conditional_block:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $18, %edi<br>
+; CHECK-NEXT: jl .LBB0_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edi, 4(%ecx)<br>
+; CHECK-NEXT: .LBB0_2: # %if.end<br>
+; CHECK-NEXT: movups (%esi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_conditional_block:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: .cfi_offset %esi, -12<br>
+; DISABLED-NEXT: .cfi_offset %edi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: cmpl $18, %edi<br>
+; DISABLED-NEXT: jl .LBB0_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edi, 4(%esi)<br>
+; DISABLED-NEXT: .LBB0_2: # %if.end<br>
+; DISABLED-NEXT: movups (%edx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%ecx)<br>
+; DISABLED-NEXT: movups (%esi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_conditional_block:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX2-NEXT: jl .LBB0_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edi, 4(%ecx)<br>
+; CHECK-AVX2-NEXT: .LBB0_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%esi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_conditional_block:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX512-NEXT: jl .LBB0_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edi, 4(%ecx)<br>
+; CHECK-AVX512-NEXT: .LBB0_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%esi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%edx)<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_imm_store(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_imm_store:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl $0, (%edx)<br>
+; CHECK-NEXT: movl $1, (%ecx)<br>
+; CHECK-NEXT: movl (%edx), %ecx<br>
+; CHECK-NEXT: movl %ecx, (%eax)<br>
+; CHECK-NEXT: movl 4(%edx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 4(%eax)<br>
+; CHECK-NEXT: movl 8(%edx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 8(%eax)<br>
+; CHECK-NEXT: movl 12(%edx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_imm_store:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl $0, (%edx)<br>
+; DISABLED-NEXT: movl $1, (%ecx)<br>
+; DISABLED-NEXT: movups (%edx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_imm_store:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl $0, (%edx)<br>
+; CHECK-AVX2-NEXT: movl $1, (%ecx)<br>
+; CHECK-AVX2-NEXT: movl (%edx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%edx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl 8(%edx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 12(%edx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_imm_store:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl $0, (%edx)<br>
+; CHECK-AVX512-NEXT: movl $1, (%ecx)<br>
+; CHECK-AVX512-NEXT: movl (%edx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%edx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movl 8(%edx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 12(%edx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %a = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 0<br>
+ store i32 0, i32* %a, align 4<br>
+ %a1 = getelementptr inbounds %struct.S, %struct.S* %s3, i64 0, i32 0<br>
+ store i32 1, i32* %a1, align 4<br>
+ %0 = bitcast %struct.S* %s2 to i8*<br>
+ %1 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_nondirect_br(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_nondirect_br:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: cmpl $18, %ecx<br>
+; CHECK-NEXT: jl .LBB2_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %ecx, 4(%eax)<br>
+; CHECK-NEXT: .LBB2_2: # %if.end<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $14, %edx<br>
+; CHECK-NEXT: jl .LBB2_4<br>
+; CHECK-NEXT: # %bb.3: # %if.then2<br>
+; CHECK-NEXT: movl %edx, 12(%eax)<br>
+; CHECK-NEXT: .LBB2_4: # %if.end3<br>
+; CHECK-NEXT: movups (%edi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%esi)<br>
+; CHECK-NEXT: movl (%eax), %edx<br>
+; CHECK-NEXT: movl %edx, (%ecx)<br>
+; CHECK-NEXT: movl 4(%eax), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%ecx)<br>
+; CHECK-NEXT: movl 8(%eax), %edx<br>
+; CHECK-NEXT: movl %edx, 8(%ecx)<br>
+; CHECK-NEXT: movl 12(%eax), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%ecx)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_nondirect_br:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: .cfi_offset %esi, -12<br>
+; DISABLED-NEXT: .cfi_offset %edi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB2_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edx, 4(%eax)<br>
+; DISABLED-NEXT: .LBB2_2: # %if.end<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: cmpl $14, %ecx<br>
+; DISABLED-NEXT: jl .LBB2_4<br>
+; DISABLED-NEXT: # %bb.3: # %if.then2<br>
+; DISABLED-NEXT: movl %ecx, 12(%eax)<br>
+; DISABLED-NEXT: .LBB2_4: # %if.end3<br>
+; DISABLED-NEXT: movups (%edi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%esi)<br>
+; DISABLED-NEXT: movups (%eax), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%edx)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_nondirect_br:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: cmpl $18, %ecx<br>
+; CHECK-AVX2-NEXT: jl .LBB2_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %ecx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: .LBB2_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $14, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB2_4<br>
+; CHECK-AVX2-NEXT: # %bb.3: # %if.then2<br>
+; CHECK-AVX2-NEXT: movl %edx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: .LBB2_4: # %if.end3<br>
+; CHECK-AVX2-NEXT: movups (%edi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%esi)<br>
+; CHECK-AVX2-NEXT: movl (%eax), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, (%ecx)<br>
+; CHECK-AVX2-NEXT: movl 4(%eax), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%ecx)<br>
+; CHECK-AVX2-NEXT: movl 8(%eax), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 8(%ecx)<br>
+; CHECK-AVX2-NEXT: movl 12(%eax), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%ecx)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_nondirect_br:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: cmpl $18, %ecx<br>
+; CHECK-AVX512-NEXT: jl .LBB2_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %ecx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: .LBB2_2: # %if.end<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $14, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB2_4<br>
+; CHECK-AVX512-NEXT: # %bb.3: # %if.then2<br>
+; CHECK-AVX512-NEXT: movl %edx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: .LBB2_4: # %if.end3<br>
+; CHECK-AVX512-NEXT: vmovups (%edi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%esi)<br>
+; CHECK-AVX512-NEXT: movl (%eax), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%ecx)<br>
+; CHECK-AVX512-NEXT: movl 4(%eax), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%ecx)<br>
+; CHECK-AVX512-NEXT: movl 8(%eax), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 8(%ecx)<br>
+; CHECK-AVX512-NEXT: movl 12(%eax), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%ecx)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %cmp1 = icmp sgt i32 %x2, 13<br>
+ br i1 %cmp1, label %if.then2, label %if.end3<br>
+<br>
+if.then2: ; preds = %if.end<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ br label %if.end3<br>
+<br>
+if.end3: ; preds = %if.then2, %if.end<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_2preds_block(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_2preds_block:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %ebx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-NEXT: movl %ebx, 12(%ecx)<br>
+; CHECK-NEXT: cmpl $18, %edi<br>
+; CHECK-NEXT: jl .LBB3_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edi, 4(%ecx)<br>
+; CHECK-NEXT: .LBB3_2: # %if.end<br>
+; CHECK-NEXT: movups (%esi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: popl %ebx<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_2preds_block:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %ebx<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 16<br>
+; DISABLED-NEXT: .cfi_offset %esi, -16<br>
+; DISABLED-NEXT: .cfi_offset %edi, -12<br>
+; DISABLED-NEXT: .cfi_offset %ebx, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; DISABLED-NEXT: movl %ebx, 12(%esi)<br>
+; DISABLED-NEXT: cmpl $18, %edi<br>
+; DISABLED-NEXT: jl .LBB3_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edi, 4(%esi)<br>
+; DISABLED-NEXT: .LBB3_2: # %if.end<br>
+; DISABLED-NEXT: movups (%edx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%ecx)<br>
+; DISABLED-NEXT: movups (%esi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: popl %ebx<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_2preds_block:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %ebx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-AVX2-NEXT: movl %ebx, 12(%ecx)<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX2-NEXT: jl .LBB3_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edi, 4(%ecx)<br>
+; CHECK-AVX2-NEXT: .LBB3_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%esi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: popl %ebx<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_2preds_block:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %ebx<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-AVX512-NEXT: movl %ebx, 12(%ecx)<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX512-NEXT: jl .LBB3_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edi, 4(%ecx)<br>
+; CHECK-AVX512-NEXT: .LBB3_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%esi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%edx)<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: popl %ebx<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S2 = type { i64, i64 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_type64(%struct.S2* nocapture %s1, %struct.S2* nocapture %s2, i32 %x, %struct.S2* nocapture %s3, %struct.S2* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_type64:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $18, %edi<br>
+; CHECK-NEXT: jl .LBB4_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edi, 8(%ecx)<br>
+; CHECK-NEXT: sarl $31, %edi<br>
+; CHECK-NEXT: movl %edi, 12(%ecx)<br>
+; CHECK-NEXT: .LBB4_2: # %if.end<br>
+; CHECK-NEXT: movups (%esi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_type64:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: .cfi_offset %esi, -12<br>
+; DISABLED-NEXT: .cfi_offset %edi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: cmpl $18, %edi<br>
+; DISABLED-NEXT: jl .LBB4_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edi, 8(%esi)<br>
+; DISABLED-NEXT: sarl $31, %edi<br>
+; DISABLED-NEXT: movl %edi, 12(%esi)<br>
+; DISABLED-NEXT: .LBB4_2: # %if.end<br>
+; DISABLED-NEXT: movups (%edx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%ecx)<br>
+; DISABLED-NEXT: movups (%esi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_type64:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX2-NEXT: jl .LBB4_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edi, 8(%ecx)<br>
+; CHECK-AVX2-NEXT: sarl $31, %edi<br>
+; CHECK-AVX2-NEXT: movl %edi, 12(%ecx)<br>
+; CHECK-AVX2-NEXT: .LBB4_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%esi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_type64:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX512-NEXT: jl .LBB4_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edi, 8(%ecx)<br>
+; CHECK-AVX512-NEXT: sarl $31, %edi<br>
+; CHECK-AVX512-NEXT: movl %edi, 12(%ecx)<br>
+; CHECK-AVX512-NEXT: .LBB4_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%esi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%edx)<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %conv = sext i32 %x to i64<br>
+ %b = getelementptr inbounds %struct.S2, %struct.S2* %s1, i64 0, i32 1<br>
+ store i64 %conv, i64* %b, align 8<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S2* %s3 to i8*<br>
+ %1 = bitcast %struct.S2* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false)<br>
+ %2 = bitcast %struct.S2* %s2 to i8*<br>
+ %3 = bitcast %struct.S2* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 8, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S3 = type { i64, i8, i8, i16, i32 }<br>
+<br>
+; Function Attrs: noinline nounwind uwtable<br>
+define void @test_mixed_type(%struct.S3* nocapture %s1, %struct.S3* nocapture %s2, i32 %x, %struct.S3* nocapture readnone %s3, %struct.S3* nocapture readnone %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_mixed_type:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: .cfi_offset %esi, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB5_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edx, %esi<br>
+; CHECK-NEXT: sarl $31, %esi<br>
+; CHECK-NEXT: movl %edx, (%ecx)<br>
+; CHECK-NEXT: movl %esi, 4(%ecx)<br>
+; CHECK-NEXT: movb %dl, 8(%ecx)<br>
+; CHECK-NEXT: .LBB5_2: # %if.end<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-NEXT: movb 8(%ecx), %dl<br>
+; CHECK-NEXT: movb %dl, 8(%eax)<br>
+; CHECK-NEXT: movl 9(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 9(%eax)<br>
+; CHECK-NEXT: movzwl 13(%ecx), %edx<br>
+; CHECK-NEXT: movw %dx, 13(%eax)<br>
+; CHECK-NEXT: movb 15(%ecx), %cl<br>
+; CHECK-NEXT: movb %cl, 15(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_mixed_type:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: .cfi_offset %esi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB5_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edx, %esi<br>
+; DISABLED-NEXT: sarl $31, %esi<br>
+; DISABLED-NEXT: movl %edx, (%ecx)<br>
+; DISABLED-NEXT: movl %esi, 4(%ecx)<br>
+; DISABLED-NEXT: movb %dl, 8(%ecx)<br>
+; DISABLED-NEXT: .LBB5_2: # %if.end<br>
+; DISABLED-NEXT: movups (%ecx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_mixed_type:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB5_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edx, %esi<br>
+; CHECK-AVX2-NEXT: sarl $31, %esi<br>
+; CHECK-AVX2-NEXT: movl %edx, (%ecx)<br>
+; CHECK-AVX2-NEXT: movl %esi, 4(%ecx)<br>
+; CHECK-AVX2-NEXT: movb %dl, 8(%ecx)<br>
+; CHECK-AVX2-NEXT: .LBB5_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movb 8(%ecx), %dl<br>
+; CHECK-AVX2-NEXT: movb %dl, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 9(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 9(%eax)<br>
+; CHECK-AVX2-NEXT: movzwl 13(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movw %dx, 13(%eax)<br>
+; CHECK-AVX2-NEXT: movb 15(%ecx), %cl<br>
+; CHECK-AVX2-NEXT: movb %cl, 15(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_mixed_type:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB5_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edx, %esi<br>
+; CHECK-AVX512-NEXT: sarl $31, %esi<br>
+; CHECK-AVX512-NEXT: movl %edx, (%ecx)<br>
+; CHECK-AVX512-NEXT: movl %esi, 4(%ecx)<br>
+; CHECK-AVX512-NEXT: movb %dl, 8(%ecx)<br>
+; CHECK-AVX512-NEXT: .LBB5_2: # %if.end<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movb 8(%ecx), %dl<br>
+; CHECK-AVX512-NEXT: movb %dl, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 9(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 9(%eax)<br>
+; CHECK-AVX512-NEXT: movzwl 13(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movw %dx, 13(%eax)<br>
+; CHECK-AVX512-NEXT: movb 15(%ecx), %cl<br>
+; CHECK-AVX512-NEXT: movb %cl, 15(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %conv = sext i32 %x to i64<br>
+ %a = getelementptr inbounds %struct.S3, %struct.S3* %s1, i64 0, i32 0<br>
+ store i64 %conv, i64* %a, align 8<br>
+ %conv1 = trunc i32 %x to i8<br>
+ %b = getelementptr inbounds %struct.S3, %struct.S3* %s1, i64 0, i32 1<br>
+ store i8 %conv1, i8* %b, align 8<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S3* %s2 to i8*<br>
+ %1 = bitcast %struct.S3* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S4 = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_multiple_blocks(%struct.S4* nocapture %s1, %struct.S4* nocapture %s2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_multiple_blocks:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl $0, 4(%ecx)<br>
+; CHECK-NEXT: movl $0, 36(%ecx)<br>
+; CHECK-NEXT: movups 16(%ecx), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, 16(%eax)<br>
+; CHECK-NEXT: movl 32(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 32(%eax)<br>
+; CHECK-NEXT: movl 36(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 36(%eax)<br>
+; CHECK-NEXT: movl 40(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 40(%eax)<br>
+; CHECK-NEXT: movl 44(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 44(%eax)<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_multiple_blocks:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl $0, 4(%ecx)<br>
+; DISABLED-NEXT: movl $0, 36(%ecx)<br>
+; DISABLED-NEXT: movups 16(%ecx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, 16(%eax)<br>
+; DISABLED-NEXT: movups 32(%ecx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, 32(%eax)<br>
+; DISABLED-NEXT: movups (%ecx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_multiple_blocks:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl $0, 4(%ecx)<br>
+; CHECK-AVX2-NEXT: movl $0, 36(%ecx)<br>
+; CHECK-AVX2-NEXT: movups 16(%ecx), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, 16(%eax)<br>
+; CHECK-AVX2-NEXT: movl 32(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 32(%eax)<br>
+; CHECK-AVX2-NEXT: movl 36(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 36(%eax)<br>
+; CHECK-AVX2-NEXT: movl 40(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 40(%eax)<br>
+; CHECK-AVX2-NEXT: movl 44(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 44(%eax)<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_multiple_blocks:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl $0, 4(%ecx)<br>
+; CHECK-AVX512-NEXT: movl $0, 36(%ecx)<br>
+; CHECK-AVX512-NEXT: vmovups 16(%ecx), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 16(%eax)<br>
+; CHECK-AVX512-NEXT: movl 32(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 32(%eax)<br>
+; CHECK-AVX512-NEXT: movl 36(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 36(%eax)<br>
+; CHECK-AVX512-NEXT: movl 40(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 40(%eax)<br>
+; CHECK-AVX512-NEXT: movl 44(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 44(%eax)<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: vmovups 8(%ecx), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 24(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 24(%eax)<br>
+; CHECK-AVX512-NEXT: movl 28(%ecx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 28(%eax)<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %b = getelementptr inbounds %struct.S4, %struct.S4* %s1, i64 0, i32 1<br>
+ store i32 0, i32* %b, align 4<br>
+ %b3 = getelementptr inbounds %struct.S4, %struct.S4* %s1, i64 0, i32 9<br>
+ store i32 0, i32* %b3, align 4<br>
+ %0 = bitcast %struct.S4* %s2 to i8*<br>
+ %1 = bitcast %struct.S4* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 48, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S5 = type { i16, i16, i16, i16, i16, i16, i16, i16 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_type16(%struct.S5* nocapture %s1, %struct.S5* nocapture %s2, i32 %x, %struct.S5* nocapture %s3, %struct.S5* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_type16:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $18, %edi<br>
+; CHECK-NEXT: jl .LBB7_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movw %di, 2(%ecx)<br>
+; CHECK-NEXT: .LBB7_2: # %if.end<br>
+; CHECK-NEXT: movups (%esi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-NEXT: movzwl (%ecx), %edx<br>
+; CHECK-NEXT: movw %dx, (%eax)<br>
+; CHECK-NEXT: movzwl 2(%ecx), %edx<br>
+; CHECK-NEXT: movw %dx, 2(%eax)<br>
+; CHECK-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_type16:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: .cfi_offset %esi, -12<br>
+; DISABLED-NEXT: .cfi_offset %edi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: cmpl $18, %edi<br>
+; DISABLED-NEXT: jl .LBB7_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movw %di, 2(%esi)<br>
+; DISABLED-NEXT: .LBB7_2: # %if.end<br>
+; DISABLED-NEXT: movups (%edx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%ecx)<br>
+; DISABLED-NEXT: movups (%esi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_type16:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX2-NEXT: jl .LBB7_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movw %di, 2(%ecx)<br>
+; CHECK-AVX2-NEXT: .LBB7_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%esi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-AVX2-NEXT: movzwl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movw %dx, (%eax)<br>
+; CHECK-AVX2-NEXT: movzwl 2(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movw %dx, 2(%eax)<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_type16:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edi<br>
+; CHECK-AVX512-NEXT: jl .LBB7_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movw %di, 2(%ecx)<br>
+; CHECK-AVX512-NEXT: .LBB7_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%esi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%edx)<br>
+; CHECK-AVX512-NEXT: movzwl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movw %dx, (%eax)<br>
+; CHECK-AVX512-NEXT: movzwl 2(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movw %dx, 2(%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 12(%ecx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %conv = trunc i32 %x to i16<br>
+ %b = getelementptr inbounds %struct.S5, %struct.S5* %s1, i64 0, i32 1<br>
+ store i16 %conv, i16* %b, align 2<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S5* %s3 to i8*<br>
+ %1 = bitcast %struct.S5* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 2, i1 false)<br>
+ %2 = bitcast %struct.S5* %s2 to i8*<br>
+ %3 = bitcast %struct.S5* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 2, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+%struct.S6 = type { [4 x i32], i32, i32, i32, i32 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_stack(%struct.S6* noalias nocapture sret %agg.result, %struct.S6* byval nocapture readnone align 8 %s1, %struct.S6* byval nocapture align 8 %s2, i32 %x) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_stack:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %eax<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movups {{[0-9]+}}(%esp), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%eax)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl %ecx, 16(%eax)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl %ecx, 20(%eax)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl %ecx, 24(%eax)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl %ecx, 28(%eax)<br>
+; CHECK-NEXT: popl %ecx<br>
+; CHECK-NEXT: retl $4<br>
+;<br>
+; DISABLED-LABEL: test_stack:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %eax<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl %eax, {{[0-9]+}}(%esp)<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movups {{[0-9]+}}(%esp), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: movups {{[0-9]+}}(%esp), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, 16(%eax)<br>
+; DISABLED-NEXT: popl %ecx<br>
+; DISABLED-NEXT: retl $4<br>
+;<br>
+; CHECK-AVX2-LABEL: test_stack:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %eax<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, {{[0-9]+}}(%esp)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movups {{[0-9]+}}(%esp), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%eax)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 16(%eax)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 20(%eax)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 24(%eax)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 28(%eax)<br>
+; CHECK-AVX2-NEXT: popl %ecx<br>
+; CHECK-AVX2-NEXT: retl $4<br>
+;<br>
+; CHECK-AVX512-LABEL: test_stack:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %eax<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, {{[0-9]+}}(%esp)<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 16(%eax)<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 20(%eax)<br>
+; CHECK-AVX512-NEXT: vmovups {{[0-9]+}}(%esp), %xmm0<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 24(%eax)<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%eax)<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 28(%eax)<br>
+; CHECK-AVX512-NEXT: popl %ecx<br>
+; CHECK-AVX512-NEXT: retl $4<br>
+entry:<br>
+ %s6.sroa.0.0..sroa_cast1 = bitcast %struct.S6* %s2 to i8*<br>
+ %s6.sroa.3.0..sroa_idx4 = getelementptr inbounds %struct.S6, %struct.S6* %s2, i64 0, i32 3<br>
+ store i32 %x, i32* %s6.sroa.3.0..sroa_idx4, align 8<br>
+ %0 = bitcast %struct.S6* %agg.result to i8*<br>
+ call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* nonnull %s6.sroa.0.0..sroa_cast1, i64 32, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_limit_all(%struct.S* %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_limit_all:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %ebp<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %ebx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 20<br>
+; CHECK-NEXT: subl $12, %esp<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-NEXT: .cfi_offset %esi, -20<br>
+; CHECK-NEXT: .cfi_offset %edi, -16<br>
+; CHECK-NEXT: .cfi_offset %ebx, -12<br>
+; CHECK-NEXT: .cfi_offset %ebp, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%ebp)<br>
+; CHECK-NEXT: movl %ebp, (%esp)<br>
+; CHECK-NEXT: calll bar<br>
+; CHECK-NEXT: cmpl $18, %esi<br>
+; CHECK-NEXT: jl .LBB9_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %esi, 4(%ebp)<br>
+; CHECK-NEXT: movl %ebp, (%esp)<br>
+; CHECK-NEXT: calll bar<br>
+; CHECK-NEXT: .LBB9_2: # %if.end<br>
+; CHECK-NEXT: movups (%ebx), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%edi)<br>
+; CHECK-NEXT: movups (%ebp), %xmm0<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movups %xmm0, (%eax)<br>
+; CHECK-NEXT: addl $12, %esp<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: popl %ebx<br>
+; CHECK-NEXT: popl %ebp<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_limit_all:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %ebp<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %ebx<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 16<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 20<br>
+; DISABLED-NEXT: subl $12, %esp<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 32<br>
+; DISABLED-NEXT: .cfi_offset %esi, -20<br>
+; DISABLED-NEXT: .cfi_offset %edi, -16<br>
+; DISABLED-NEXT: .cfi_offset %ebx, -12<br>
+; DISABLED-NEXT: .cfi_offset %ebp, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl %eax, 12(%ebp)<br>
+; DISABLED-NEXT: movl %ebp, (%esp)<br>
+; DISABLED-NEXT: calll bar<br>
+; DISABLED-NEXT: cmpl $18, %esi<br>
+; DISABLED-NEXT: jl .LBB9_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %esi, 4(%ebp)<br>
+; DISABLED-NEXT: movl %ebp, (%esp)<br>
+; DISABLED-NEXT: calll bar<br>
+; DISABLED-NEXT: .LBB9_2: # %if.end<br>
+; DISABLED-NEXT: movups (%ebx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%edi)<br>
+; DISABLED-NEXT: movups (%ebp), %xmm0<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: addl $12, %esp<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: popl %ebx<br>
+; DISABLED-NEXT: popl %ebp<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_limit_all:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %ebp<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %ebx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 20<br>
+; CHECK-AVX2-NEXT: subl $12, %esp<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -20<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebx, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebp, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%ebp)<br>
+; CHECK-AVX2-NEXT: movl %ebp, (%esp)<br>
+; CHECK-AVX2-NEXT: calll bar<br>
+; CHECK-AVX2-NEXT: cmpl $18, %esi<br>
+; CHECK-AVX2-NEXT: jl .LBB9_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %esi, 4(%ebp)<br>
+; CHECK-AVX2-NEXT: movl %ebp, (%esp)<br>
+; CHECK-AVX2-NEXT: calll bar<br>
+; CHECK-AVX2-NEXT: .LBB9_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%ebx), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edi)<br>
+; CHECK-AVX2-NEXT: movups (%ebp), %xmm0<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%eax)<br>
+; CHECK-AVX2-NEXT: addl $12, %esp<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: popl %ebx<br>
+; CHECK-AVX2-NEXT: popl %ebp<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_limit_all:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %ebp<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %ebx<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 20<br>
+; CHECK-AVX512-NEXT: subl $12, %esp<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -20<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -16<br>
+; CHECK-AVX512-NEXT: .cfi_offset %ebx, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %ebp, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%ebp)<br>
+; CHECK-AVX512-NEXT: movl %ebp, (%esp)<br>
+; CHECK-AVX512-NEXT: calll bar<br>
+; CHECK-AVX512-NEXT: cmpl $18, %esi<br>
+; CHECK-AVX512-NEXT: jl .LBB9_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %esi, 4(%ebp)<br>
+; CHECK-AVX512-NEXT: movl %ebp, (%esp)<br>
+; CHECK-AVX512-NEXT: calll bar<br>
+; CHECK-AVX512-NEXT: .LBB9_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%ebx), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%edi)<br>
+; CHECK-AVX512-NEXT: vmovups (%ebp), %xmm0<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%eax)<br>
+; CHECK-AVX512-NEXT: addl $12, %esp<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: popl %ebx<br>
+; CHECK-AVX512-NEXT: popl %ebp<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ tail call void @bar(%struct.S* %s1) #3<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ tail call void @bar(%struct.S* nonnull %s1) #3<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_limit_one_pred(%struct.S* %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_limit_one_pred:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %ebp<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %ebx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 20<br>
+; CHECK-NEXT: subl $12, %esp<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-NEXT: .cfi_offset %esi, -20<br>
+; CHECK-NEXT: .cfi_offset %edi, -16<br>
+; CHECK-NEXT: .cfi_offset %ebx, -12<br>
+; CHECK-NEXT: .cfi_offset %ebp, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: movl %ecx, 12(%edi)<br>
+; CHECK-NEXT: cmpl $18, %eax<br>
+; CHECK-NEXT: jl .LBB10_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %eax, 4(%edi)<br>
+; CHECK-NEXT: movl %edi, (%esp)<br>
+; CHECK-NEXT: calll bar<br>
+; CHECK-NEXT: .LBB10_2: # %if.end<br>
+; CHECK-NEXT: movups (%ebp), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%ebx)<br>
+; CHECK-NEXT: movl (%edi), %eax<br>
+; CHECK-NEXT: movl %eax, (%esi)<br>
+; CHECK-NEXT: movl 4(%edi), %eax<br>
+; CHECK-NEXT: movl %eax, 4(%esi)<br>
+; CHECK-NEXT: movl 8(%edi), %eax<br>
+; CHECK-NEXT: movl %eax, 8(%esi)<br>
+; CHECK-NEXT: movl 12(%edi), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%esi)<br>
+; CHECK-NEXT: addl $12, %esp<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: popl %ebx<br>
+; CHECK-NEXT: popl %ebp<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_limit_one_pred:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %ebp<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: pushl %ebx<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 12<br>
+; DISABLED-NEXT: pushl %edi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 16<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 20<br>
+; DISABLED-NEXT: subl $12, %esp<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 32<br>
+; DISABLED-NEXT: .cfi_offset %esi, -20<br>
+; DISABLED-NEXT: .cfi_offset %edi, -16<br>
+; DISABLED-NEXT: .cfi_offset %ebx, -12<br>
+; DISABLED-NEXT: .cfi_offset %ebp, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: movl %ecx, 12(%ebp)<br>
+; DISABLED-NEXT: cmpl $18, %eax<br>
+; DISABLED-NEXT: jl .LBB10_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %eax, 4(%ebp)<br>
+; DISABLED-NEXT: movl %ebp, (%esp)<br>
+; DISABLED-NEXT: calll bar<br>
+; DISABLED-NEXT: .LBB10_2: # %if.end<br>
+; DISABLED-NEXT: movups (%ebx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%edi)<br>
+; DISABLED-NEXT: movups (%ebp), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%esi)<br>
+; DISABLED-NEXT: addl $12, %esp<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: popl %edi<br>
+; DISABLED-NEXT: popl %ebx<br>
+; DISABLED-NEXT: popl %ebp<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_limit_one_pred:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %ebp<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %ebx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 20<br>
+; CHECK-AVX2-NEXT: subl $12, %esp<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -20<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebx, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebp, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: movl %ecx, 12(%edi)<br>
+; CHECK-AVX2-NEXT: cmpl $18, %eax<br>
+; CHECK-AVX2-NEXT: jl .LBB10_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %eax, 4(%edi)<br>
+; CHECK-AVX2-NEXT: movl %edi, (%esp)<br>
+; CHECK-AVX2-NEXT: calll bar<br>
+; CHECK-AVX2-NEXT: .LBB10_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%ebp), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%ebx)<br>
+; CHECK-AVX2-NEXT: movl (%edi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, (%esi)<br>
+; CHECK-AVX2-NEXT: movl 4(%edi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 4(%esi)<br>
+; CHECK-AVX2-NEXT: movl 8(%edi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 8(%esi)<br>
+; CHECK-AVX2-NEXT: movl 12(%edi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%esi)<br>
+; CHECK-AVX2-NEXT: addl $12, %esp<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: popl %ebx<br>
+; CHECK-AVX2-NEXT: popl %ebp<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_limit_one_pred:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %ebp<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: pushl %ebx<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX512-NEXT: pushl %edi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 20<br>
+; CHECK-AVX512-NEXT: subl $12, %esp<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -20<br>
+; CHECK-AVX512-NEXT: .cfi_offset %edi, -16<br>
+; CHECK-AVX512-NEXT: .cfi_offset %ebx, -12<br>
+; CHECK-AVX512-NEXT: .cfi_offset %ebp, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ebp<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ebx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 12(%edi)<br>
+; CHECK-AVX512-NEXT: cmpl $18, %eax<br>
+; CHECK-AVX512-NEXT: jl .LBB10_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %eax, 4(%edi)<br>
+; CHECK-AVX512-NEXT: movl %edi, (%esp)<br>
+; CHECK-AVX512-NEXT: calll bar<br>
+; CHECK-AVX512-NEXT: .LBB10_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%ebp), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%ebx)<br>
+; CHECK-AVX512-NEXT: movl (%edi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, (%esi)<br>
+; CHECK-AVX512-NEXT: movl 4(%edi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 4(%esi)<br>
+; CHECK-AVX512-NEXT: movl 8(%edi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 8(%esi)<br>
+; CHECK-AVX512-NEXT: movl 12(%edi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%esi)<br>
+; CHECK-AVX512-NEXT: addl $12, %esp<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: popl %edi<br>
+; CHECK-AVX512-NEXT: popl %ebx<br>
+; CHECK-AVX512-NEXT: popl %ebp<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ tail call void @bar(%struct.S* nonnull %s1) #3<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+<br>
+declare void @bar(%struct.S*) local_unnamed_addr #1<br>
+<br>
+<br>
+; Function Attrs: argmemonly nounwind<br>
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1) #1<br>
+<br>
+attributes #0 = { nounwind uwtable "target-cpu"="x86-64" }<br>
+<br>
+%struct.S7 = type { float, float, float , float, float, float, float, float }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_conditional_block_float(%struct.S7* nocapture %s1, %struct.S7* nocapture %s2, i32 %x, %struct.S7* nocapture %s3, %struct.S7* nocapture readonly %s4, float %y) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_conditional_block_float:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %ebx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; CHECK-NEXT: jl .LBB11_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl $1065353216, 4(%ecx) # imm = 0x3F800000<br>
+; CHECK-NEXT: .LBB11_2: # %if.end<br>
+; CHECK-NEXT: movups (%esi), %xmm0<br>
+; CHECK-NEXT: movups 16(%esi), %xmm1<br>
+; CHECK-NEXT: movups %xmm1, 16(%edx)<br>
+; CHECK-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl 4(%ecx), %esi<br>
+; CHECK-NEXT: movl 8(%ecx), %edi<br>
+; CHECK-NEXT: movl 12(%ecx), %ebx<br>
+; CHECK-NEXT: movups 16(%ecx), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, 16(%eax)<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl %esi, 4(%eax)<br>
+; CHECK-NEXT: movl %edi, 8(%eax)<br>
+; CHECK-NEXT: movl %ebx, 12(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: popl %ebx<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_conditional_block_float:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: .cfi_offset %esi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; DISABLED-NEXT: jl .LBB11_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl $1065353216, 4(%ecx) # imm = 0x3F800000<br>
+; DISABLED-NEXT: .LBB11_2: # %if.end<br>
+; DISABLED-NEXT: movups (%esi), %xmm0<br>
+; DISABLED-NEXT: movups 16(%esi), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%edx)<br>
+; DISABLED-NEXT: movups %xmm0, (%edx)<br>
+; DISABLED-NEXT: movups (%ecx), %xmm0<br>
+; DISABLED-NEXT: movups 16(%ecx), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%eax)<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_conditional_block_float:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %ebx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; CHECK-AVX2-NEXT: jl .LBB11_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl $1065353216, 4(%ecx) # imm = 0x3F800000<br>
+; CHECK-AVX2-NEXT: .LBB11_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%esi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups 16(%esi), %xmm1<br>
+; CHECK-AVX2-NEXT: movups %xmm1, 16(%edx)<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %esi<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edi<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ebx<br>
+; CHECK-AVX2-NEXT: movups 16(%ecx), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, 16(%eax)<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl %esi, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl %edi, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl %ebx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: popl %ebx<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_conditional_block_float:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; CHECK-AVX512-NEXT: jl .LBB11_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl $1065353216, 4(%ecx) # imm = 0x3F800000<br>
+; CHECK-AVX512-NEXT: .LBB11_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%esi), %ymm0<br>
+; CHECK-AVX512-NEXT: vmovups %ymm0, (%edx)<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: vmovups 8(%ecx), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 24(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 24(%eax)<br>
+; CHECK-AVX512-NEXT: movl 28(%ecx), %ecx<br>
+; CHECK-AVX512-NEXT: movl %ecx, 28(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: vzeroupper<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S7, %struct.S7* %s1, i64 0, i32 1<br>
+ store float 1.0, float* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S7* %s3 to i8*<br>
+ %1 = bitcast %struct.S7* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 32, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S7* %s2 to i8*<br>
+ %3 = bitcast %struct.S7* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 32, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+%struct.S8 = type { i64, i64, i64, i64, i64, i64 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_conditional_block_ymm(%struct.S8* nocapture %s1, %struct.S8* nocapture %s2, i32 %x, %struct.S8* nocapture %s3, %struct.S8* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_conditional_block_ymm:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushl %ebx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-NEXT: pushl %edi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-NEXT: pushl %esi<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; CHECK-NEXT: jl .LBB12_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl $0, 12(%ecx)<br>
+; CHECK-NEXT: movl $1, 8(%ecx)<br>
+; CHECK-NEXT: .LBB12_2: # %if.end<br>
+; CHECK-NEXT: movups (%esi), %xmm0<br>
+; CHECK-NEXT: movups 16(%esi), %xmm1<br>
+; CHECK-NEXT: movups %xmm1, 16(%edx)<br>
+; CHECK-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-NEXT: movl (%ecx), %edx<br>
+; CHECK-NEXT: movl 4(%ecx), %esi<br>
+; CHECK-NEXT: movl 8(%ecx), %edi<br>
+; CHECK-NEXT: movl 12(%ecx), %ebx<br>
+; CHECK-NEXT: movups 16(%ecx), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, 16(%eax)<br>
+; CHECK-NEXT: movl %edx, (%eax)<br>
+; CHECK-NEXT: movl %esi, 4(%eax)<br>
+; CHECK-NEXT: movl %edi, 8(%eax)<br>
+; CHECK-NEXT: movl %ebx, 12(%eax)<br>
+; CHECK-NEXT: popl %esi<br>
+; CHECK-NEXT: popl %edi<br>
+; CHECK-NEXT: popl %ebx<br>
+; CHECK-NEXT: retl<br>
+;<br>
+; DISABLED-LABEL: test_conditional_block_ymm:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushl %esi<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 8<br>
+; DISABLED-NEXT: .cfi_offset %esi, -8<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; DISABLED-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; DISABLED-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; DISABLED-NEXT: jl .LBB12_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl $0, 12(%ecx)<br>
+; DISABLED-NEXT: movl $1, 8(%ecx)<br>
+; DISABLED-NEXT: .LBB12_2: # %if.end<br>
+; DISABLED-NEXT: movups (%esi), %xmm0<br>
+; DISABLED-NEXT: movups 16(%esi), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%edx)<br>
+; DISABLED-NEXT: movups %xmm0, (%edx)<br>
+; DISABLED-NEXT: movups (%ecx), %xmm0<br>
+; DISABLED-NEXT: movups 16(%ecx), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%eax)<br>
+; DISABLED-NEXT: movups %xmm0, (%eax)<br>
+; DISABLED-NEXT: popl %esi<br>
+; DISABLED-NEXT: retl<br>
+;<br>
+; CHECK-AVX2-LABEL: test_conditional_block_ymm:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushl %ebx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX2-NEXT: pushl %edi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 12<br>
+; CHECK-AVX2-NEXT: pushl %esi<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %esi, -16<br>
+; CHECK-AVX2-NEXT: .cfi_offset %edi, -12<br>
+; CHECK-AVX2-NEXT: .cfi_offset %ebx, -8<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX2-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; CHECK-AVX2-NEXT: jl .LBB12_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl $0, 12(%ecx)<br>
+; CHECK-AVX2-NEXT: movl $1, 8(%ecx)<br>
+; CHECK-AVX2-NEXT: .LBB12_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movups (%esi), %xmm0<br>
+; CHECK-AVX2-NEXT: movups 16(%esi), %xmm1<br>
+; CHECK-AVX2-NEXT: movups %xmm1, 16(%edx)<br>
+; CHECK-AVX2-NEXT: movups %xmm0, (%edx)<br>
+; CHECK-AVX2-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX2-NEXT: movl 4(%ecx), %esi<br>
+; CHECK-AVX2-NEXT: movl 8(%ecx), %edi<br>
+; CHECK-AVX2-NEXT: movl 12(%ecx), %ebx<br>
+; CHECK-AVX2-NEXT: movups 16(%ecx), %xmm0<br>
+; CHECK-AVX2-NEXT: movups %xmm0, 16(%eax)<br>
+; CHECK-AVX2-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX2-NEXT: movl %esi, 4(%eax)<br>
+; CHECK-AVX2-NEXT: movl %edi, 8(%eax)<br>
+; CHECK-AVX2-NEXT: movl %ebx, 12(%eax)<br>
+; CHECK-AVX2-NEXT: popl %esi<br>
+; CHECK-AVX2-NEXT: popl %edi<br>
+; CHECK-AVX2-NEXT: popl %ebx<br>
+; CHECK-AVX2-NEXT: retl<br>
+;<br>
+; CHECK-AVX512-LABEL: test_conditional_block_ymm:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushl %esi<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 8<br>
+; CHECK-AVX512-NEXT: .cfi_offset %esi, -8<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %esi<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; CHECK-AVX512-NEXT: cmpl $18, {{[0-9]+}}(%esp)<br>
+; CHECK-AVX512-NEXT: jl .LBB12_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl $0, 12(%ecx)<br>
+; CHECK-AVX512-NEXT: movl $1, 8(%ecx)<br>
+; CHECK-AVX512-NEXT: .LBB12_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%esi), %ymm0<br>
+; CHECK-AVX512-NEXT: vmovups %ymm0, (%edx)<br>
+; CHECK-AVX512-NEXT: movl (%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, (%eax)<br>
+; CHECK-AVX512-NEXT: movl 4(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%eax)<br>
+; CHECK-AVX512-NEXT: movl 8(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 8(%eax)<br>
+; CHECK-AVX512-NEXT: movl 12(%ecx), %edx<br>
+; CHECK-AVX512-NEXT: movl %edx, 12(%eax)<br>
+; CHECK-AVX512-NEXT: vmovups 16(%ecx), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 16(%eax)<br>
+; CHECK-AVX512-NEXT: popl %esi<br>
+; CHECK-AVX512-NEXT: vzeroupper<br>
+; CHECK-AVX512-NEXT: retl<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S8, %struct.S8* %s1, i64 0, i32 1<br>
+ store i64 1, i64* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S8* %s3 to i8*<br>
+ %1 = bitcast %struct.S8* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 32, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S8* %s2 to i8*<br>
+ %3 = bitcast %struct.S8* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 32, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
<br>
Added: llvm/trunk/test/CodeGen/X86/fixup-sfb.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fixup-sfb.ll?rev=325128&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fixup-sfb.ll?rev=325128&view=auto</a><br>
==============================================================================<br>
--- llvm/trunk/test/CodeGen/X86/fixup-sfb.ll (added)<br>
+++ llvm/trunk/test/CodeGen/X86/fixup-sfb.ll Wed Feb 14 06:58:53 2018<br>
@@ -0,0 +1,1378 @@<br>
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py<br>
+; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s -check-prefix=CHECK<br>
+; RUN: llc < %s -mtriple=x86_64-linux --disable-fixup-SFB | FileCheck %s --check-prefix=DISABLED<br>
+; RUN: llc < %s -mtriple=x86_64-linux -mcpu=core-avx2 | FileCheck %s -check-prefix=CHECK-AVX2<br>
+; RUN: llc < %s -mtriple=x86_64-linux -mcpu=skx | FileCheck %s -check-prefix=CHECK-AVX512<br>
+<br>
+; RUN: llc < %s -mtriple=i686-linux<br>
+; RUN: llc < %s -mtriple=i686-linux --disable-fixup-SFB<br>
+; RUN: llc < %s -mtriple=i686-linux -mattr sse4<br>
+; RUN: llc < %s -mtriple=i686-linux -mattr avx512<br>
+<br>
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
+target triple = "x86_64-unknown-linux-gnu"<br>
+<br>
+%struct.S = type { i32, i32, i32, i32 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_conditional_block(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_conditional_block:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB0_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-NEXT: .LBB0_2: # %if.end<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movl (%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, (%rsi)<br>
+; CHECK-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_conditional_block:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB0_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edx, 4(%rdi)<br>
+; DISABLED-NEXT: .LBB0_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_conditional_block:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB0_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB0_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX2-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_conditional_block:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB0_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB0_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX512-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_imm_store(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_imm_store:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: movl $0, (%rdi)<br>
+; CHECK-NEXT: movl $1, (%rcx)<br>
+; CHECK-NEXT: movl (%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, (%rsi)<br>
+; CHECK-NEXT: movq 4(%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, 4(%rsi)<br>
+; CHECK-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_imm_store:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: movl $0, (%rdi)<br>
+; DISABLED-NEXT: movl $1, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_imm_store:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: movl $0, (%rdi)<br>
+; CHECK-AVX2-NEXT: movl $1, (%rcx)<br>
+; CHECK-AVX2-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movq 4(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 4(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_imm_store:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: movl $0, (%rdi)<br>
+; CHECK-AVX512-NEXT: movl $1, (%rcx)<br>
+; CHECK-AVX512-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movq 4(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 4(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %a = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 0<br>
+ store i32 0, i32* %a, align 4<br>
+ %a1 = getelementptr inbounds %struct.S, %struct.S* %s3, i64 0, i32 0<br>
+ store i32 1, i32* %a1, align 4<br>
+ %0 = bitcast %struct.S* %s2 to i8*<br>
+ %1 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_nondirect_br(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_nondirect_br:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB2_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-NEXT: .LBB2_2: # %if.end<br>
+; CHECK-NEXT: cmpl $14, %r9d<br>
+; CHECK-NEXT: jl .LBB2_4<br>
+; CHECK-NEXT: # %bb.3: # %if.then2<br>
+; CHECK-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-NEXT: .LBB2_4: # %if.end3<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movq (%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, (%rsi)<br>
+; CHECK-NEXT: movl 8(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 8(%rsi)<br>
+; CHECK-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_nondirect_br:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB2_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edx, 4(%rdi)<br>
+; DISABLED-NEXT: .LBB2_2: # %if.end<br>
+; DISABLED-NEXT: cmpl $14, %r9d<br>
+; DISABLED-NEXT: jl .LBB2_4<br>
+; DISABLED-NEXT: # %bb.3: # %if.then2<br>
+; DISABLED-NEXT: movl %r9d, 12(%rdi)<br>
+; DISABLED-NEXT: .LBB2_4: # %if.end3<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_nondirect_br:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB2_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB2_2: # %if.end<br>
+; CHECK-AVX2-NEXT: cmpl $14, %r9d<br>
+; CHECK-AVX2-NEXT: jl .LBB2_4<br>
+; CHECK-AVX2-NEXT: # %bb.3: # %if.then2<br>
+; CHECK-AVX2-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB2_4: # %if.end3<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movl 8(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_nondirect_br:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB2_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB2_2: # %if.end<br>
+; CHECK-AVX512-NEXT: cmpl $14, %r9d<br>
+; CHECK-AVX512-NEXT: jl .LBB2_4<br>
+; CHECK-AVX512-NEXT: # %bb.3: # %if.then2<br>
+; CHECK-AVX512-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB2_4: # %if.end3<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movl 8(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %cmp1 = icmp sgt i32 %x2, 13<br>
+ br i1 %cmp1, label %if.then2, label %if.end3<br>
+<br>
+if.then2: ; preds = %if.end<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ br label %if.end3<br>
+<br>
+if.end3: ; preds = %if.then2, %if.end<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_2preds_block(%struct.S* nocapture %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_2preds_block:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB3_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-NEXT: .LBB3_2: # %if.end<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movl (%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, (%rsi)<br>
+; CHECK-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-NEXT: movl 8(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 8(%rsi)<br>
+; CHECK-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_2preds_block:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: movl %r9d, 12(%rdi)<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB3_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edx, 4(%rdi)<br>
+; DISABLED-NEXT: .LBB3_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_2preds_block:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB3_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB3_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 8(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_2preds_block:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB3_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB3_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 8(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S2 = type { i64, i64 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_type64(%struct.S2* nocapture %s1, %struct.S2* nocapture %s2, i32 %x, %struct.S2* nocapture %s3, %struct.S2* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_type64:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB4_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movslq %edx, %rax<br>
+; CHECK-NEXT: movq %rax, 8(%rdi)<br>
+; CHECK-NEXT: .LBB4_2: # %if.end<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movq (%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, (%rsi)<br>
+; CHECK-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_type64:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB4_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movslq %edx, %rax<br>
+; DISABLED-NEXT: movq %rax, 8(%rdi)<br>
+; DISABLED-NEXT: .LBB4_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_type64:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB4_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movslq %edx, %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 8(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB4_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_type64:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB4_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movslq %edx, %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 8(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB4_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %conv = sext i32 %x to i64<br>
+ %b = getelementptr inbounds %struct.S2, %struct.S2* %s1, i64 0, i32 1<br>
+ store i64 %conv, i64* %b, align 8<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S2* %s3 to i8*<br>
+ %1 = bitcast %struct.S2* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false)<br>
+ %2 = bitcast %struct.S2* %s2 to i8*<br>
+ %3 = bitcast %struct.S2* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 8, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S3 = type { i64, i8, i8, i16, i32 }<br>
+<br>
+; Function Attrs: noinline nounwind uwtable<br>
+define void @test_mixed_type(%struct.S3* nocapture %s1, %struct.S3* nocapture %s2, i32 %x, %struct.S3* nocapture readnone %s3, %struct.S3* nocapture readnone %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_mixed_type:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB5_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movslq %edx, %rax<br>
+; CHECK-NEXT: movq %rax, (%rdi)<br>
+; CHECK-NEXT: movb %dl, 8(%rdi)<br>
+; CHECK-NEXT: .LBB5_2: # %if.end<br>
+; CHECK-NEXT: movq (%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, (%rsi)<br>
+; CHECK-NEXT: movb 8(%rdi), %al<br>
+; CHECK-NEXT: movb %al, 8(%rsi)<br>
+; CHECK-NEXT: movl 9(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 9(%rsi)<br>
+; CHECK-NEXT: movzwl 13(%rdi), %eax<br>
+; CHECK-NEXT: movw %ax, 13(%rsi)<br>
+; CHECK-NEXT: movb 15(%rdi), %al<br>
+; CHECK-NEXT: movb %al, 15(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_mixed_type:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB5_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movslq %edx, %rax<br>
+; DISABLED-NEXT: movq %rax, (%rdi)<br>
+; DISABLED-NEXT: movb %dl, 8(%rdi)<br>
+; DISABLED-NEXT: .LBB5_2: # %if.end<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_mixed_type:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB5_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movslq %edx, %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, (%rdi)<br>
+; CHECK-AVX2-NEXT: movb %dl, 8(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB5_2: # %if.end<br>
+; CHECK-AVX2-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movb 8(%rdi), %al<br>
+; CHECK-AVX2-NEXT: movb %al, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 9(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 9(%rsi)<br>
+; CHECK-AVX2-NEXT: movzwl 13(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movw %ax, 13(%rsi)<br>
+; CHECK-AVX2-NEXT: movb 15(%rdi), %al<br>
+; CHECK-AVX2-NEXT: movb %al, 15(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_mixed_type:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB5_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movslq %edx, %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, (%rdi)<br>
+; CHECK-AVX512-NEXT: movb %dl, 8(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB5_2: # %if.end<br>
+; CHECK-AVX512-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movb 8(%rdi), %al<br>
+; CHECK-AVX512-NEXT: movb %al, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 9(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 9(%rsi)<br>
+; CHECK-AVX512-NEXT: movzwl 13(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movw %ax, 13(%rsi)<br>
+; CHECK-AVX512-NEXT: movb 15(%rdi), %al<br>
+; CHECK-AVX512-NEXT: movb %al, 15(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %conv = sext i32 %x to i64<br>
+ %a = getelementptr inbounds %struct.S3, %struct.S3* %s1, i64 0, i32 0<br>
+ store i64 %conv, i64* %a, align 8<br>
+ %conv1 = trunc i32 %x to i8<br>
+ %b = getelementptr inbounds %struct.S3, %struct.S3* %s1, i64 0, i32 1<br>
+ store i8 %conv1, i8* %b, align 8<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S3* %s2 to i8*<br>
+ %1 = bitcast %struct.S3* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 8, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S4 = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_multiple_blocks(%struct.S4* nocapture %s1, %struct.S4* nocapture %s2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_multiple_blocks:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: movl $0, 4(%rdi)<br>
+; CHECK-NEXT: movl $0, 36(%rdi)<br>
+; CHECK-NEXT: movups 16(%rdi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, 16(%rsi)<br>
+; CHECK-NEXT: movl 32(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 32(%rsi)<br>
+; CHECK-NEXT: movl 36(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 36(%rsi)<br>
+; CHECK-NEXT: movq 40(%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, 40(%rsi)<br>
+; CHECK-NEXT: movl (%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, (%rsi)<br>
+; CHECK-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_multiple_blocks:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: movl $0, 4(%rdi)<br>
+; DISABLED-NEXT: movl $0, 36(%rdi)<br>
+; DISABLED-NEXT: movups 16(%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, 16(%rsi)<br>
+; DISABLED-NEXT: movups 32(%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, 32(%rsi)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_multiple_blocks:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: movl $0, 4(%rdi)<br>
+; CHECK-AVX2-NEXT: movl $0, 36(%rdi)<br>
+; CHECK-AVX2-NEXT: vmovups 16(%rdi), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, 16(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 32(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 32(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 36(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 36(%rsi)<br>
+; CHECK-AVX2-NEXT: movq 40(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 40(%rsi)<br>
+; CHECK-AVX2-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX2-NEXT: vmovups 8(%rdi), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: movq 24(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 24(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_multiple_blocks:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: movl $0, 4(%rdi)<br>
+; CHECK-AVX512-NEXT: movl $0, 36(%rdi)<br>
+; CHECK-AVX512-NEXT: vmovups 16(%rdi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 16(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 32(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 32(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 36(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 36(%rsi)<br>
+; CHECK-AVX512-NEXT: movq 40(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 40(%rsi)<br>
+; CHECK-AVX512-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX512-NEXT: vmovups 8(%rdi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: movq 24(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 24(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %b = getelementptr inbounds %struct.S4, %struct.S4* %s1, i64 0, i32 1<br>
+ store i32 0, i32* %b, align 4<br>
+ %b3 = getelementptr inbounds %struct.S4, %struct.S4* %s1, i64 0, i32 9<br>
+ store i32 0, i32* %b3, align 4<br>
+ %0 = bitcast %struct.S4* %s2 to i8*<br>
+ %1 = bitcast %struct.S4* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 48, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+%struct.S5 = type { i16, i16, i16, i16, i16, i16, i16, i16 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_type16(%struct.S5* nocapture %s1, %struct.S5* nocapture %s2, i32 %x, %struct.S5* nocapture %s3, %struct.S5* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_type16:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB7_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movw %dx, 2(%rdi)<br>
+; CHECK-NEXT: .LBB7_2: # %if.end<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movzwl (%rdi), %eax<br>
+; CHECK-NEXT: movw %ax, (%rsi)<br>
+; CHECK-NEXT: movzwl 2(%rdi), %eax<br>
+; CHECK-NEXT: movw %ax, 2(%rsi)<br>
+; CHECK-NEXT: movq 4(%rdi), %rax<br>
+; CHECK-NEXT: movq %rax, 4(%rsi)<br>
+; CHECK-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_type16:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB7_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movw %dx, 2(%rdi)<br>
+; DISABLED-NEXT: .LBB7_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_type16:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB7_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movw %dx, 2(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB7_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movzwl (%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movw %ax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movzwl 2(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movw %ax, 2(%rsi)<br>
+; CHECK-AVX2-NEXT: movq 4(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 4(%rsi)<br>
+; CHECK-AVX2-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_type16:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB7_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movw %dx, 2(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB7_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movzwl (%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movw %ax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movzwl 2(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movw %ax, 2(%rsi)<br>
+; CHECK-AVX512-NEXT: movq 4(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 4(%rsi)<br>
+; CHECK-AVX512-NEXT: movl 12(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%rsi)<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %conv = trunc i32 %x to i16<br>
+ %b = getelementptr inbounds %struct.S5, %struct.S5* %s1, i64 0, i32 1<br>
+ store i16 %conv, i16* %b, align 2<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S5* %s3 to i8*<br>
+ %1 = bitcast %struct.S5* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 2, i1 false)<br>
+ %2 = bitcast %struct.S5* %s2 to i8*<br>
+ %3 = bitcast %struct.S5* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 2, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+%struct.S6 = type { [4 x i32], i32, i32, i32, i32 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_stack(%struct.S6* noalias nocapture sret %agg.result, %struct.S6* byval nocapture readnone align 8 %s1, %struct.S6* byval nocapture align 8 %s2, i32 %x) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_stack:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: movl %esi, {{[0-9]+}}(%rsp)<br>
+; CHECK-NEXT: movaps {{[0-9]+}}(%rsp), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%rdi)<br>
+; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rax<br>
+; CHECK-NEXT: movq %rax, 16(%rdi)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %eax<br>
+; CHECK-NEXT: movl %eax, 24(%rdi)<br>
+; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %eax<br>
+; CHECK-NEXT: movl %eax, 28(%rdi)<br>
+; CHECK-NEXT: movq %rdi, %rax<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_stack:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: movl %esi, {{[0-9]+}}(%rsp)<br>
+; DISABLED-NEXT: movaps {{[0-9]+}}(%rsp), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%rdi)<br>
+; DISABLED-NEXT: movaps {{[0-9]+}}(%rsp), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, 16(%rdi)<br>
+; DISABLED-NEXT: movq %rdi, %rax<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_stack:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: movl %esi, {{[0-9]+}}(%rsp)<br>
+; CHECK-AVX2-NEXT: vmovups {{[0-9]+}}(%rsp), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%rdi)<br>
+; CHECK-AVX2-NEXT: movq {{[0-9]+}}(%rsp), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 16(%rdi)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%rsp), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 24(%rdi)<br>
+; CHECK-AVX2-NEXT: movl {{[0-9]+}}(%rsp), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 28(%rdi)<br>
+; CHECK-AVX2-NEXT: movq %rdi, %rax<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_stack:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: movl %esi, {{[0-9]+}}(%rsp)<br>
+; CHECK-AVX512-NEXT: vmovups {{[0-9]+}}(%rsp), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%rdi)<br>
+; CHECK-AVX512-NEXT: movq {{[0-9]+}}(%rsp), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 16(%rdi)<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%rsp), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 24(%rdi)<br>
+; CHECK-AVX512-NEXT: movl {{[0-9]+}}(%rsp), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 28(%rdi)<br>
+; CHECK-AVX512-NEXT: movq %rdi, %rax<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %s6.sroa.0.0..sroa_cast1 = bitcast %struct.S6* %s2 to i8*<br>
+ %s6.sroa.3.0..sroa_idx4 = getelementptr inbounds %struct.S6, %struct.S6* %s2, i64 0, i32 3<br>
+ store i32 %x, i32* %s6.sroa.3.0..sroa_idx4, align 8<br>
+ %0 = bitcast %struct.S6* %agg.result to i8*<br>
+ call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* nonnull %s6.sroa.0.0..sroa_cast1, i64 32, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_limit_all(%struct.S* %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_limit_all:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushq %rbp<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: pushq %r15<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 24<br>
+; CHECK-NEXT: pushq %r14<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-NEXT: pushq %r12<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 40<br>
+; CHECK-NEXT: pushq %rbx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 48<br>
+; CHECK-NEXT: .cfi_offset %rbx, -48<br>
+; CHECK-NEXT: .cfi_offset %r12, -40<br>
+; CHECK-NEXT: .cfi_offset %r14, -32<br>
+; CHECK-NEXT: .cfi_offset %r15, -24<br>
+; CHECK-NEXT: .cfi_offset %rbp, -16<br>
+; CHECK-NEXT: movq %r8, %r15<br>
+; CHECK-NEXT: movq %rcx, %r14<br>
+; CHECK-NEXT: movl %edx, %ebp<br>
+; CHECK-NEXT: movq %rsi, %r12<br>
+; CHECK-NEXT: movq %rdi, %rbx<br>
+; CHECK-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-NEXT: callq bar<br>
+; CHECK-NEXT: cmpl $18, %ebp<br>
+; CHECK-NEXT: jl .LBB9_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %ebp, 4(%rbx)<br>
+; CHECK-NEXT: movq %rbx, %rdi<br>
+; CHECK-NEXT: callq bar<br>
+; CHECK-NEXT: .LBB9_2: # %if.end<br>
+; CHECK-NEXT: movups (%r15), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%r14)<br>
+; CHECK-NEXT: movups (%rbx), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%r12)<br>
+; CHECK-NEXT: popq %rbx<br>
+; CHECK-NEXT: popq %r12<br>
+; CHECK-NEXT: popq %r14<br>
+; CHECK-NEXT: popq %r15<br>
+; CHECK-NEXT: popq %rbp<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_limit_all:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushq %rbp<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 16<br>
+; DISABLED-NEXT: pushq %r15<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 24<br>
+; DISABLED-NEXT: pushq %r14<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 32<br>
+; DISABLED-NEXT: pushq %r12<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 40<br>
+; DISABLED-NEXT: pushq %rbx<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 48<br>
+; DISABLED-NEXT: .cfi_offset %rbx, -48<br>
+; DISABLED-NEXT: .cfi_offset %r12, -40<br>
+; DISABLED-NEXT: .cfi_offset %r14, -32<br>
+; DISABLED-NEXT: .cfi_offset %r15, -24<br>
+; DISABLED-NEXT: .cfi_offset %rbp, -16<br>
+; DISABLED-NEXT: movq %r8, %r15<br>
+; DISABLED-NEXT: movq %rcx, %r14<br>
+; DISABLED-NEXT: movl %edx, %ebp<br>
+; DISABLED-NEXT: movq %rsi, %r12<br>
+; DISABLED-NEXT: movq %rdi, %rbx<br>
+; DISABLED-NEXT: movl %r9d, 12(%rdi)<br>
+; DISABLED-NEXT: callq bar<br>
+; DISABLED-NEXT: cmpl $18, %ebp<br>
+; DISABLED-NEXT: jl .LBB9_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %ebp, 4(%rbx)<br>
+; DISABLED-NEXT: movq %rbx, %rdi<br>
+; DISABLED-NEXT: callq bar<br>
+; DISABLED-NEXT: .LBB9_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r15), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%r14)<br>
+; DISABLED-NEXT: movups (%rbx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%r12)<br>
+; DISABLED-NEXT: popq %rbx<br>
+; DISABLED-NEXT: popq %r12<br>
+; DISABLED-NEXT: popq %r14<br>
+; DISABLED-NEXT: popq %r15<br>
+; DISABLED-NEXT: popq %rbp<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_limit_all:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushq %rbp<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: pushq %r15<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 24<br>
+; CHECK-AVX2-NEXT: pushq %r14<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX2-NEXT: pushq %r12<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 40<br>
+; CHECK-AVX2-NEXT: pushq %rbx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 48<br>
+; CHECK-AVX2-NEXT: .cfi_offset %rbx, -48<br>
+; CHECK-AVX2-NEXT: .cfi_offset %r12, -40<br>
+; CHECK-AVX2-NEXT: .cfi_offset %r14, -32<br>
+; CHECK-AVX2-NEXT: .cfi_offset %r15, -24<br>
+; CHECK-AVX2-NEXT: .cfi_offset %rbp, -16<br>
+; CHECK-AVX2-NEXT: movq %r8, %r15<br>
+; CHECK-AVX2-NEXT: movq %rcx, %r14<br>
+; CHECK-AVX2-NEXT: movl %edx, %ebp<br>
+; CHECK-AVX2-NEXT: movq %rsi, %r12<br>
+; CHECK-AVX2-NEXT: movq %rdi, %rbx<br>
+; CHECK-AVX2-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX2-NEXT: callq bar<br>
+; CHECK-AVX2-NEXT: cmpl $18, %ebp<br>
+; CHECK-AVX2-NEXT: jl .LBB9_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %ebp, 4(%rbx)<br>
+; CHECK-AVX2-NEXT: movq %rbx, %rdi<br>
+; CHECK-AVX2-NEXT: callq bar<br>
+; CHECK-AVX2-NEXT: .LBB9_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r15), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%r14)<br>
+; CHECK-AVX2-NEXT: vmovups (%rbx), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%r12)<br>
+; CHECK-AVX2-NEXT: popq %rbx<br>
+; CHECK-AVX2-NEXT: popq %r12<br>
+; CHECK-AVX2-NEXT: popq %r14<br>
+; CHECK-AVX2-NEXT: popq %r15<br>
+; CHECK-AVX2-NEXT: popq %rbp<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_limit_all:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushq %rbp<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX512-NEXT: pushq %r15<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 24<br>
+; CHECK-AVX512-NEXT: pushq %r14<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX512-NEXT: pushq %r12<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 40<br>
+; CHECK-AVX512-NEXT: pushq %rbx<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 48<br>
+; CHECK-AVX512-NEXT: .cfi_offset %rbx, -48<br>
+; CHECK-AVX512-NEXT: .cfi_offset %r12, -40<br>
+; CHECK-AVX512-NEXT: .cfi_offset %r14, -32<br>
+; CHECK-AVX512-NEXT: .cfi_offset %r15, -24<br>
+; CHECK-AVX512-NEXT: .cfi_offset %rbp, -16<br>
+; CHECK-AVX512-NEXT: movq %r8, %r15<br>
+; CHECK-AVX512-NEXT: movq %rcx, %r14<br>
+; CHECK-AVX512-NEXT: movl %edx, %ebp<br>
+; CHECK-AVX512-NEXT: movq %rsi, %r12<br>
+; CHECK-AVX512-NEXT: movq %rdi, %rbx<br>
+; CHECK-AVX512-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX512-NEXT: callq bar<br>
+; CHECK-AVX512-NEXT: cmpl $18, %ebp<br>
+; CHECK-AVX512-NEXT: jl .LBB9_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %ebp, 4(%rbx)<br>
+; CHECK-AVX512-NEXT: movq %rbx, %rdi<br>
+; CHECK-AVX512-NEXT: callq bar<br>
+; CHECK-AVX512-NEXT: .LBB9_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r15), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%r14)<br>
+; CHECK-AVX512-NEXT: vmovups (%rbx), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%r12)<br>
+; CHECK-AVX512-NEXT: popq %rbx<br>
+; CHECK-AVX512-NEXT: popq %r12<br>
+; CHECK-AVX512-NEXT: popq %r14<br>
+; CHECK-AVX512-NEXT: popq %r15<br>
+; CHECK-AVX512-NEXT: popq %rbp<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ tail call void @bar(%struct.S* %s1) #3<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ tail call void @bar(%struct.S* nonnull %s1) #3<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_limit_one_pred(%struct.S* %s1, %struct.S* nocapture %s2, i32 %x, %struct.S* nocapture %s3, %struct.S* nocapture readonly %s4, i32 %x2) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_limit_one_pred:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: pushq %r15<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-NEXT: pushq %r14<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 24<br>
+; CHECK-NEXT: pushq %r12<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-NEXT: pushq %rbx<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 40<br>
+; CHECK-NEXT: pushq %rax<br>
+; CHECK-NEXT: .cfi_def_cfa_offset 48<br>
+; CHECK-NEXT: .cfi_offset %rbx, -40<br>
+; CHECK-NEXT: .cfi_offset %r12, -32<br>
+; CHECK-NEXT: .cfi_offset %r14, -24<br>
+; CHECK-NEXT: .cfi_offset %r15, -16<br>
+; CHECK-NEXT: movq %r8, %r12<br>
+; CHECK-NEXT: movq %rcx, %r15<br>
+; CHECK-NEXT: movq %rsi, %r14<br>
+; CHECK-NEXT: movq %rdi, %rbx<br>
+; CHECK-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB10_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl %edx, 4(%rbx)<br>
+; CHECK-NEXT: movq %rbx, %rdi<br>
+; CHECK-NEXT: callq bar<br>
+; CHECK-NEXT: .LBB10_2: # %if.end<br>
+; CHECK-NEXT: movups (%r12), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, (%r15)<br>
+; CHECK-NEXT: movq (%rbx), %rax<br>
+; CHECK-NEXT: movq %rax, (%r14)<br>
+; CHECK-NEXT: movl 8(%rbx), %eax<br>
+; CHECK-NEXT: movl %eax, 8(%r14)<br>
+; CHECK-NEXT: movl 12(%rbx), %eax<br>
+; CHECK-NEXT: movl %eax, 12(%r14)<br>
+; CHECK-NEXT: addq $8, %rsp<br>
+; CHECK-NEXT: popq %rbx<br>
+; CHECK-NEXT: popq %r12<br>
+; CHECK-NEXT: popq %r14<br>
+; CHECK-NEXT: popq %r15<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_limit_one_pred:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: pushq %r15<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 16<br>
+; DISABLED-NEXT: pushq %r14<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 24<br>
+; DISABLED-NEXT: pushq %r12<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 32<br>
+; DISABLED-NEXT: pushq %rbx<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 40<br>
+; DISABLED-NEXT: pushq %rax<br>
+; DISABLED-NEXT: .cfi_def_cfa_offset 48<br>
+; DISABLED-NEXT: .cfi_offset %rbx, -40<br>
+; DISABLED-NEXT: .cfi_offset %r12, -32<br>
+; DISABLED-NEXT: .cfi_offset %r14, -24<br>
+; DISABLED-NEXT: .cfi_offset %r15, -16<br>
+; DISABLED-NEXT: movq %r8, %r15<br>
+; DISABLED-NEXT: movq %rcx, %r14<br>
+; DISABLED-NEXT: movq %rsi, %r12<br>
+; DISABLED-NEXT: movq %rdi, %rbx<br>
+; DISABLED-NEXT: movl %r9d, 12(%rdi)<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB10_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl %edx, 4(%rbx)<br>
+; DISABLED-NEXT: movq %rbx, %rdi<br>
+; DISABLED-NEXT: callq bar<br>
+; DISABLED-NEXT: .LBB10_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r15), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%r14)<br>
+; DISABLED-NEXT: movups (%rbx), %xmm0<br>
+; DISABLED-NEXT: movups %xmm0, (%r12)<br>
+; DISABLED-NEXT: addq $8, %rsp<br>
+; DISABLED-NEXT: popq %rbx<br>
+; DISABLED-NEXT: popq %r12<br>
+; DISABLED-NEXT: popq %r14<br>
+; DISABLED-NEXT: popq %r15<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_limit_one_pred:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: pushq %r15<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX2-NEXT: pushq %r14<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 24<br>
+; CHECK-AVX2-NEXT: pushq %r12<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX2-NEXT: pushq %rbx<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 40<br>
+; CHECK-AVX2-NEXT: pushq %rax<br>
+; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 48<br>
+; CHECK-AVX2-NEXT: .cfi_offset %rbx, -40<br>
+; CHECK-AVX2-NEXT: .cfi_offset %r12, -32<br>
+; CHECK-AVX2-NEXT: .cfi_offset %r14, -24<br>
+; CHECK-AVX2-NEXT: .cfi_offset %r15, -16<br>
+; CHECK-AVX2-NEXT: movq %r8, %r12<br>
+; CHECK-AVX2-NEXT: movq %rcx, %r15<br>
+; CHECK-AVX2-NEXT: movq %rsi, %r14<br>
+; CHECK-AVX2-NEXT: movq %rdi, %rbx<br>
+; CHECK-AVX2-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB10_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl %edx, 4(%rbx)<br>
+; CHECK-AVX2-NEXT: movq %rbx, %rdi<br>
+; CHECK-AVX2-NEXT: callq bar<br>
+; CHECK-AVX2-NEXT: .LBB10_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r12), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, (%r15)<br>
+; CHECK-AVX2-NEXT: movq (%rbx), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, (%r14)<br>
+; CHECK-AVX2-NEXT: movl 8(%rbx), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 8(%r14)<br>
+; CHECK-AVX2-NEXT: movl 12(%rbx), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 12(%r14)<br>
+; CHECK-AVX2-NEXT: addq $8, %rsp<br>
+; CHECK-AVX2-NEXT: popq %rbx<br>
+; CHECK-AVX2-NEXT: popq %r12<br>
+; CHECK-AVX2-NEXT: popq %r14<br>
+; CHECK-AVX2-NEXT: popq %r15<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_limit_one_pred:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: pushq %r15<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 16<br>
+; CHECK-AVX512-NEXT: pushq %r14<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 24<br>
+; CHECK-AVX512-NEXT: pushq %r12<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 32<br>
+; CHECK-AVX512-NEXT: pushq %rbx<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 40<br>
+; CHECK-AVX512-NEXT: pushq %rax<br>
+; CHECK-AVX512-NEXT: .cfi_def_cfa_offset 48<br>
+; CHECK-AVX512-NEXT: .cfi_offset %rbx, -40<br>
+; CHECK-AVX512-NEXT: .cfi_offset %r12, -32<br>
+; CHECK-AVX512-NEXT: .cfi_offset %r14, -24<br>
+; CHECK-AVX512-NEXT: .cfi_offset %r15, -16<br>
+; CHECK-AVX512-NEXT: movq %r8, %r12<br>
+; CHECK-AVX512-NEXT: movq %rcx, %r15<br>
+; CHECK-AVX512-NEXT: movq %rsi, %r14<br>
+; CHECK-AVX512-NEXT: movq %rdi, %rbx<br>
+; CHECK-AVX512-NEXT: movl %r9d, 12(%rdi)<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB10_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl %edx, 4(%rbx)<br>
+; CHECK-AVX512-NEXT: movq %rbx, %rdi<br>
+; CHECK-AVX512-NEXT: callq bar<br>
+; CHECK-AVX512-NEXT: .LBB10_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r12), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, (%r15)<br>
+; CHECK-AVX512-NEXT: movq (%rbx), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, (%r14)<br>
+; CHECK-AVX512-NEXT: movl 8(%rbx), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 8(%r14)<br>
+; CHECK-AVX512-NEXT: movl 12(%rbx), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 12(%r14)<br>
+; CHECK-AVX512-NEXT: addq $8, %rsp<br>
+; CHECK-AVX512-NEXT: popq %rbx<br>
+; CHECK-AVX512-NEXT: popq %r12<br>
+; CHECK-AVX512-NEXT: popq %r14<br>
+; CHECK-AVX512-NEXT: popq %r15<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %d = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 3<br>
+ store i32 %x2, i32* %d, align 4<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S, %struct.S* %s1, i64 0, i32 1<br>
+ store i32 %x, i32* %b, align 4<br>
+ tail call void @bar(%struct.S* nonnull %s1) #3<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S* %s3 to i8*<br>
+ %1 = bitcast %struct.S* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S* %s2 to i8*<br>
+ %3 = bitcast %struct.S* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 16, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+<br>
+declare void @bar(%struct.S*) local_unnamed_addr #1<br>
+<br>
+<br>
+; Function Attrs: argmemonly nounwind<br>
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1) #1<br>
+<br>
+attributes #0 = { nounwind uwtable "target-cpu"="x86-64" }<br>
+<br>
+%struct.S7 = type { float, float, float , float, float, float, float, float }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_conditional_block_float(%struct.S7* nocapture %s1, %struct.S7* nocapture %s2, i32 %x, %struct.S7* nocapture %s3, %struct.S7* nocapture readonly %s4, float %y) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_conditional_block_float:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB11_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movl $1065353216, 4(%rdi) # imm = 0x3F800000<br>
+; CHECK-NEXT: .LBB11_2: # %if.end<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups 16(%r8), %xmm1<br>
+; CHECK-NEXT: movups %xmm1, 16(%rcx)<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movl (%rdi), %eax<br>
+; CHECK-NEXT: movl 4(%rdi), %ecx<br>
+; CHECK-NEXT: movq 8(%rdi), %rdx<br>
+; CHECK-NEXT: movups 16(%rdi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, 16(%rsi)<br>
+; CHECK-NEXT: movl %eax, (%rsi)<br>
+; CHECK-NEXT: movl %ecx, 4(%rsi)<br>
+; CHECK-NEXT: movq %rdx, 8(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_conditional_block_float:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB11_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movl $1065353216, 4(%rdi) # imm = 0x3F800000<br>
+; DISABLED-NEXT: .LBB11_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups 16(%r8), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%rcx)<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups 16(%rdi), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%rsi)<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_conditional_block_float:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB11_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movl $1065353216, 4(%rdi) # imm = 0x3F800000<br>
+; CHECK-AVX2-NEXT: .LBB11_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %ymm0<br>
+; CHECK-AVX2-NEXT: vmovups %ymm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX2-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX2-NEXT: vmovups 8(%rdi), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: movq 24(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 24(%rsi)<br>
+; CHECK-AVX2-NEXT: vzeroupper<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_conditional_block_float:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB11_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movl $1065353216, 4(%rdi) # imm = 0x3F800000<br>
+; CHECK-AVX512-NEXT: .LBB11_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %ymm0<br>
+; CHECK-AVX512-NEXT: vmovups %ymm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movl (%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movl 4(%rdi), %eax<br>
+; CHECK-AVX512-NEXT: movl %eax, 4(%rsi)<br>
+; CHECK-AVX512-NEXT: vmovups 8(%rdi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: movq 24(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 24(%rsi)<br>
+; CHECK-AVX512-NEXT: vzeroupper<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S7, %struct.S7* %s1, i64 0, i32 1<br>
+ store float 1.0, float* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S7* %s3 to i8*<br>
+ %1 = bitcast %struct.S7* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 32, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S7* %s2 to i8*<br>
+ %3 = bitcast %struct.S7* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 32, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
+%struct.S8 = type { i64, i64, i64, i64, i64, i64 }<br>
+<br>
+; Function Attrs: nounwind uwtable<br>
+define void @test_conditional_block_ymm(%struct.S8* nocapture %s1, %struct.S8* nocapture %s2, i32 %x, %struct.S8* nocapture %s3, %struct.S8* nocapture readonly %s4) local_unnamed_addr #0 {<br>
+; CHECK-LABEL: test_conditional_block_ymm:<br>
+; CHECK: # %bb.0: # %entry<br>
+; CHECK-NEXT: cmpl $18, %edx<br>
+; CHECK-NEXT: jl .LBB12_2<br>
+; CHECK-NEXT: # %bb.1: # %if.then<br>
+; CHECK-NEXT: movq $1, 8(%rdi)<br>
+; CHECK-NEXT: .LBB12_2: # %if.end<br>
+; CHECK-NEXT: movups (%r8), %xmm0<br>
+; CHECK-NEXT: movups 16(%r8), %xmm1<br>
+; CHECK-NEXT: movups %xmm1, 16(%rcx)<br>
+; CHECK-NEXT: movups %xmm0, (%rcx)<br>
+; CHECK-NEXT: movq (%rdi), %rax<br>
+; CHECK-NEXT: movq 8(%rdi), %rcx<br>
+; CHECK-NEXT: movups 16(%rdi), %xmm0<br>
+; CHECK-NEXT: movups %xmm0, 16(%rsi)<br>
+; CHECK-NEXT: movq %rax, (%rsi)<br>
+; CHECK-NEXT: movq %rcx, 8(%rsi)<br>
+; CHECK-NEXT: retq<br>
+;<br>
+; DISABLED-LABEL: test_conditional_block_ymm:<br>
+; DISABLED: # %bb.0: # %entry<br>
+; DISABLED-NEXT: cmpl $18, %edx<br>
+; DISABLED-NEXT: jl .LBB12_2<br>
+; DISABLED-NEXT: # %bb.1: # %if.then<br>
+; DISABLED-NEXT: movq $1, 8(%rdi)<br>
+; DISABLED-NEXT: .LBB12_2: # %if.end<br>
+; DISABLED-NEXT: movups (%r8), %xmm0<br>
+; DISABLED-NEXT: movups 16(%r8), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%rcx)<br>
+; DISABLED-NEXT: movups %xmm0, (%rcx)<br>
+; DISABLED-NEXT: movups (%rdi), %xmm0<br>
+; DISABLED-NEXT: movups 16(%rdi), %xmm1<br>
+; DISABLED-NEXT: movups %xmm1, 16(%rsi)<br>
+; DISABLED-NEXT: movups %xmm0, (%rsi)<br>
+; DISABLED-NEXT: retq<br>
+;<br>
+; CHECK-AVX2-LABEL: test_conditional_block_ymm:<br>
+; CHECK-AVX2: # %bb.0: # %entry<br>
+; CHECK-AVX2-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX2-NEXT: jl .LBB12_2<br>
+; CHECK-AVX2-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX2-NEXT: movq $1, 8(%rdi)<br>
+; CHECK-AVX2-NEXT: .LBB12_2: # %if.end<br>
+; CHECK-AVX2-NEXT: vmovups (%r8), %ymm0<br>
+; CHECK-AVX2-NEXT: vmovups %ymm0, (%rcx)<br>
+; CHECK-AVX2-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX2-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-AVX2-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-AVX2-NEXT: vmovups 16(%rdi), %xmm0<br>
+; CHECK-AVX2-NEXT: vmovups %xmm0, 16(%rsi)<br>
+; CHECK-AVX2-NEXT: vzeroupper<br>
+; CHECK-AVX2-NEXT: retq<br>
+;<br>
+; CHECK-AVX512-LABEL: test_conditional_block_ymm:<br>
+; CHECK-AVX512: # %bb.0: # %entry<br>
+; CHECK-AVX512-NEXT: cmpl $18, %edx<br>
+; CHECK-AVX512-NEXT: jl .LBB12_2<br>
+; CHECK-AVX512-NEXT: # %bb.1: # %if.then<br>
+; CHECK-AVX512-NEXT: movq $1, 8(%rdi)<br>
+; CHECK-AVX512-NEXT: .LBB12_2: # %if.end<br>
+; CHECK-AVX512-NEXT: vmovups (%r8), %ymm0<br>
+; CHECK-AVX512-NEXT: vmovups %ymm0, (%rcx)<br>
+; CHECK-AVX512-NEXT: movq (%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, (%rsi)<br>
+; CHECK-AVX512-NEXT: movq 8(%rdi), %rax<br>
+; CHECK-AVX512-NEXT: movq %rax, 8(%rsi)<br>
+; CHECK-AVX512-NEXT: vmovups 16(%rdi), %xmm0<br>
+; CHECK-AVX512-NEXT: vmovups %xmm0, 16(%rsi)<br>
+; CHECK-AVX512-NEXT: vzeroupper<br>
+; CHECK-AVX512-NEXT: retq<br>
+entry:<br>
+ %cmp = icmp sgt i32 %x, 17<br>
+ br i1 %cmp, label %if.then, label %if.end<br>
+<br>
+if.then: ; preds = %entry<br>
+ %b = getelementptr inbounds %struct.S8, %struct.S8* %s1, i64 0, i32 1<br>
+ store i64 1, i64* %b, align 4<br>
+ br label %if.end<br>
+<br>
+if.end: ; preds = %if.then, %entry<br>
+ %0 = bitcast %struct.S8* %s3 to i8*<br>
+ %1 = bitcast %struct.S8* %s4 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 32, i32 4, i1 false)<br>
+ %2 = bitcast %struct.S8* %s2 to i8*<br>
+ %3 = bitcast %struct.S8* %s1 to i8*<br>
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 32, i32 4, i1 false)<br>
+ ret void<br>
+}<br>
+<br>
<br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>
</blockquote></div><br></div></div>
</blockquote></div><br></div></div>
</blockquote></div><br></div></div>