<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:"Malgun Gothic";
panose-1:2 11 5 3 2 0 0 2 0 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@Malgun Gothic";
panose-1:2 11 5 3 2 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Hi James, <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Now I can reproduce it. I just became aware that scilab.s is in ref. <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Jun<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>From:</span></b><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'> James Molloy [mailto:james@jamesmolloy.co.uk] <br><b>Sent:</b> Friday, October 23, 2015 6:48 AM<br><b>To:</b> Jun Bum Lim; llvm-commits@lists.llvm.org<br><b>Subject:</b> Re: [llvm] r250719 - [AArch64]Merge halfword loads into a 32-bit load<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>Hi Jun,<o:p></o:p></p><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>This commit caused a codegen fault in spec2000::173.gcc, but only with -mcpu=cortex-a53. 
The difference is in scilab.s, and seems deterministically reproducible (although SPEC's official test driver sometimes appears not to detect it, which is why this bug report is so late :/)<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>I have reverted this in r251108 - feel free to recommit when the bug has been fixed.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Cheers,<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>James<o:p></o:p></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal>On Mon, 19 Oct 2015 at 19:36 Jun Bum Lim via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<o:p></o:p></p></div><blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in'><p class=MsoNormal>Author: junbuml<br>Date: Mon Oct 19 13:34:53 2015<br>New Revision: 250719<br><br>URL: <a href="http://llvm.org/viewvc/llvm-project?rev=250719&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=250719&view=rev</a><br>Log:<br>[AArch64]Merge halfword loads into a 32-bit load<br><br>Convert two halfword loads into a single 32-bit word load with bitfield extract<br>instructions. 
For example :<br> ldrh w0, [x2]<br> ldrh w1, [x2, #2]<br>becomes<br> ldr w0, [x2]<br> ubfx w1, w0, #16, #16<br> and w0, w0, #ffff<br><br>Modified:<br> llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp<br> llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll<br><br>Modified: llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp?rev=250719&r1=250718&r2=250719&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp?rev=250719&r1=250718&r2=250719&view=diff</a><br>==============================================================================<br>--- llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (original)<br>+++ llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp Mon Oct 19 13:34:53 2015<br>@@ -41,6 +41,7 @@ STATISTIC(NumPostFolded, "Number of post<br> STATISTIC(NumPreFolded, "Number of pre-index updates folded");<br> STATISTIC(NumUnscaledPairCreated,<br> "Number of load/store from unscaled generated");<br>+STATISTIC(NumSmallTypeMerged, "Number of small type loads merged");<br><br> static cl::opt<unsigned> ScanLimit("aarch64-load-store-scan-limit",<br> cl::init(20), cl::Hidden);<br>@@ -77,12 +78,13 @@ typedef struct LdStPairFlags {<br><br> struct AArch64LoadStoreOpt : public MachineFunctionPass {<br> static char ID;<br>- AArch64LoadStoreOpt() : MachineFunctionPass(ID) {<br>+ AArch64LoadStoreOpt() : MachineFunctionPass(ID), IsStrictAlign(false) {<br> initializeAArch64LoadStoreOptPass(*PassRegistry::getPassRegistry());<br> }<br><br> const AArch64InstrInfo *TII;<br> const TargetRegisterInfo *TRI;<br>+ bool IsStrictAlign;<br><br> // Scan the instructions looking for a load/store that can be combined<br> // with the current instruction into a load/store pair.<br>@@ -122,6 +124,9 @@ struct AArch64LoadStoreOpt : public Mach<br> mergeUpdateInsn(MachineBasicBlock::iterator 
I,<br> MachineBasicBlock::iterator Update, bool IsPreIdx);<br><br>+ // Find and merge foldable ldr/str instructions.<br>+ bool tryToMergeLdStInst(MachineBasicBlock::iterator &MBBI);<br>+<br> bool optimizeBlock(MachineBasicBlock &MBB);<br><br> bool runOnMachineFunction(MachineFunction &Fn) override;<br>@@ -151,6 +156,7 @@ static bool isUnscaledLdSt(unsigned Opc)<br> case AArch64::LDURWi:<br> case AArch64::LDURXi:<br> case AArch64::LDURSWi:<br>+ case AArch64::LDURHHi:<br> return true;<br> }<br> }<br>@@ -159,6 +165,20 @@ static bool isUnscaledLdSt(MachineInstr<br> return isUnscaledLdSt(MI->getOpcode());<br> }<br><br>+static bool isSmallTypeLdMerge(unsigned Opc) {<br>+ switch (Opc) {<br>+ default:<br>+ return false;<br>+ case AArch64::LDRHHui:<br>+ case AArch64::LDURHHi:<br>+ return true;<br>+ // FIXME: Add other instructions (e.g, LDRBBui, LDURSHWi, LDRSHWui, etc.).<br>+ }<br>+}<br>+static bool isSmallTypeLdMerge(MachineInstr *MI) {<br>+ return isSmallTypeLdMerge(MI->getOpcode());<br>+}<br>+<br> // Scaling factor for unscaled load or store.<br> static int getMemScale(MachineInstr *MI) {<br> switch (MI->getOpcode()) {<br>@@ -168,6 +188,7 @@ static int getMemScale(MachineInstr *MI)<br> case AArch64::STRBBui:<br> return 1;<br> case AArch64::LDRHHui:<br>+ case AArch64::LDURHHi:<br> case AArch64::STRHHui:<br> return 2;<br> case AArch64::LDRSui:<br>@@ -238,6 +259,8 @@ static unsigned getMatchingNonSExtOpcode<br> case AArch64::STURSi:<br> case AArch64::LDRSui:<br> case AArch64::LDURSi:<br>+ case AArch64::LDRHHui:<br>+ case AArch64::LDURHHi:<br> return Opc;<br> case AArch64::LDRSWui:<br> return AArch64::LDRWui;<br>@@ -283,6 +306,10 @@ static unsigned getMatchingPairOpcode(un<br> case AArch64::LDRSWui:<br> case AArch64::LDURSWi:<br> return AArch64::LDPSWi;<br>+ case AArch64::LDRHHui:<br>+ return AArch64::LDRWui;<br>+ case AArch64::LDURHHi:<br>+ return AArch64::LDURWi;<br> }<br> }<br><br>@@ -440,6 +467,21 @@ static const MachineOperand &getLdStOffs<br> return 
MI->getOperand(Idx);<br> }<br><br>+// Copy MachineMemOperands from Op0 and Op1 to a new array assigned to MI.<br>+static void concatenateMemOperands(MachineInstr *MI, MachineInstr *Op0,<br>+ MachineInstr *Op1) {<br>+ assert(MI->memoperands_empty() && "expected a new machineinstr");<br>+ size_t numMemRefs = (Op0->memoperands_end() - Op0->memoperands_begin()) +<br>+ (Op1->memoperands_end() - Op1->memoperands_begin());<br>+<br>+ MachineFunction *MF = MI->getParent()->getParent();<br>+ MachineSDNode::mmo_iterator MemBegin = MF->allocateMemRefsArray(numMemRefs);<br>+ MachineSDNode::mmo_iterator MemEnd =<br>+ std::copy(Op0->memoperands_begin(), Op0->memoperands_end(), MemBegin);<br>+ MemEnd = std::copy(Op1->memoperands_begin(), Op1->memoperands_end(), MemEnd);<br>+ MI->setMemRefs(MemBegin, MemEnd);<br>+}<br>+<br> MachineBasicBlock::iterator<br> AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,<br> MachineBasicBlock::iterator Paired,<br>@@ -484,8 +526,78 @@ AArch64LoadStoreOpt::mergePairedInsns(Ma<br> RtMI = I;<br> Rt2MI = Paired;<br> }<br>- // Handle Unscaled<br>+<br> int OffsetImm = getLdStOffsetOp(RtMI).getImm();<br>+<br>+ if (isSmallTypeLdMerge(Opc)) {<br>+ // Change the scaled offset from small to large type.<br>+ if (!IsUnscaled)<br>+ OffsetImm /= 2;<br>+ MachineInstr *RtNewDest = MergeForward ? I : Paired;<br>+ // Construct the new load instruction.<br>+ // FIXME: currently we support only halfword unsigned load. We need to<br>+ // handle byte type, signed, and store instructions as well.<br>+ MachineInstr *NewMemMI, *BitExtMI1, *BitExtMI2;<br>+ NewMemMI = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(NewOpc))<br>+ .addOperand(getLdStRegOp(RtNewDest))<br>+ .addOperand(BaseRegOp)<br>+ .addImm(OffsetImm);<br>+<br>+ // Copy MachineMemOperands from the original loads.<br>+ concatenateMemOperands(NewMemMI, I, Paired);<br>+<br>+ DEBUG(<br>+ dbgs()<br>+ << "Creating the new load and extract. 
Replacing instructions:\n ");<br>+ DEBUG(I->print(dbgs()));<br>+ DEBUG(dbgs() << " ");<br>+ DEBUG(Paired->print(dbgs()));<br>+ DEBUG(dbgs() << " with instructions:\n ");<br>+ DEBUG((NewMemMI)->print(dbgs()));<br>+<br>+ MachineInstr *ExtDestMI = MergeForward ? Paired : I;<br>+ if (ExtDestMI == Rt2MI) {<br>+ // Create the bitfield extract for high half.<br>+ BitExtMI1 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+ TII->get(AArch64::UBFMWri))<br>+ .addOperand(getLdStRegOp(Rt2MI))<br>+ .addReg(getLdStRegOp(RtNewDest).getReg())<br>+ .addImm(16)<br>+ .addImm(31);<br>+ // Create the bitfield extract for low half.<br>+ BitExtMI2 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+ TII->get(AArch64::ANDWri))<br>+ .addOperand(getLdStRegOp(RtMI))<br>+ .addReg(getLdStRegOp(RtNewDest).getReg())<br>+ .addImm(15);<br>+ } else {<br>+ // Create the bitfield extract for low half.<br>+ BitExtMI1 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+ TII->get(AArch64::ANDWri))<br>+ .addOperand(getLdStRegOp(RtMI))<br>+ .addReg(getLdStRegOp(RtNewDest).getReg())<br>+ .addImm(15);<br>+ // Create the bitfield extract for high half.<br>+ BitExtMI2 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+ TII->get(AArch64::UBFMWri))<br>+ .addOperand(getLdStRegOp(Rt2MI))<br>+ .addReg(getLdStRegOp(RtNewDest).getReg())<br>+ .addImm(16)<br>+ .addImm(31);<br>+ }<br>+ DEBUG(dbgs() << " ");<br>+ DEBUG((BitExtMI1)->print(dbgs()));<br>+ DEBUG(dbgs() << " ");<br>+ DEBUG((BitExtMI2)->print(dbgs()));<br>+ DEBUG(dbgs() << "\n");<br>+<br>+ // Erase the old instructions.<br>+ I->eraseFromParent();<br>+ Paired->eraseFromParent();<br>+ return NextI;<br>+ }<br>+<br>+ // Handle Unscaled<br> if (IsUnscaled)<br> OffsetImm /= OffsetStride;<br><br>@@ -622,8 +734,7 @@ static bool mayAlias(MachineInstr *MIa,<br> /// be combined with the current instruction into a load/store pair.<br> MachineBasicBlock::iterator<br> 
AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,<br>- LdStPairFlags &Flags,<br>- unsigned Limit) {<br>+ LdStPairFlags &Flags, unsigned Limit) {<br> MachineBasicBlock::iterator E = I->getParent()->end();<br> MachineBasicBlock::iterator MBBI = I;<br> MachineInstr *FirstMI = I;<br>@@ -645,7 +756,8 @@ AArch64LoadStoreOpt::findMatchingInsn(Ma<br> // range, plus allow an extra one in case we find a later insn that matches<br> // with Offset-1)<br> int OffsetStride = IsUnscaled ? getMemScale(FirstMI) : 1;<br>- if (!inBoundsForPair(IsUnscaled, Offset, OffsetStride))<br>+ if (!isSmallTypeLdMerge(Opc) &&<br>+ !inBoundsForPair(IsUnscaled, Offset, OffsetStride))<br> return E;<br><br> // Track which registers have been modified and used between the first insn<br>@@ -704,18 +816,32 @@ AArch64LoadStoreOpt::findMatchingInsn(Ma<br> // If the resultant immediate offset of merging these instructions<br> // is out of range for a pairwise instruction, bail and keep looking.<br> bool MIIsUnscaled = isUnscaledLdSt(MI);<br>- if (!inBoundsForPair(MIIsUnscaled, MinOffset, OffsetStride)) {<br>+ bool IsSmallTypeLd = isSmallTypeLdMerge(MI->getOpcode());<br>+ if (!IsSmallTypeLd &&<br>+ !inBoundsForPair(MIIsUnscaled, MinOffset, OffsetStride)) {<br> trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br> MemInsns.push_back(MI);<br> continue;<br> }<br>- // If the alignment requirements of the paired (scaled) instruction<br>- // can't express the offset of the unscaled input, bail and keep<br>- // looking.<br>- if (IsUnscaled && (alignTo(MinOffset, OffsetStride) != MinOffset)) {<br>- trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br>- MemInsns.push_back(MI);<br>- continue;<br>+<br>+ if (IsSmallTypeLd) {<br>+ // If the alignment requirements of the larger type scaled load<br>+ // instruction can't express the scaled offset of the smaller type<br>+ // input, bail and keep looking.<br>+ if (!IsUnscaled && alignTo(MinOffset, 2) != MinOffset) {<br>+ trackRegDefsUses(MI, 
ModifiedRegs, UsedRegs, TRI);<br>+ MemInsns.push_back(MI);<br>+ continue;<br>+ }<br>+ } else {<br>+ // If the alignment requirements of the paired (scaled) instruction<br>+ // can't express the offset of the unscaled input, bail and keep<br>+ // looking.<br>+ if (IsUnscaled && (alignTo(MinOffset, OffsetStride) != MinOffset)) {<br>+ trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br>+ MemInsns.push_back(MI);<br>+ continue;<br>+ }<br> }<br> // If the destination register of the loads is the same register, bail<br> // and keep looking. A load-pair instruction with both destination<br>@@ -996,17 +1122,64 @@ MachineBasicBlock::iterator AArch64LoadS<br> return E;<br> }<br><br>+bool AArch64LoadStoreOpt::tryToMergeLdStInst(<br>+ MachineBasicBlock::iterator &MBBI) {<br>+ MachineInstr *MI = MBBI;<br>+ MachineBasicBlock::iterator E = MI->getParent()->end();<br>+ // If this is a volatile load/store, don't mess with it.<br>+ if (MI->hasOrderedMemoryRef())<br>+ return false;<br>+<br>+ // Make sure this is a reg+imm (as opposed to an address reloc).<br>+ if (!getLdStOffsetOp(MI).isImm())<br>+ return false;<br>+<br>+ // Check if this load/store has a hint to avoid pair formation.<br>+ // MachineMemOperands hints are set by the AArch64StorePairSuppress pass.<br>+ if (TII->isLdStPairSuppressed(MI))<br>+ return false;<br>+<br>+ // Look ahead up to ScanLimit instructions for a pairable instruction.<br>+ LdStPairFlags Flags;<br>+ MachineBasicBlock::iterator Paired = findMatchingInsn(MBBI, Flags, ScanLimit);<br>+ if (Paired != E) {<br>+ if (isSmallTypeLdMerge(MI)) {<br>+ ++NumSmallTypeMerged;<br>+ } else {<br>+ ++NumPairCreated;<br>+ if (isUnscaledLdSt(MI))<br>+ ++NumUnscaledPairCreated;<br>+ }<br>+<br>+ // Merge the loads into a pair. 
Keeping the iterator straight is a<br>+ // pain, so we let the merge routine tell us what the next instruction<br>+ // is after it's done mucking about.<br>+ MBBI = mergePairedInsns(MBBI, Paired, Flags);<br>+ return true;<br>+ }<br>+ return false;<br>+}<br>+<br> bool AArch64LoadStoreOpt::optimizeBlock(MachineBasicBlock &MBB) {<br> bool Modified = false;<br>- // Two tranformations to do here:<br>- // 1) Find loads and stores that can be merged into a single load or store<br>+ // Three tranformations to do here:<br>+ // 1) Find halfword loads that can be merged into a single 32-bit word load<br>+ // with bitfield extract instructions.<br>+ // e.g.,<br>+ // ldrh w0, [x2]<br>+ // ldrh w1, [x2, #2]<br>+ // ; becomes<br>+ // ldr w0, [x2]<br>+ // ubfx w1, w0, #16, #16<br>+ // and w0, w0, #ffff<br>+ // 2) Find loads and stores that can be merged into a single load or store<br> // pair instruction.<br> // e.g.,<br> // ldr x0, [x2]<br> // ldr x1, [x2, #8]<br> // ; becomes<br> // ldp x0, x1, [x2]<br>- // 2) Find base register updates that can be merged into the load or store<br>+ // 3) Find base register updates that can be merged into the load or store<br> // as a base-reg writeback.<br> // e.g.,<br> // ldr x0, [x2]<br>@@ -1015,6 +1188,29 @@ bool AArch64LoadStoreOpt::optimizeBlock(<br> // ldr x0, [x2], #4<br><br> for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();<br>+ !IsStrictAlign && MBBI != E;) {<br>+ MachineInstr *MI = MBBI;<br>+ switch (MI->getOpcode()) {<br>+ default:<br>+ // Just move on to the next instruction.<br>+ ++MBBI;<br>+ break;<br>+ // Scaled instructions.<br>+ case AArch64::LDRHHui:<br>+ // Unscaled instructions.<br>+ case AArch64::LDURHHi: {<br>+ if (tryToMergeLdStInst(MBBI)) {<br>+ Modified = true;<br>+ break;<br>+ }<br>+ ++MBBI;<br>+ break;<br>+ }<br>+ // FIXME: Do the other instructions.<br>+ }<br>+ }<br>+<br>+ for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();<br> MBBI != E;) {<br> MachineInstr *MI = MBBI;<br> switch 
(MI->getOpcode()) {<br>@@ -1046,35 +1242,7 @@ bool AArch64LoadStoreOpt::optimizeBlock(<br> case AArch64::LDURWi:<br> case AArch64::LDURXi:<br> case AArch64::LDURSWi: {<br>- // If this is a volatile load/store, don't mess with it.<br>- if (MI->hasOrderedMemoryRef()) {<br>- ++MBBI;<br>- break;<br>- }<br>- // Make sure this is a reg+imm (as opposed to an address reloc).<br>- if (!getLdStOffsetOp(MI).isImm()) {<br>- ++MBBI;<br>- break;<br>- }<br>- // Check if this load/store has a hint to avoid pair formation.<br>- // MachineMemOperands hints are set by the AArch64StorePairSuppress pass.<br>- if (TII->isLdStPairSuppressed(MI)) {<br>- ++MBBI;<br>- break;<br>- }<br>- // Look ahead up to ScanLimit instructions for a pairable instruction.<br>- LdStPairFlags Flags;<br>- MachineBasicBlock::iterator Paired =<br>- findMatchingInsn(MBBI, Flags, ScanLimit);<br>- if (Paired != E) {<br>- ++NumPairCreated;<br>- if (isUnscaledLdSt(MI))<br>- ++NumUnscaledPairCreated;<br>-<br>- // Merge the loads into a pair. 
Keeping the iterator straight is a<br>- // pain, so we let the merge routine tell us what the next instruction<br>- // is after it's done mucking about.<br>- MBBI = mergePairedInsns(MBBI, Paired, Flags);<br>+ if (tryToMergeLdStInst(MBBI)) {<br> Modified = true;<br> break;<br> }<br>@@ -1206,6 +1374,8 @@ bool AArch64LoadStoreOpt::optimizeBlock(<br> bool AArch64LoadStoreOpt::runOnMachineFunction(MachineFunction &Fn) {<br> TII = static_cast<const AArch64InstrInfo *>(Fn.getSubtarget().getInstrInfo());<br> TRI = Fn.getSubtarget().getRegisterInfo();<br>+ IsStrictAlign = (static_cast<const AArch64Subtarget &>(Fn.getSubtarget()))<br>+ .requiresStrictAlign();<br><br> bool Modified = false;<br> for (auto &MBB : Fn)<br><br>Modified: llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll?rev=250719&r1=250718&r2=250719&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll?rev=250719&r1=250718&r2=250719&view=diff</a><br>==============================================================================<br>--- llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll (original)<br>+++ llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll Mon Oct 19 13:34:53 2015<br>@@ -355,3 +355,52 @@ define i64 @ldp_sext_int_post(i32* %p) n<br> %add = add nsw i64 %sexttmp1, %sexttmp<br> ret i64 %add<br> }<br>+<br>+; CHECK-LABEL: Ldrh_merge<br>+; CHECK-NOT: ldrh<br>+; CHECK: ldr [[NEW_DEST:w[0-9]+]]<br>+; CHECK: and w{{[0-9]+}}, [[NEW_DEST]], #0xffff<br>+; CHECK: lsr w{{[0-9]+}}, [[NEW_DEST]]<br>+<br>+define i16 @Ldrh_merge(i16* nocapture readonly %p) {<br>+ %1 = load i16, i16* %p, align 2<br>+ ;%conv = zext i16 %0 to i32<br>+ %arrayidx2 = getelementptr inbounds i16, i16* %p, i64 1<br>+ %2 = load i16, i16* %arrayidx2, align 2<br>+ %add = add nuw nsw i16 %1, %2<br>+ ret i16 %add<br>+}<br>+<br>+; CHECK-LABEL: Ldurh_merge<br>+; CHECK-NOT: ldurh<br>+; CHECK: ldur 
[[NEW_DEST:w[0-9]+]]<br>+; CHECK: and w{{[0-9]+}}, [[NEW_DEST]], #0xffff<br>+; CHECK: lsr w{{[0-9]+}}, [[NEW_DEST]]<br>+define i16 @Ldurh_merge(i16* nocapture readonly %p) {<br>+entry:<br>+ %arrayidx = getelementptr inbounds i16, i16* %p, i64 -2<br>+ %0 = load i16, i16* %arrayidx<br>+ %arrayidx3 = getelementptr inbounds i16, i16* %p, i64 -1<br>+ %1 = load i16, i16* %arrayidx3<br>+ %add = add nuw nsw i16 %0, %1<br>+ ret i16 %add<br>+}<br>+<br>+; CHECK-LABEL: Ldrh_4_merge<br>+; CHECK-NOT: ldrh<br>+; CHECK: ldp [[NEW_DEST:w[0-9]+]]<br>+define i16 @Ldrh_4_merge(i16* nocapture readonly %P) {<br>+ %arrayidx = getelementptr inbounds i16, i16* %P, i64 0<br>+ %l0 = load i16, i16* %arrayidx<br>+ %arrayidx2 = getelementptr inbounds i16, i16* %P, i64 1<br>+ %l1 = load i16, i16* %arrayidx2<br>+ %arrayidx7 = getelementptr inbounds i16, i16* %P, i64 2<br>+ %l2 = load i16, i16* %arrayidx7<br>+ %arrayidx12 = getelementptr inbounds i16, i16* %P, i64 3<br>+ %l3 = load i16, i16* %arrayidx12<br>+ %add4 = add nuw nsw i16 %l1, %l0<br>+ %add9 = add nuw nsw i16 %add4, %l2<br>+ %add14 = add nuw nsw i16 %add9, %l3<br>+<br>+ ret i16 %add14<br>+}<br><br><br>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><o:p></o:p></p></blockquote></div></div></body></html>