<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:"Malgun Gothic";

        panose-1:2 11 5 3 2 0 0 2 0 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:"\@Malgun Gothic";

        panose-1:2 11 5 3 2 0 0 2 0 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman",serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:#1F497D;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-family:"Calibri",sans-serif;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Hi James,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>I cannot reproduce this failure in my spec2000/176.gcc run with -mcpu=cortex-a53. Can you give me a little more detail about the failure? If possible, a reduced test case would help me reproduce it. <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'>Jun<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>From:</span></b><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'> James Molloy [mailto:james@jamesmolloy.co.uk] <br><b>Sent:</b> Friday, October 23, 2015 6:48 AM<br><b>To:</b> Jun Bum Lim; llvm-commits@lists.llvm.org<br><b>Subject:</b> Re: [llvm] r250719 - [AArch64]Merge halfword loads into a 32-bit load<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>Hi Jun,<o:p></o:p></p><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>This commit caused a codegen fault in spec2000::176.gcc, but only with -mcpu=cortex-a53. 
The difference is in scilab.s and appears to be deterministically reproducible (although SPEC's official test driver sometimes fails to detect it, which is why this bug report is so late :/)<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>I have reverted this in r251108; feel free to recommit once the bug has been fixed.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Cheers,<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>James<o:p></o:p></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal>On Mon, 19 Oct 2015 at 19:36 Jun Bum Lim via llvm-commits &lt;<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>&gt; wrote:<o:p></o:p></p></div><blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in'><p class=MsoNormal>Author: junbuml<br>Date: Mon Oct 19 13:34:53 2015<br>New Revision: 250719<br><br>URL: <a href="http://llvm.org/viewvc/llvm-project?rev=250719&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=250719&view=rev</a><br>Log:<br>[AArch64]Merge halfword loads into a 32-bit load<br><br>Convert two halfword loads into a single 32-bit word load with bitfield extract<br>instructions. 
For example:<br>  ldrh w0, [x2]<br>  ldrh w1, [x2, #2]<br>becomes<br>  ldr w0, [x2]<br>  ubfx w1, w0, #16, #16<br>  and  w0, w0, #0xffff<br><br>Modified:<br>    llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp<br>    llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll<br><br>Modified: llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp?rev=250719&r1=250718&r2=250719&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp?rev=250719&r1=250718&r2=250719&view=diff</a><br>==============================================================================<br>--- llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (original)<br>+++ llvm/trunk/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp Mon Oct 19 13:34:53 2015<br>@@ -41,6 +41,7 @@ STATISTIC(NumPostFolded, "Number of post<br> STATISTIC(NumPreFolded, "Number of pre-index updates folded");<br> STATISTIC(NumUnscaledPairCreated,<br>           "Number of load/store from unscaled generated");<br>+STATISTIC(NumSmallTypeMerged, "Number of small type loads merged");<br><br> static cl::opt&lt;unsigned&gt; ScanLimit("aarch64-load-store-scan-limit",<br>                                    cl::init(20), cl::Hidden);<br>@@ -77,12 +78,13 @@ typedef struct LdStPairFlags {<br><br> struct AArch64LoadStoreOpt : public MachineFunctionPass {<br>   static char ID;<br>-  AArch64LoadStoreOpt() : MachineFunctionPass(ID) {<br>+  AArch64LoadStoreOpt() : MachineFunctionPass(ID), IsStrictAlign(false) {<br>     initializeAArch64LoadStoreOptPass(*PassRegistry::getPassRegistry());<br>   }<br><br>   const AArch64InstrInfo *TII;<br>   const TargetRegisterInfo *TRI;<br>+  bool IsStrictAlign;<br><br>   // Scan the instructions looking for a load/store that can be combined<br>   // with the current instruction into a load/store pair.<br>@@ -122,6 +124,9 @@ struct 
AArch64LoadStoreOpt : public Mach<br>   mergeUpdateInsn(MachineBasicBlock::iterator I,<br>                   MachineBasicBlock::iterator Update, bool IsPreIdx);<br><br>+  // Find and merge foldable ldr/str instructions.<br>+  bool tryToMergeLdStInst(MachineBasicBlock::iterator &MBBI);<br>+<br>   bool optimizeBlock(MachineBasicBlock &MBB);<br><br>   bool runOnMachineFunction(MachineFunction &Fn) override;<br>@@ -151,6 +156,7 @@ static bool isUnscaledLdSt(unsigned Opc)<br>   case AArch64::LDURWi:<br>   case AArch64::LDURXi:<br>   case AArch64::LDURSWi:<br>+  case AArch64::LDURHHi:<br>     return true;<br>   }<br> }<br>@@ -159,6 +165,20 @@ static bool isUnscaledLdSt(MachineInstr<br>   return isUnscaledLdSt(MI->getOpcode());<br> }<br><br>+static bool isSmallTypeLdMerge(unsigned Opc) {<br>+  switch (Opc) {<br>+  default:<br>+    return false;<br>+  case AArch64::LDRHHui:<br>+  case AArch64::LDURHHi:<br>+    return true;<br>+    // FIXME: Add other instructions (e.g, LDRBBui, LDURSHWi, LDRSHWui, etc.).<br>+  }<br>+}<br>+static bool isSmallTypeLdMerge(MachineInstr *MI) {<br>+  return isSmallTypeLdMerge(MI->getOpcode());<br>+}<br>+<br> // Scaling factor for unscaled load or store.<br> static int getMemScale(MachineInstr *MI) {<br>   switch (MI->getOpcode()) {<br>@@ -168,6 +188,7 @@ static int getMemScale(MachineInstr *MI)<br>   case AArch64::STRBBui:<br>     return 1;<br>   case AArch64::LDRHHui:<br>+  case AArch64::LDURHHi:<br>   case AArch64::STRHHui:<br>     return 2;<br>   case AArch64::LDRSui:<br>@@ -238,6 +259,8 @@ static unsigned getMatchingNonSExtOpcode<br>   case AArch64::STURSi:<br>   case AArch64::LDRSui:<br>   case AArch64::LDURSi:<br>+  case AArch64::LDRHHui:<br>+  case AArch64::LDURHHi:<br>     return Opc;<br>   case AArch64::LDRSWui:<br>     return AArch64::LDRWui;<br>@@ -283,6 +306,10 @@ static unsigned getMatchingPairOpcode(un<br>   case AArch64::LDRSWui:<br>   case AArch64::LDURSWi:<br>     return AArch64::LDPSWi;<br>+  case AArch64::LDRHHui:<br>+    
return AArch64::LDRWui;<br>+  case AArch64::LDURHHi:<br>+    return AArch64::LDURWi;<br>   }<br> }<br><br>@@ -440,6 +467,21 @@ static const MachineOperand &getLdStOffs<br>   return MI->getOperand(Idx);<br> }<br><br>+// Copy MachineMemOperands from Op0 and Op1 to a new array assigned to MI.<br>+static void concatenateMemOperands(MachineInstr *MI, MachineInstr *Op0,<br>+                                   MachineInstr *Op1) {<br>+  assert(MI->memoperands_empty() && "expected a new machineinstr");<br>+  size_t numMemRefs = (Op0->memoperands_end() - Op0->memoperands_begin()) +<br>+                      (Op1->memoperands_end() - Op1->memoperands_begin());<br>+<br>+  MachineFunction *MF = MI->getParent()->getParent();<br>+  MachineSDNode::mmo_iterator MemBegin = MF->allocateMemRefsArray(numMemRefs);<br>+  MachineSDNode::mmo_iterator MemEnd =<br>+      std::copy(Op0->memoperands_begin(), Op0->memoperands_end(), MemBegin);<br>+  MemEnd = std::copy(Op1->memoperands_begin(), Op1->memoperands_end(), MemEnd);<br>+  MI->setMemRefs(MemBegin, MemEnd);<br>+}<br>+<br> MachineBasicBlock::iterator<br> AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,<br>                                       MachineBasicBlock::iterator Paired,<br>@@ -484,8 +526,78 @@ AArch64LoadStoreOpt::mergePairedInsns(Ma<br>     RtMI = I;<br>     Rt2MI = Paired;<br>   }<br>-  // Handle Unscaled<br>+<br>   int OffsetImm = getLdStOffsetOp(RtMI).getImm();<br>+<br>+  if (isSmallTypeLdMerge(Opc)) {<br>+    // Change the scaled offset from small to large type.<br>+    if (!IsUnscaled)<br>+      OffsetImm /= 2;<br>+    MachineInstr *RtNewDest = MergeForward ? I : Paired;<br>+    // Construct the new load instruction.<br>+    // FIXME: currently we support only halfword unsigned load. 
We need to<br>+    // handle byte type, signed, and store instructions as well.<br>+    MachineInstr *NewMemMI, *BitExtMI1, *BitExtMI2;<br>+    NewMemMI = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(NewOpc))<br>+                   .addOperand(getLdStRegOp(RtNewDest))<br>+                   .addOperand(BaseRegOp)<br>+                   .addImm(OffsetImm);<br>+<br>+    // Copy MachineMemOperands from the original loads.<br>+    concatenateMemOperands(NewMemMI, I, Paired);<br>+<br>+    DEBUG(<br>+        dbgs()<br>+        << "Creating the new load and extract. Replacing instructions:\n    ");<br>+    DEBUG(I->print(dbgs()));<br>+    DEBUG(dbgs() << "    ");<br>+    DEBUG(Paired->print(dbgs()));<br>+    DEBUG(dbgs() << "  with instructions:\n    ");<br>+    DEBUG((NewMemMI)->print(dbgs()));<br>+<br>+    MachineInstr *ExtDestMI = MergeForward ? Paired : I;<br>+    if (ExtDestMI == Rt2MI) {<br>+      // Create the bitfield extract for high half.<br>+      BitExtMI1 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+                          TII->get(AArch64::UBFMWri))<br>+                      .addOperand(getLdStRegOp(Rt2MI))<br>+                      .addReg(getLdStRegOp(RtNewDest).getReg())<br>+                      .addImm(16)<br>+                      .addImm(31);<br>+      // Create the bitfield extract for low half.<br>+      BitExtMI2 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+                          TII->get(AArch64::ANDWri))<br>+                      .addOperand(getLdStRegOp(RtMI))<br>+                      .addReg(getLdStRegOp(RtNewDest).getReg())<br>+                      .addImm(15);<br>+    } else {<br>+      // Create the bitfield extract for low half.<br>+      BitExtMI1 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+                          TII->get(AArch64::ANDWri))<br>+                      .addOperand(getLdStRegOp(RtMI))<br>+                      
.addReg(getLdStRegOp(RtNewDest).getReg())<br>+                      .addImm(15);<br>+      // Create the bitfield extract for high half.<br>+      BitExtMI2 = BuildMI(*I->getParent(), InsertionPoint, I->getDebugLoc(),<br>+                          TII->get(AArch64::UBFMWri))<br>+                      .addOperand(getLdStRegOp(Rt2MI))<br>+                      .addReg(getLdStRegOp(RtNewDest).getReg())<br>+                      .addImm(16)<br>+                      .addImm(31);<br>+    }<br>+    DEBUG(dbgs() << "    ");<br>+    DEBUG((BitExtMI1)->print(dbgs()));<br>+    DEBUG(dbgs() << "    ");<br>+    DEBUG((BitExtMI2)->print(dbgs()));<br>+    DEBUG(dbgs() << "\n");<br>+<br>+    // Erase the old instructions.<br>+    I->eraseFromParent();<br>+    Paired->eraseFromParent();<br>+    return NextI;<br>+  }<br>+<br>+  // Handle Unscaled<br>   if (IsUnscaled)<br>     OffsetImm /= OffsetStride;<br><br>@@ -622,8 +734,7 @@ static bool mayAlias(MachineInstr *MIa,<br> /// be combined with the current instruction into a load/store pair.<br> MachineBasicBlock::iterator<br> AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,<br>-                                      LdStPairFlags &Flags,<br>-                                      unsigned Limit) {<br>+                                      LdStPairFlags &Flags, unsigned Limit) {<br>   MachineBasicBlock::iterator E = I->getParent()->end();<br>   MachineBasicBlock::iterator MBBI = I;<br>   MachineInstr *FirstMI = I;<br>@@ -645,7 +756,8 @@ AArch64LoadStoreOpt::findMatchingInsn(Ma<br>   // range, plus allow an extra one in case we find a later insn that matches<br>   // with Offset-1)<br>   int OffsetStride = IsUnscaled ? 
getMemScale(FirstMI) : 1;<br>-  if (!inBoundsForPair(IsUnscaled, Offset, OffsetStride))<br>+  if (!isSmallTypeLdMerge(Opc) &&<br>+      !inBoundsForPair(IsUnscaled, Offset, OffsetStride))<br>     return E;<br><br>   // Track which registers have been modified and used between the first insn<br>@@ -704,18 +816,32 @@ AArch64LoadStoreOpt::findMatchingInsn(Ma<br>         // If the resultant immediate offset of merging these instructions<br>         // is out of range for a pairwise instruction, bail and keep looking.<br>         bool MIIsUnscaled = isUnscaledLdSt(MI);<br>-        if (!inBoundsForPair(MIIsUnscaled, MinOffset, OffsetStride)) {<br>+        bool IsSmallTypeLd = isSmallTypeLdMerge(MI->getOpcode());<br>+        if (!IsSmallTypeLd &&<br>+            !inBoundsForPair(MIIsUnscaled, MinOffset, OffsetStride)) {<br>           trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br>           MemInsns.push_back(MI);<br>           continue;<br>         }<br>-        // If the alignment requirements of the paired (scaled) instruction<br>-        // can't express the offset of the unscaled input, bail and keep<br>-        // looking.<br>-        if (IsUnscaled && (alignTo(MinOffset, OffsetStride) != MinOffset)) {<br>-          trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br>-          MemInsns.push_back(MI);<br>-          continue;<br>+<br>+        if (IsSmallTypeLd) {<br>+          // If the alignment requirements of the larger type scaled load<br>+          // instruction can't express the scaled offset of the smaller type<br>+          // input, bail and keep looking.<br>+          if (!IsUnscaled && alignTo(MinOffset, 2) != MinOffset) {<br>+            trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br>+            MemInsns.push_back(MI);<br>+            continue;<br>+          }<br>+        } else {<br>+          // If the alignment requirements of the paired (scaled) instruction<br>+          // can't express the offset of the unscaled input, bail and 
keep<br>+          // looking.<br>+          if (IsUnscaled && (alignTo(MinOffset, OffsetStride) != MinOffset)) {<br>+            trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);<br>+            MemInsns.push_back(MI);<br>+            continue;<br>+          }<br>         }<br>         // If the destination register of the loads is the same register, bail<br>         // and keep looking. A load-pair instruction with both destination<br>@@ -996,17 +1122,64 @@ MachineBasicBlock::iterator AArch64LoadS<br>   return E;<br> }<br><br>+bool AArch64LoadStoreOpt::tryToMergeLdStInst(<br>+    MachineBasicBlock::iterator &MBBI) {<br>+  MachineInstr *MI = MBBI;<br>+  MachineBasicBlock::iterator E = MI->getParent()->end();<br>+  // If this is a volatile load/store, don't mess with it.<br>+  if (MI->hasOrderedMemoryRef())<br>+    return false;<br>+<br>+  // Make sure this is a reg+imm (as opposed to an address reloc).<br>+  if (!getLdStOffsetOp(MI).isImm())<br>+    return false;<br>+<br>+  // Check if this load/store has a hint to avoid pair formation.<br>+  // MachineMemOperands hints are set by the AArch64StorePairSuppress pass.<br>+  if (TII->isLdStPairSuppressed(MI))<br>+    return false;<br>+<br>+  // Look ahead up to ScanLimit instructions for a pairable instruction.<br>+  LdStPairFlags Flags;<br>+  MachineBasicBlock::iterator Paired = findMatchingInsn(MBBI, Flags, ScanLimit);<br>+  if (Paired != E) {<br>+    if (isSmallTypeLdMerge(MI)) {<br>+      ++NumSmallTypeMerged;<br>+    } else {<br>+      ++NumPairCreated;<br>+      if (isUnscaledLdSt(MI))<br>+        ++NumUnscaledPairCreated;<br>+    }<br>+<br>+    // Merge the loads into a pair. 
Keeping the iterator straight is a<br>+    // pain, so we let the merge routine tell us what the next instruction<br>+    // is after it's done mucking about.<br>+    MBBI = mergePairedInsns(MBBI, Paired, Flags);<br>+    return true;<br>+  }<br>+  return false;<br>+}<br>+<br> bool AArch64LoadStoreOpt::optimizeBlock(MachineBasicBlock &MBB) {<br>   bool Modified = false;<br>-  // Two tranformations to do here:<br>-  // 1) Find loads and stores that can be merged into a single load or store<br>+  // Three tranformations to do here:<br>+  // 1) Find halfword loads that can be merged into a single 32-bit word load<br>+  //    with bitfield extract instructions.<br>+  //      e.g.,<br>+  //        ldrh w0, [x2]<br>+  //        ldrh w1, [x2, #2]<br>+  //        ; becomes<br>+  //        ldr w0, [x2]<br>+  //        ubfx w1, w0, #16, #16<br>+  //        and w0, w0, #ffff<br>+  // 2) Find loads and stores that can be merged into a single load or store<br>   //    pair instruction.<br>   //      e.g.,<br>   //        ldr x0, [x2]<br>   //        ldr x1, [x2, #8]<br>   //        ; becomes<br>   //        ldp x0, x1, [x2]<br>-  // 2) Find base register updates that can be merged into the load or store<br>+  // 3) Find base register updates that can be merged into the load or store<br>   //    as a base-reg writeback.<br>   //      e.g.,<br>   //        ldr x0, [x2]<br>@@ -1015,6 +1188,29 @@ bool AArch64LoadStoreOpt::optimizeBlock(<br>   //        ldr x0, [x2], #4<br><br>   for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();<br>+       !IsStrictAlign && MBBI != E;) {<br>+    MachineInstr *MI = MBBI;<br>+    switch (MI->getOpcode()) {<br>+    default:<br>+      // Just move on to the next instruction.<br>+      ++MBBI;<br>+      break;<br>+    // Scaled instructions.<br>+    case AArch64::LDRHHui:<br>+    // Unscaled instructions.<br>+    case AArch64::LDURHHi: {<br>+      if (tryToMergeLdStInst(MBBI)) {<br>+        Modified = true;<br>+        break;<br>+      
}<br>+      ++MBBI;<br>+      break;<br>+    }<br>+      // FIXME: Do the other instructions.<br>+    }<br>+  }<br>+<br>+  for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();<br>        MBBI != E;) {<br>     MachineInstr *MI = MBBI;<br>     switch (MI->getOpcode()) {<br>@@ -1046,35 +1242,7 @@ bool AArch64LoadStoreOpt::optimizeBlock(<br>     case AArch64::LDURWi:<br>     case AArch64::LDURXi:<br>     case AArch64::LDURSWi: {<br>-      // If this is a volatile load/store, don't mess with it.<br>-      if (MI->hasOrderedMemoryRef()) {<br>-        ++MBBI;<br>-        break;<br>-      }<br>-      // Make sure this is a reg+imm (as opposed to an address reloc).<br>-      if (!getLdStOffsetOp(MI).isImm()) {<br>-        ++MBBI;<br>-        break;<br>-      }<br>-      // Check if this load/store has a hint to avoid pair formation.<br>-      // MachineMemOperands hints are set by the AArch64StorePairSuppress pass.<br>-      if (TII->isLdStPairSuppressed(MI)) {<br>-        ++MBBI;<br>-        break;<br>-      }<br>-      // Look ahead up to ScanLimit instructions for a pairable instruction.<br>-      LdStPairFlags Flags;<br>-      MachineBasicBlock::iterator Paired =<br>-          findMatchingInsn(MBBI, Flags, ScanLimit);<br>-      if (Paired != E) {<br>-        ++NumPairCreated;<br>-        if (isUnscaledLdSt(MI))<br>-          ++NumUnscaledPairCreated;<br>-<br>-        // Merge the loads into a pair. 
Keeping the iterator straight is a<br>-        // pain, so we let the merge routine tell us what the next instruction<br>-        // is after it's done mucking about.<br>-        MBBI = mergePairedInsns(MBBI, Paired, Flags);<br>+      if (tryToMergeLdStInst(MBBI)) {<br>         Modified = true;<br>         break;<br>       }<br>@@ -1206,6 +1374,8 @@ bool AArch64LoadStoreOpt::optimizeBlock(<br> bool AArch64LoadStoreOpt::runOnMachineFunction(MachineFunction &Fn) {<br>   TII = static_cast&lt;const AArch64InstrInfo *&gt;(Fn.getSubtarget().getInstrInfo());<br>   TRI = Fn.getSubtarget().getRegisterInfo();<br>+  IsStrictAlign = (static_cast&lt;const AArch64Subtarget &&gt;(Fn.getSubtarget()))<br>+                      .requiresStrictAlign();<br><br>   bool Modified = false;<br>   for (auto &MBB : Fn)<br><br>Modified: llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll?rev=250719&r1=250718&r2=250719&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll?rev=250719&r1=250718&r2=250719&view=diff</a><br>==============================================================================<br>--- llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll (original)<br>+++ llvm/trunk/test/CodeGen/AArch64/arm64-ldp.ll Mon Oct 19 13:34:53 2015<br>@@ -355,3 +355,52 @@ define i64 @ldp_sext_int_post(i32* %p) n<br>   %add = add nsw i64 %sexttmp1, %sexttmp<br>   ret i64 %add<br> }<br>+<br>+; CHECK-LABEL: Ldrh_merge<br>+; CHECK-NOT: ldrh<br>+; CHECK: ldr [[NEW_DEST:w[0-9]+]]<br>+; CHECK: and w{{[0-9]+}}, [[NEW_DEST]], #0xffff<br>+; CHECK: lsr  w{{[0-9]+}}, [[NEW_DEST]]<br>+<br>+define i16 @Ldrh_merge(i16* nocapture readonly %p) {<br>+  %1 = load i16, i16* %p, align 2<br>+  ;%conv = zext i16 %0 to i32<br>+  %arrayidx2 = getelementptr inbounds i16, i16* %p, i64 1<br>+  %2 = load i16, i16* %arrayidx2, align 2<br>+  %add = add nuw nsw i16 %1, %2<br>+  ret i16 %add<br>+}<br>+<br>+; 
CHECK-LABEL: Ldurh_merge<br>+; CHECK-NOT: ldurh<br>+; CHECK: ldur [[NEW_DEST:w[0-9]+]]<br>+; CHECK: and w{{[0-9]+}}, [[NEW_DEST]], #0xffff<br>+; CHECK: lsr  w{{[0-9]+}}, [[NEW_DEST]]<br>+define i16 @Ldurh_merge(i16* nocapture readonly %p)  {<br>+entry:<br>+  %arrayidx = getelementptr inbounds i16, i16* %p, i64 -2<br>+  %0 = load i16, i16* %arrayidx<br>+  %arrayidx3 = getelementptr inbounds i16, i16* %p, i64 -1<br>+  %1 = load i16, i16* %arrayidx3<br>+  %add = add nuw nsw i16 %0, %1<br>+  ret i16 %add<br>+}<br>+<br>+; CHECK-LABEL: Ldrh_4_merge<br>+; CHECK-NOT: ldrh<br>+; CHECK: ldp [[NEW_DEST:w[0-9]+]]<br>+define i16 @Ldrh_4_merge(i16* nocapture readonly %P) {<br>+  %arrayidx = getelementptr inbounds i16, i16* %P, i64 0<br>+  %l0 = load i16, i16* %arrayidx<br>+  %arrayidx2 = getelementptr inbounds i16, i16* %P, i64 1<br>+  %l1 = load i16, i16* %arrayidx2<br>+  %arrayidx7 = getelementptr inbounds i16, i16* %P, i64 2<br>+  %l2 = load i16, i16* %arrayidx7<br>+  %arrayidx12 = getelementptr inbounds i16, i16* %P, i64 3<br>+  %l3 = load i16, i16* %arrayidx12<br>+  %add4 = add nuw nsw i16 %l1, %l0<br>+  %add9 = add nuw nsw i16 %add4, %l2<br>+  %add14 = add nuw nsw i16 %add9, %l3<br>+<br>+  ret i16 %add14<br>+}<br><br><br>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><o:p></o:p></p></blockquote></div></div></body></html>