[llvm] [ARM] Lower unaligned loads/stores to aeabi functions. (PR #172672)
Simi Pallipurath via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 17 07:03:57 PST 2025
https://github.com/simpal01 created https://github.com/llvm/llvm-project/pull/172672
When targeting architectures that do not support unaligned memory accesses, or when -mno-unaligned-access is passed explicitly, the compiler has to expand each unaligned load/store into an inline sequence. For a 32-bit operation this typically involves:
1. 4× LDRB (or 2× LDRH),
2. multiple shift/or instructions
These sequences are emitted at every unaligned access site and therefore add significant code size in workloads that touch packed or misaligned structures.
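For illustration, a minimal C sketch of what that inline expansion amounts to for a little-endian unaligned 32-bit load (the function name here is hypothetical, not part of the patch):

    /* Roughly what the backend open-codes today for an unaligned
     * 32-bit load on a strict-alignment target: four byte loads
     * combined with shifts and ORs (little-endian shown). */
    #include <stdint.h>

    static inline uint32_t load_u32_unaligned(const uint8_t *p) {
      return (uint32_t)p[0]
           | ((uint32_t)p[1] << 8)
           | ((uint32_t)p[2] << 16)
           | ((uint32_t)p[3] << 24);
    }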
When compiling with -Os or -Oz in combination with -mno-unaligned-access, this patch lowers unaligned 32-bit and 64-bit loads and stores to the following AEABI helper calls:
__aeabi_uread4
__aeabi_uread8
__aeabi_uwrite4
__aeabi_uwrite8
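These are the unaligned-access helpers defined by the ARM run-time ABI (RTABI). For reference, their interfaces correspond roughly to the following C prototypes (a sketch based on the RTABI and on the argument shuffling visible in the new tests, not something introduced by this patch):

    /* RTABI unaligned-access helpers (sketch). The read helpers take
     * the address and return the loaded value; the write helpers take
     * the value first and the address second. */
    int       __aeabi_uread4(void *address);
    long long __aeabi_uread8(void *address);
    int       __aeabi_uwrite4(int value, void *address);
    long long __aeabi_uwrite8(long long value, void *address);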
These helpers provide a way to perform unaligned memory accesses on targets that do not support them, such as ARMv6-M, or when compiling with -mno-unaligned-access. Although each use introduces a function call, making it less straightforward than raw loads and stores, the call itself is often much smaller than the compiler-emitted sequence of multiple ldrb/strb operations. As a result, these helpers can greatly reduce code size, provided they are invoked more than once across a program. In practice:
1. Functions become smaller in AEABI mode once they contain more than a few unaligned accesses.
2. The total image .text size becomes smaller whenever multiple functions call the same helpers.
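As an illustrative example (not taken from the patch) of source that benefits, consider a packed structure compiled at -Os/-Oz with -mno-unaligned-access; an access like the one below would previously be open-coded as byte loads plus shifts, and can now become a single __aeabi_uread4 call:

    /* Illustrative only: the field at offset 1 is not naturally
     * aligned, so reading it is an unaligned 32-bit load. */
    #include <stdint.h>

    struct __attribute__((packed)) record {
      uint8_t  tag;
      uint32_t value;   /* offset 1 -> unaligned */
    };

    uint32_t get_value(const struct record *r) {
      return r->value;  /* can be lowered to __aeabi_uread4 */
    }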
This PR is derived from https://reviews.llvm.org/D57595, with some minor changes.
Co-authored-by: David Green
From 702beef035969dfd8c202bd245bf2feccf7b81cf Mon Sep 17 00:00:00 2001
From: Simi Pallipurath <simi.pallipurath at arm.com>
Date: Thu, 27 Nov 2025 20:11:59 +0000
Subject: [PATCH] [ARM] Lower unaligned loads/stores to aeabi functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When targeting architectures that do not support unaligned
memory accesses, or when -mno-unaligned-access is passed
explicitly, the compiler has to expand each unaligned
load/store into an inline sequence. For a 32-bit operation
this typically involves:
1. 4× LDRB (or 2× LDRH),
2. multiple shift/or instructions
These sequences are emitted at every unaligned access site and
therefore add significant code size in workloads that touch
packed or misaligned structures.
When compiling with -Os or -Oz in combination with
-mno-unaligned-access, this patch lowers unaligned 32-bit and
64-bit loads and stores to the following AEABI helper calls:
__aeabi_uread4
__aeabi_uread8
__aeabi_uwrite4
__aeabi_uwrite8
These helpers provide a way to perform unaligned memory accesses
on targets that do not support them, such as ARMv6-M, or when
compiling with -mno-unaligned-access. Although each use
introduces a function call, making it less straightforward
than raw loads and stores, the call itself is often
much smaller than the compiler-emitted sequence of multiple
ldrb/strb operations. As a result, these helpers can greatly
reduce code size, provided they are invoked more than
once across a program. In practice:
1. Functions become smaller in AEABI mode once they contain more
than a few unaligned accesses.
2. The total image .text size becomes smaller whenever multiple
functions call the same helpers.
This PR is derived from https://reviews.llvm.org/D57595, with additional changes.
Co-authored-by: David Green
---
llvm/lib/Target/ARM/ARMISelLowering.cpp | 181 +++++++-
llvm/lib/Target/ARM/ARMISelLowering.h | 6 +-
.../CodeGen/ARM/i64_volatile_load_store.ll | 122 +++--
.../CodeGen/ARM/unaligned_load_store_aeabi.ll | 425 ++++++++++++++++++
4 files changed, 663 insertions(+), 71 deletions(-)
create mode 100644 llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index f28640ce7b107..f9d1c8f451f4c 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -993,6 +993,14 @@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM_,
setIndexedStoreAction(ISD::POST_INC, MVT::i32, Legal);
}
+ // Custom loads/stores to possible use __aeabi_uread/write*
+ if (Subtarget->isTargetAEABI() && !Subtarget->allowsUnalignedMem()) {
+ setOperationAction(ISD::STORE, MVT::i32, Custom);
+ setOperationAction(ISD::STORE, MVT::i64, Custom);
+ setOperationAction(ISD::LOAD, MVT::i32, Custom);
+ setOperationAction(ISD::LOAD, MVT::i64, Custom);
+ }
+
setOperationAction(ISD::SADDO, MVT::i32, Custom);
setOperationAction(ISD::UADDO, MVT::i32, Custom);
setOperationAction(ISD::SSUBO, MVT::i32, Custom);
@@ -10012,6 +10020,130 @@ void ARMTargetLowering::ExpandDIV_Windows(
Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lower, Upper));
}
+std::pair<SDValue, SDValue>
+ARMTargetLowering::LowerAEABIUnalignedLoad(SDValue Op,
+ SelectionDAG &DAG) const {
+ // If we have an unaligned load from a i32 or i64 that would normally be
+ // split into separate ldrb's, we can use the __aeabi_uread4/__aeabi_uread8
+ // functions instead.
+ LoadSDNode *LD = cast<LoadSDNode>(Op.getNode());
+ EVT MemVT = LD->getMemoryVT();
+ if (MemVT != MVT::i32 && MemVT != MVT::i64)
+ return std::make_pair(SDValue(), SDValue());
+
+ const auto &MF = DAG.getMachineFunction();
+ unsigned AS = LD->getAddressSpace();
+ Align Alignment = LD->getAlign();
+ const DataLayout &DL = DAG.getDataLayout();
+ bool AllowsUnaligned = Subtarget->allowsUnalignedMem();
+
+ const char *LibcallName = nullptr;
+ if ((MF.getFunction().hasMinSize() || MF.getFunction().hasOptSize()) &&
+ !AllowsUnaligned) {
+ if (MemVT == MVT::i32 && Alignment <= llvm::Align(2))
+ LibcallName = "__aeabi_uread4";
+ else if (MemVT == MVT::i64 && Alignment <= llvm::Align(2))
+ LibcallName = "__aeabi_uread8";
+ }
+
+ if (LibcallName) {
+ LLVM_DEBUG(dbgs() << "Expanding unsupported unaligned load to "
+ << LibcallName << "\n");
+ CallingConv::ID CC = CallingConv::ARM_AAPCS;
+ SDValue Callee = DAG.getExternalSymbol(LibcallName, getPointerTy(DL));
+ TargetLowering::ArgListTy Args;
+ TargetLowering::ArgListEntry Entry(
+ LD->getBasePtr(),
+ LD->getBasePtr().getValueType().getTypeForEVT(*DAG.getContext()));
+ SDLoc dl(Op);
+
+ Args.push_back(Entry);
+
+ Type *RetTy = MemVT.getTypeForEVT(*DAG.getContext());
+ TargetLowering::CallLoweringInfo CLI(DAG);
+ CLI.setDebugLoc(dl)
+ .setChain(LD->getChain())
+ .setCallee(CC, RetTy, Callee, std::move(Args));
+ auto Pair = LowerCallTo(CLI);
+
+ // If necessary, extend the node to 64bit
+ if (LD->getExtensionType() != ISD::NON_EXTLOAD) {
+ unsigned ExtType = LD->getExtensionType() == ISD::SEXTLOAD
+ ? ISD::SIGN_EXTEND
+ : ISD::ZERO_EXTEND;
+ SDValue EN = DAG.getNode(ExtType, dl, LD->getValueType(0), Pair.first);
+ Pair.first = EN;
+ }
+ return Pair;
+ }
+
+ // Default expand to individual loads
+ if (!allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Alignment))
+ return expandUnalignedLoad(LD, DAG);
+ return std::make_pair(SDValue(), SDValue());
+}
+
+SDValue ARMTargetLowering::LowerAEABIUnalignedStore(SDValue Op,
+ SelectionDAG &DAG) const {
+ // If we have an unaligned store to a i32 or i64 that would normally be
+ // split into separate ldrb's, we can use the __aeabi_uwrite4/__aeabi_uwrite8
+ // functions instead.
+ StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
+ EVT MemVT = ST->getMemoryVT();
+ if (MemVT != MVT::i32 && MemVT != MVT::i64)
+ return SDValue();
+
+ const auto &MF = DAG.getMachineFunction();
+ unsigned AS = ST->getAddressSpace();
+ Align Alignment = ST->getAlign();
+ const DataLayout &DL = DAG.getDataLayout();
+ bool AllowsUnaligned = Subtarget->allowsUnalignedMem();
+
+ const char *LibcallName = nullptr;
+ if ((MF.getFunction().hasMinSize() || MF.getFunction().hasOptSize()) &&
+ !AllowsUnaligned) {
+ if (MemVT == MVT::i32 && Alignment <= llvm::Align(2))
+ LibcallName = "__aeabi_uwrite4";
+ else if (MemVT == MVT::i64 && Alignment <= llvm::Align(2))
+ LibcallName = "__aeabi_uwrite8";
+ }
+
+ if (LibcallName) {
+ LLVM_DEBUG(dbgs() << "Expanding unsupported unaligned store to "
+ << LibcallName << "\n");
+ CallingConv::ID CC = CallingConv::ARM_AAPCS;
+ SDValue Callee = DAG.getExternalSymbol(LibcallName, getPointerTy(DL));
+ TargetLowering::ArgListTy Args;
+ SDLoc dl(Op);
+
+ // If necessary, trunc the value to 32bit
+ SDValue StoreVal = ST->getOperand(1);
+ if (ST->isTruncatingStore())
+ StoreVal = DAG.getNode(ISD::TRUNCATE, dl, MemVT, ST->getOperand(1));
+
+ TargetLowering::ArgListEntry Entry(
+ StoreVal, StoreVal.getValueType().getTypeForEVT(*DAG.getContext()));
+ Args.push_back(Entry);
+
+ Entry.Node = ST->getBasePtr();
+ Entry.Ty = ST->getBasePtr().getValueType().getTypeForEVT(*DAG.getContext());
+ Args.push_back(Entry);
+
+ Type *RetTy = Type::getVoidTy(*DAG.getContext());
+ TargetLowering::CallLoweringInfo CLI(DAG);
+ CLI.setDebugLoc(dl)
+ .setChain(ST->getChain())
+ .setCallee(CC, RetTy, Callee, std::move(Args));
+ std::pair<SDValue, SDValue> CallResult = LowerCallTo(CLI);
+ return CallResult.second;
+ }
+
+ // Default expand to individual stores
+ if (!allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Alignment))
+ return expandUnalignedStore(ST, DAG);
+ return SDValue();
+}
+
static SDValue LowerPredicateLoad(SDValue Op, SelectionDAG &DAG) {
LoadSDNode *LD = cast<LoadSDNode>(Op.getNode());
EVT MemVT = LD->getMemoryVT();
@@ -10054,11 +10186,11 @@ void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {
LoadSDNode *LD = cast<LoadSDNode>(N);
EVT MemVT = LD->getMemoryVT();
- assert(LD->isUnindexed() && "Loads should be unindexed at this point.");
if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
!Subtarget->isThumb1Only() && LD->isVolatile() &&
LD->getAlign() >= Subtarget->getDualLoadStoreAlignment()) {
+ assert(LD->isUnindexed() && "Loads should be unindexed at this point.");
SDLoc dl(N);
SDValue Result = DAG.getMemIntrinsicNode(
ARMISD::LDRD, dl, DAG.getVTList({MVT::i32, MVT::i32, MVT::Other}),
@@ -10067,6 +10199,12 @@ void ARMTargetLowering::LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
SDValue Hi = Result.getValue(DAG.getDataLayout().isLittleEndian() ? 1 : 0);
SDValue Pair = DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lo, Hi);
Results.append({Pair, Result.getValue(2)});
+ } else if ((MemVT == MVT::i32 || MemVT == MVT::i64)) {
+ auto Pair = LowerAEABIUnalignedLoad(SDValue(N, 0), DAG);
+ if (Pair.first) {
+ Results.push_back(Pair.first);
+ Results.push_back(Pair.second);
+ }
}
}
@@ -10108,15 +10246,15 @@ static SDValue LowerPredicateStore(SDValue Op, SelectionDAG &DAG) {
ST->getMemOperand());
}
-static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
- const ARMSubtarget *Subtarget) {
+SDValue ARMTargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG,
+ const ARMSubtarget *Subtarget) const {
StoreSDNode *ST = cast<StoreSDNode>(Op.getNode());
EVT MemVT = ST->getMemoryVT();
- assert(ST->isUnindexed() && "Stores should be unindexed at this point.");
if (MemVT == MVT::i64 && Subtarget->hasV5TEOps() &&
!Subtarget->isThumb1Only() && ST->isVolatile() &&
ST->getAlign() >= Subtarget->getDualLoadStoreAlignment()) {
+ assert(ST->isUnindexed() && "Stores should be unindexed at this point.");
SDNode *N = Op.getNode();
SDLoc dl(N);
@@ -10136,8 +10274,9 @@ static SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
((MemVT == MVT::v2i1 || MemVT == MVT::v4i1 || MemVT == MVT::v8i1 ||
MemVT == MVT::v16i1))) {
return LowerPredicateStore(Op, DAG);
+ } else if ((MemVT == MVT::i32 || MemVT == MVT::i64)) {
+ return LowerAEABIUnalignedStore(Op, DAG);
}
-
return SDValue();
}
@@ -10669,8 +10808,19 @@ SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::UADDSAT:
case ISD::USUBSAT:
return LowerADDSUBSAT(Op, DAG, Subtarget);
- case ISD::LOAD:
- return LowerPredicateLoad(Op, DAG);
+ case ISD::LOAD: {
+ auto *LD = cast<LoadSDNode>(Op);
+ EVT MemVT = LD->getMemoryVT();
+ if (Subtarget->hasMVEIntegerOps() &&
+ ((MemVT == MVT::v2i1 || MemVT == MVT::v4i1 || MemVT == MVT::v8i1 ||
+ MemVT == MVT::v16i1)))
+ return LowerPredicateLoad(Op, DAG);
+
+ auto Pair = LowerAEABIUnalignedLoad(Op, DAG);
+ if (Pair.first)
+ return DAG.getMergeValues({Pair.first, Pair.second}, SDLoc(Pair.first));
+ return SDValue();
+ }
case ISD::STORE:
return LowerSTORE(Op, DAG, Subtarget);
case ISD::MLOAD:
@@ -10811,6 +10961,9 @@ void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
case ISD::LOAD:
LowerLOAD(N, Results, DAG);
break;
+ case ISD::STORE:
+ Res = LowerAEABIUnalignedStore(SDValue(N, 0), DAG);
+ break;
case ISD::TRUNCATE:
Res = LowerTruncate(N, DAG, Subtarget);
break;
@@ -19859,31 +20012,45 @@ ARMTargetLowering::getPreIndexedAddressParts(SDNode *N, SDValue &Base,
EVT VT;
SDValue Ptr;
Align Alignment;
+ unsigned AS = 0;
bool isSEXTLoad = false;
bool IsMasked = false;
if (LoadSDNode *LD = dyn_cast<LoadSDNode>(N)) {
Ptr = LD->getBasePtr();
VT = LD->getMemoryVT();
Alignment = LD->getAlign();
+ AS = LD->getAddressSpace();
isSEXTLoad = LD->getExtensionType() == ISD::SEXTLOAD;
} else if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N)) {
Ptr = ST->getBasePtr();
VT = ST->getMemoryVT();
Alignment = ST->getAlign();
+ AS = ST->getAddressSpace();
} else if (MaskedLoadSDNode *LD = dyn_cast<MaskedLoadSDNode>(N)) {
Ptr = LD->getBasePtr();
VT = LD->getMemoryVT();
Alignment = LD->getAlign();
+ AS = LD->getAddressSpace();
isSEXTLoad = LD->getExtensionType() == ISD::SEXTLOAD;
IsMasked = true;
} else if (MaskedStoreSDNode *ST = dyn_cast<MaskedStoreSDNode>(N)) {
Ptr = ST->getBasePtr();
VT = ST->getMemoryVT();
Alignment = ST->getAlign();
+ AS = ST->getAddressSpace();
IsMasked = true;
} else
return false;
+ unsigned Fast = 0;
+ if (!allowsMisalignedMemoryAccesses(VT, AS, Alignment,
+ MachineMemOperand::MONone, &Fast)) {
+ // Only generate post-increment or pre-increment forms when a real
+ // hardware instruction exists for them. Do not emit postinc/preinc
+ // if the operation will end up as a libcall.
+ return false;
+ }
+
bool isInc;
bool isLegal = false;
if (VT.isVector())
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index bc2fec3c1bdb5..ae93fdf6d619b 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -919,10 +919,14 @@ class VectorType;
SDValue LowerSPONENTRY(SDValue Op, SelectionDAG &DAG) const;
void LowerLOAD(SDNode *N, SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const;
+ SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG,
+ const ARMSubtarget *Subtarget) const;
+ std::pair<SDValue, SDValue>
+ LowerAEABIUnalignedLoad(SDValue Op, SelectionDAG &DAG) const;
+ SDValue LowerAEABIUnalignedStore(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_BF16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCMP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerABS(SDValue Op, SelectionDAG &DAG) const;
-
Register getRegisterByName(const char* RegName, LLT VT,
const MachineFunction &MF) const override;
diff --git a/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll b/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll
index ca5fd2bc14f40..125326fd754fa 100644
--- a/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll
+++ b/llvm/test/CodeGen/ARM/i64_volatile_load_store.ll
@@ -121,23 +121,22 @@ define void @test_unaligned() {
; CHECK-ARMV5TE-NEXT: push {r4, r5, r6, lr}
; CHECK-ARMV5TE-NEXT: ldr r0, .LCPI1_0
; CHECK-ARMV5TE-NEXT: ldr r6, .LCPI1_1
-; CHECK-ARMV5TE-NEXT: mov r1, r0
-; CHECK-ARMV5TE-NEXT: ldrb lr, [r1, #4]!
-; CHECK-ARMV5TE-NEXT: ldrb r3, [r1, #2]
-; CHECK-ARMV5TE-NEXT: ldrb r12, [r1, #3]
-; CHECK-ARMV5TE-NEXT: ldrb r1, [r0]
-; CHECK-ARMV5TE-NEXT: ldrb r2, [r0, #1]
-; CHECK-ARMV5TE-NEXT: ldrb r4, [r0, #2]
-; CHECK-ARMV5TE-NEXT: ldrb r5, [r0, #3]
-; CHECK-ARMV5TE-NEXT: ldrb r0, [r0, #5]
-; CHECK-ARMV5TE-NEXT: strb r0, [r6, #5]
-; CHECK-ARMV5TE-NEXT: strb r4, [r6, #2]
-; CHECK-ARMV5TE-NEXT: strb r5, [r6, #3]
-; CHECK-ARMV5TE-NEXT: strb r1, [r6]
-; CHECK-ARMV5TE-NEXT: strb r2, [r6, #1]
-; CHECK-ARMV5TE-NEXT: strb lr, [r6, #4]!
+; CHECK-ARMV5TE-NEXT: ldrb r12, [r0]
+; CHECK-ARMV5TE-NEXT: ldrb lr, [r0, #1]
+; CHECK-ARMV5TE-NEXT: ldrb r3, [r0, #2]
+; CHECK-ARMV5TE-NEXT: ldrb r1, [r0, #3]
+; CHECK-ARMV5TE-NEXT: ldrb r2, [r0, #5]
+; CHECK-ARMV5TE-NEXT: ldrb r4, [r0, #4]
+; CHECK-ARMV5TE-NEXT: ldrb r5, [r0, #7]
+; CHECK-ARMV5TE-NEXT: ldrb r0, [r0, #6]
+; CHECK-ARMV5TE-NEXT: strb r0, [r6, #6]
+; CHECK-ARMV5TE-NEXT: strb r5, [r6, #7]
+; CHECK-ARMV5TE-NEXT: strb r4, [r6, #4]
+; CHECK-ARMV5TE-NEXT: strb r2, [r6, #5]
; CHECK-ARMV5TE-NEXT: strb r3, [r6, #2]
-; CHECK-ARMV5TE-NEXT: strb r12, [r6, #3]
+; CHECK-ARMV5TE-NEXT: strb r1, [r6, #3]
+; CHECK-ARMV5TE-NEXT: strb r12, [r6]
+; CHECK-ARMV5TE-NEXT: strb lr, [r6, #1]
; CHECK-ARMV5TE-NEXT: pop {r4, r5, r6, pc}
; CHECK-ARMV5TE-NEXT: .p2align 2
; CHECK-ARMV5TE-NEXT: @ %bb.1:
@@ -164,23 +163,22 @@ define void @test_unaligned() {
; CHECK-ARMV4T-NEXT: push {r4, r5, r6, lr}
; CHECK-ARMV4T-NEXT: ldr r0, .LCPI1_0
; CHECK-ARMV4T-NEXT: ldr r6, .LCPI1_1
-; CHECK-ARMV4T-NEXT: mov r1, r0
-; CHECK-ARMV4T-NEXT: ldrb lr, [r1, #4]!
-; CHECK-ARMV4T-NEXT: ldrb r3, [r1, #2]
-; CHECK-ARMV4T-NEXT: ldrb r12, [r1, #3]
-; CHECK-ARMV4T-NEXT: ldrb r1, [r0]
-; CHECK-ARMV4T-NEXT: ldrb r2, [r0, #1]
-; CHECK-ARMV4T-NEXT: ldrb r4, [r0, #2]
-; CHECK-ARMV4T-NEXT: ldrb r5, [r0, #3]
-; CHECK-ARMV4T-NEXT: ldrb r0, [r0, #5]
-; CHECK-ARMV4T-NEXT: strb r0, [r6, #5]
-; CHECK-ARMV4T-NEXT: strb r4, [r6, #2]
-; CHECK-ARMV4T-NEXT: strb r5, [r6, #3]
-; CHECK-ARMV4T-NEXT: strb r1, [r6]
-; CHECK-ARMV4T-NEXT: strb r2, [r6, #1]
-; CHECK-ARMV4T-NEXT: strb lr, [r6, #4]!
+; CHECK-ARMV4T-NEXT: ldrb r12, [r0]
+; CHECK-ARMV4T-NEXT: ldrb lr, [r0, #1]
+; CHECK-ARMV4T-NEXT: ldrb r3, [r0, #2]
+; CHECK-ARMV4T-NEXT: ldrb r1, [r0, #3]
+; CHECK-ARMV4T-NEXT: ldrb r2, [r0, #5]
+; CHECK-ARMV4T-NEXT: ldrb r4, [r0, #4]
+; CHECK-ARMV4T-NEXT: ldrb r5, [r0, #7]
+; CHECK-ARMV4T-NEXT: ldrb r0, [r0, #6]
+; CHECK-ARMV4T-NEXT: strb r0, [r6, #6]
+; CHECK-ARMV4T-NEXT: strb r5, [r6, #7]
+; CHECK-ARMV4T-NEXT: strb r4, [r6, #4]
+; CHECK-ARMV4T-NEXT: strb r2, [r6, #5]
; CHECK-ARMV4T-NEXT: strb r3, [r6, #2]
-; CHECK-ARMV4T-NEXT: strb r12, [r6, #3]
+; CHECK-ARMV4T-NEXT: strb r1, [r6, #3]
+; CHECK-ARMV4T-NEXT: strb r12, [r6]
+; CHECK-ARMV4T-NEXT: strb lr, [r6, #1]
; CHECK-ARMV4T-NEXT: pop {r4, r5, r6, lr}
; CHECK-ARMV4T-NEXT: bx lr
; CHECK-ARMV4T-NEXT: .p2align 2
@@ -210,23 +208,22 @@ define void @test_unaligned() {
; CHECK-ARMV7-STRICT-NEXT: movw r6, :lower16:y_unaligned
; CHECK-ARMV7-STRICT-NEXT: movt r0, :upper16:x_unaligned
; CHECK-ARMV7-STRICT-NEXT: movt r6, :upper16:y_unaligned
-; CHECK-ARMV7-STRICT-NEXT: mov r1, r0
-; CHECK-ARMV7-STRICT-NEXT: ldrb r12, [r1, #4]!
-; CHECK-ARMV7-STRICT-NEXT: ldrb r3, [r0]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r12, [r0]
; CHECK-ARMV7-STRICT-NEXT: ldrb lr, [r0, #1]
-; CHECK-ARMV7-STRICT-NEXT: ldrb r2, [r0, #2]
-; CHECK-ARMV7-STRICT-NEXT: ldrb r4, [r0, #3]
-; CHECK-ARMV7-STRICT-NEXT: ldrb r0, [r0, #5]
-; CHECK-ARMV7-STRICT-NEXT: ldrb r5, [r1, #2]
-; CHECK-ARMV7-STRICT-NEXT: ldrb r1, [r1, #3]
-; CHECK-ARMV7-STRICT-NEXT: strb r0, [r6, #5]
-; CHECK-ARMV7-STRICT-NEXT: strb r2, [r6, #2]
-; CHECK-ARMV7-STRICT-NEXT: strb r4, [r6, #3]
-; CHECK-ARMV7-STRICT-NEXT: strb r3, [r6]
-; CHECK-ARMV7-STRICT-NEXT: strb lr, [r6, #1]
-; CHECK-ARMV7-STRICT-NEXT: strb r12, [r6, #4]!
-; CHECK-ARMV7-STRICT-NEXT: strb r5, [r6, #2]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r3, [r0, #2]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r1, [r0, #3]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r2, [r0, #5]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r4, [r0, #4]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r5, [r0, #7]
+; CHECK-ARMV7-STRICT-NEXT: ldrb r0, [r0, #6]
+; CHECK-ARMV7-STRICT-NEXT: strb r0, [r6, #6]
+; CHECK-ARMV7-STRICT-NEXT: strb r5, [r6, #7]
+; CHECK-ARMV7-STRICT-NEXT: strb r4, [r6, #4]
+; CHECK-ARMV7-STRICT-NEXT: strb r2, [r6, #5]
+; CHECK-ARMV7-STRICT-NEXT: strb r3, [r6, #2]
; CHECK-ARMV7-STRICT-NEXT: strb r1, [r6, #3]
+; CHECK-ARMV7-STRICT-NEXT: strb r12, [r6]
+; CHECK-ARMV7-STRICT-NEXT: strb lr, [r6, #1]
; CHECK-ARMV7-STRICT-NEXT: pop {r4, r5, r6, pc}
;
; CHECK-ARMV6-LABEL: test_unaligned:
@@ -251,23 +248,22 @@ define void @test_unaligned() {
; CHECK-ARMV6-STRICT-NEXT: push {r4, r5, r6, lr}
; CHECK-ARMV6-STRICT-NEXT: ldr r0, .LCPI1_0
; CHECK-ARMV6-STRICT-NEXT: ldr r6, .LCPI1_1
-; CHECK-ARMV6-STRICT-NEXT: mov r1, r0
-; CHECK-ARMV6-STRICT-NEXT: ldrb lr, [r1, #4]!
-; CHECK-ARMV6-STRICT-NEXT: ldrb r3, [r1, #2]
-; CHECK-ARMV6-STRICT-NEXT: ldrb r12, [r1, #3]
-; CHECK-ARMV6-STRICT-NEXT: ldrb r1, [r0]
-; CHECK-ARMV6-STRICT-NEXT: ldrb r2, [r0, #1]
-; CHECK-ARMV6-STRICT-NEXT: ldrb r4, [r0, #2]
-; CHECK-ARMV6-STRICT-NEXT: ldrb r5, [r0, #3]
-; CHECK-ARMV6-STRICT-NEXT: ldrb r0, [r0, #5]
-; CHECK-ARMV6-STRICT-NEXT: strb r0, [r6, #5]
-; CHECK-ARMV6-STRICT-NEXT: strb r4, [r6, #2]
-; CHECK-ARMV6-STRICT-NEXT: strb r5, [r6, #3]
-; CHECK-ARMV6-STRICT-NEXT: strb r1, [r6]
-; CHECK-ARMV6-STRICT-NEXT: strb r2, [r6, #1]
-; CHECK-ARMV6-STRICT-NEXT: strb lr, [r6, #4]!
+; CHECK-ARMV6-STRICT-NEXT: ldrb r12, [r0]
+; CHECK-ARMV6-STRICT-NEXT: ldrb lr, [r0, #1]
+; CHECK-ARMV6-STRICT-NEXT: ldrb r3, [r0, #2]
+; CHECK-ARMV6-STRICT-NEXT: ldrb r1, [r0, #3]
+; CHECK-ARMV6-STRICT-NEXT: ldrb r2, [r0, #5]
+; CHECK-ARMV6-STRICT-NEXT: ldrb r4, [r0, #4]
+; CHECK-ARMV6-STRICT-NEXT: ldrb r5, [r0, #7]
+; CHECK-ARMV6-STRICT-NEXT: ldrb r0, [r0, #6]
+; CHECK-ARMV6-STRICT-NEXT: strb r0, [r6, #6]
+; CHECK-ARMV6-STRICT-NEXT: strb r5, [r6, #7]
+; CHECK-ARMV6-STRICT-NEXT: strb r4, [r6, #4]
+; CHECK-ARMV6-STRICT-NEXT: strb r2, [r6, #5]
; CHECK-ARMV6-STRICT-NEXT: strb r3, [r6, #2]
-; CHECK-ARMV6-STRICT-NEXT: strb r12, [r6, #3]
+; CHECK-ARMV6-STRICT-NEXT: strb r1, [r6, #3]
+; CHECK-ARMV6-STRICT-NEXT: strb r12, [r6]
+; CHECK-ARMV6-STRICT-NEXT: strb lr, [r6, #1]
; CHECK-ARMV6-STRICT-NEXT: pop {r4, r5, r6, pc}
; CHECK-ARMV6-STRICT-NEXT: .p2align 2
; CHECK-ARMV6-STRICT-NEXT: @ %bb.1:
diff --git a/llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll b/llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll
new file mode 100644
index 0000000000000..0f1adc55139c7
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/unaligned_load_store_aeabi.ll
@@ -0,0 +1,425 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=thumbv6m-eabi -mattr=+strict-align %s -o - | FileCheck %s -check-prefix=CHECK-V6M
+; RUN: llc -mtriple=thumbv7m-eabi -mattr=+strict-align %s -o - | FileCheck %s -check-prefix=CHECK-V7M
+; RUN: llc -mtriple=thumbv7m-eabi -mattr=-strict-align %s -o - | FileCheck %s -check-prefix=CHECK-ALIGNED
+
+define void @loadstore4_align1(i32* %a, i32* %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: loadstore4_align1:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r4, lr}
+; CHECK-V6M-NEXT: push {r4, lr}
+; CHECK-V6M-NEXT: mov r4, r1
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: mov r1, r4
+; CHECK-V6M-NEXT: bl __aeabi_uwrite4
+; CHECK-V6M-NEXT: pop {r4, pc}
+;
+; CHECK-V7M-LABEL: loadstore4_align1:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r4, lr}
+; CHECK-V7M-NEXT: push {r4, lr}
+; CHECK-V7M-NEXT: mov r4, r1
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: mov r1, r4
+; CHECK-V7M-NEXT: bl __aeabi_uwrite4
+; CHECK-V7M-NEXT: pop {r4, pc}
+;
+; CHECK-ALIGNED-LABEL: loadstore4_align1:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r0, [r0]
+; CHECK-ALIGNED-NEXT: str r0, [r1]
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i32, i32* %a, align 1
+ store i32 %tmp, i32* %b, align 1
+ ret void
+}
+
+define i32 @load4_align1(i32* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load4_align1:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load4_align1:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+; CHECK-ALIGNED-LABEL: load4_align1:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r0, [r0]
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i32, i32* %a, align 1
+ ret i32 %tmp
+}
+
+define i64 @load4_align1_zext(i32* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load4_align1_zext:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: movs r1, #0
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load4_align1_zext:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: movs r1, #0
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+; CHECK-ALIGNED-LABEL: load4_align1_zext:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r0, [r0]
+; CHECK-ALIGNED-NEXT: movs r1, #0
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i32, i32* %a, align 1
+ %ext = zext i32 %tmp to i64
+ ret i64 %ext
+}
+
+define i64 @load4_align1_sext(i32* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load4_align1_sext:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: asrs r1, r0, #31
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load4_align1_sext:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: asrs r1, r0, #31
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+; CHECK-ALIGNED-LABEL: load4_align1_sext:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r0, [r0]
+; CHECK-ALIGNED-NEXT: asrs r1, r0, #31
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i32, i32* %a, align 1
+ %ext = sext i32 %tmp to i64
+ ret i64 %ext
+}
+
+define void @store4_align1(i32* %a, i32 %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: store4_align1:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: mov r2, r0
+; CHECK-V6M-NEXT: mov r0, r1
+; CHECK-V6M-NEXT: mov r1, r2
+; CHECK-V6M-NEXT: bl __aeabi_uwrite4
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: store4_align1:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: mov r2, r0
+; CHECK-V7M-NEXT: mov r0, r1
+; CHECK-V7M-NEXT: mov r1, r2
+; CHECK-V7M-NEXT: bl __aeabi_uwrite4
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+; CHECK-ALIGNED-LABEL: store4_align1:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: str r1, [r0]
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ store i32 %b, i32* %a, align 1
+ ret void
+}
+
+define i32 @load4_align2(i32* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load4_align2:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load4_align2:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+; CHECK-ALIGNED-LABEL: load4_align2:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r0, [r0]
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i32, i32* %a, align 2
+ ret i32 %tmp
+}
+
+define i64 @load6_align1_zext(i48* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load6_align1_zext:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r4, lr}
+; CHECK-V6M-NEXT: push {r4, lr}
+; CHECK-V6M-NEXT: ldrb r1, [r0, #4]
+; CHECK-V6M-NEXT: ldrb r2, [r0, #5]
+; CHECK-V6M-NEXT: lsls r2, r2, #8
+; CHECK-V6M-NEXT: adds r4, r2, r1
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: mov r1, r4
+; CHECK-V6M-NEXT: pop {r4, pc}
+;
+; CHECK-V7M-LABEL: load6_align1_zext:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r4, lr}
+; CHECK-V7M-NEXT: push {r4, lr}
+; CHECK-V7M-NEXT: mov r4, r0
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: ldrb r2, [r4, #5]
+; CHECK-V7M-NEXT: ldrb r1, [r4, #4]
+; CHECK-V7M-NEXT: orr.w r1, r1, r2, lsl #8
+; CHECK-V7M-NEXT: pop {r4, pc}
+;
+; CHECK-ALIGNED-LABEL: load6_align1_zext:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r2, [r0]
+; CHECK-ALIGNED-NEXT: ldrh r1, [r0, #4]
+; CHECK-ALIGNED-NEXT: mov r0, r2
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i48, i48* %a, align 1
+ %ext = zext i48 %tmp to i64
+ ret i64 %ext
+}
+
+define i64 @load6_align1_sext(i48* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load6_align1_sext:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r4, lr}
+; CHECK-V6M-NEXT: push {r4, lr}
+; CHECK-V6M-NEXT: movs r1, #5
+; CHECK-V6M-NEXT: ldrsb r1, [r0, r1]
+; CHECK-V6M-NEXT: lsls r1, r1, #8
+; CHECK-V6M-NEXT: ldrb r2, [r0, #4]
+; CHECK-V6M-NEXT: adds r4, r1, r2
+; CHECK-V6M-NEXT: bl __aeabi_uread4
+; CHECK-V6M-NEXT: mov r1, r4
+; CHECK-V6M-NEXT: pop {r4, pc}
+;
+; CHECK-V7M-LABEL: load6_align1_sext:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r4, lr}
+; CHECK-V7M-NEXT: push {r4, lr}
+; CHECK-V7M-NEXT: ldrsb.w r1, [r0, #5]
+; CHECK-V7M-NEXT: ldrb r2, [r0, #4]
+; CHECK-V7M-NEXT: orr.w r4, r2, r1, lsl #8
+; CHECK-V7M-NEXT: bl __aeabi_uread4
+; CHECK-V7M-NEXT: mov r1, r4
+; CHECK-V7M-NEXT: pop {r4, pc}
+;
+; CHECK-ALIGNED-LABEL: load6_align1_sext:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: ldr r2, [r0]
+; CHECK-ALIGNED-NEXT: ldrsh.w r1, [r0, #4]
+; CHECK-ALIGNED-NEXT: mov r0, r2
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ %tmp = load i48, i48* %a, align 1
+ %ext = sext i48 %tmp to i64
+ ret i64 %ext
+}
+
+define void @store6_align1(i48* %a, i48 %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: store6_align1:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: mov r1, r0
+; CHECK-V6M-NEXT: strb r3, [r0, #4]
+; CHECK-V6M-NEXT: lsrs r0, r3, #8
+; CHECK-V6M-NEXT: strb r0, [r1, #5]
+; CHECK-V6M-NEXT: mov r0, r2
+; CHECK-V6M-NEXT: bl __aeabi_uwrite4
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: store6_align1:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: mov r1, r0
+; CHECK-V7M-NEXT: strb r3, [r0, #4]
+; CHECK-V7M-NEXT: lsrs r0, r3, #8
+; CHECK-V7M-NEXT: strb r0, [r1, #5]
+; CHECK-V7M-NEXT: mov r0, r2
+; CHECK-V7M-NEXT: bl __aeabi_uwrite4
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+; CHECK-ALIGNED-LABEL: store6_align1:
+; CHECK-ALIGNED: @ %bb.0: @ %entry
+; CHECK-ALIGNED-NEXT: strh r3, [r0, #4]
+; CHECK-ALIGNED-NEXT: str r2, [r0]
+; CHECK-ALIGNED-NEXT: bx lr
+entry:
+ store i48 %b, i48* %a, align 1
+ ret void
+}
+
+define void @loadstore8_align4(double* %a, double* %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: loadstore8_align4:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r4, lr}
+; CHECK-V6M-NEXT: push {r4, lr}
+; CHECK-V6M-NEXT: mov r4, r1
+; CHECK-V6M-NEXT: bl __aeabi_uread8
+; CHECK-V6M-NEXT: mov r2, r4
+; CHECK-V6M-NEXT: bl __aeabi_uwrite8
+; CHECK-V6M-NEXT: pop {r4, pc}
+;
+; CHECK-V7M-LABEL: loadstore8_align4:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r4, lr}
+; CHECK-V7M-NEXT: push {r4, lr}
+; CHECK-V7M-NEXT: mov r4, r1
+; CHECK-V7M-NEXT: bl __aeabi_uread8
+; CHECK-V7M-NEXT: mov r2, r4
+; CHECK-V7M-NEXT: bl __aeabi_uwrite8
+; CHECK-V7M-NEXT: pop {r4, pc}
+;
+entry:
+ %tmp = load double, double* %a, align 1
+ store double %tmp, double* %b, align 1
+ ret void
+}
+
+define double @load8_align1(double* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load8_align1:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread8
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load8_align1:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread8
+; CHECK-V7M-NEXT: pop {r7, pc}
+;
+entry:
+ %tmp = load double, double* %a, align 1
+ ret double %tmp
+}
+
+define void @store8_align1(double* %a, double %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: store8_align1:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: mov r1, r3
+; CHECK-V6M-NEXT: mov r3, r0
+; CHECK-V6M-NEXT: mov r0, r2
+; CHECK-V6M-NEXT: mov r2, r3
+; CHECK-V6M-NEXT: bl __aeabi_uwrite8
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: store8_align1:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: mov r1, r3
+; CHECK-V7M-NEXT: mov r3, r0
+; CHECK-V7M-NEXT: mov r0, r2
+; CHECK-V7M-NEXT: mov r2, r3
+; CHECK-V7M-NEXT: bl __aeabi_uwrite8
+; CHECK-V7M-NEXT: pop {r7, pc}
+entry:
+ store double %b, double* %a, align 1
+ ret void
+}
+
+define double @load8_align2(double* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load8_align2:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread8
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load8_align2:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread8
+; CHECK-V7M-NEXT: pop {r7, pc}
+entry:
+ %tmp = load double, double* %a, align 2
+ ret double %tmp
+}
+
+define i64 @load12_align1_trunc(i96* %a) nounwind optsize minsize {
+; CHECK-V6M-LABEL: load12_align1_trunc:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r7, lr}
+; CHECK-V6M-NEXT: push {r7, lr}
+; CHECK-V6M-NEXT: bl __aeabi_uread8
+; CHECK-V6M-NEXT: pop {r7, pc}
+;
+; CHECK-V7M-LABEL: load12_align1_trunc:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r7, lr}
+; CHECK-V7M-NEXT: push {r7, lr}
+; CHECK-V7M-NEXT: bl __aeabi_uread8
+; CHECK-V7M-NEXT: pop {r7, pc}
+entry:
+ %tmp = load i96, i96* %a, align 1
+ %ext = trunc i96 %tmp to i64
+ ret i64 %ext
+}
+
+define void @store12_align4_trunc(i96* %a, i96 %b) nounwind optsize minsize {
+; CHECK-V6M-LABEL: store12_align4_trunc:
+; CHECK-V6M: @ %bb.0: @ %entry
+; CHECK-V6M-NEXT: .save {r4, lr}
+; CHECK-V6M-NEXT: push {r4, lr}
+; CHECK-V6M-NEXT: mov r1, r3
+; CHECK-V6M-NEXT: mov r4, r0
+; CHECK-V6M-NEXT: mov r0, r2
+; CHECK-V6M-NEXT: mov r2, r4
+; CHECK-V6M-NEXT: bl __aeabi_uwrite8
+; CHECK-V6M-NEXT: adds r4, #8
+; CHECK-V6M-NEXT: ldr r0, [sp, #8]
+; CHECK-V6M-NEXT: mov r1, r4
+; CHECK-V6M-NEXT: bl __aeabi_uwrite4
+; CHECK-V6M-NEXT: pop {r4, pc}
+;
+; CHECK-V7M-LABEL: store12_align4_trunc:
+; CHECK-V7M: @ %bb.0: @ %entry
+; CHECK-V7M-NEXT: .save {r4, lr}
+; CHECK-V7M-NEXT: push {r4, lr}
+; CHECK-V7M-NEXT: mov r4, r0
+; CHECK-V7M-NEXT: mov r0, r2
+; CHECK-V7M-NEXT: mov r1, r3
+; CHECK-V7M-NEXT: mov r2, r4
+; CHECK-V7M-NEXT: bl __aeabi_uwrite8
+; CHECK-V7M-NEXT: ldr r0, [sp, #8]
+; CHECK-V7M-NEXT: add.w r1, r4, #8
+; CHECK-V7M-NEXT: bl __aeabi_uwrite4
+; CHECK-V7M-NEXT: pop {r4, pc}
+entry:
+ store i96 %b, i96* %a, align 1
+ ret void
+}