[PATCH] D49720: [ARM] Fix over-alignment in arguments that are HA of 128-bit vectors
Petr Pavlu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 24 01:39:12 PDT 2018
petpav01 created this revision.
petpav01 added reviewers: t.p.northover, olista01, eli.friedman.
Herald added a reviewer: javed.absar.
Herald added subscribers: llvm-commits, chrib, kristof.beyls.
Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on the stack, the function tries to pack all resulting items of the aggregate as tightly as possible. Once the first item has been laid out, the alignment used for consecutive items is the size of one item.
This logic goes wrong for 128-bit vectors because their natural alignment is only 64 bits; raising the alignment to the 128-bit item size can therefore insert unexpected padding between the first and second element.
Example:
$ cat test.c
#include <arm_neon.h>
typedef struct {
double A[4];
} S_d64_4;
typedef struct {
uint32x4_t A[2];
} S_v128_2;
int foo(S_d64_4 P0, S_d64_4 P1, float P2, S_v128_2 P3) {
// * P0 is passed in D0-D3.
// * P1 is passed in D4-D7.
// * P2 is passed in [SP, SP+4).
// * P3.A[0] is passed in [SP+8, SP+24).
// * P3.A[1] should be passed according to AAPCS in [SP+24, SP+40) but the
// code produced by Clang/LLVM expects it in [SP+32, SP+48).
return vgetq_lane_u32(P3.A[0], 0) + vgetq_lane_u32(P3.A[1], 0);
}
$ clang -target arm-none-eabi -mcpu=cortex-a53 -S test.c -o -
[...]
foo:
push {r11, lr}
mov r11, sp
sub sp, sp, #8
bfc sp, #0, #4
ldr r0, [r11, #40] /* load from entry-SP + #32 */
ldr r1, [r11, #16] /* load from entry-SP + #8 */
add r0, r1, r0
mov sp, r11
pop {r11, pc}
The proposed patch fixes the problem by updating the alignment to the item size only when doing so reduces it.
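The effect of the one-line change can be sketched in a simplified model of the stack packing loop. This is illustrative only, not LLVM's actual code: `allocateStack` and `layoutSecondMember` are hypothetical helpers that mimic the shape of `CC_ARM_AAPCS_Custom_Aggregate()`, with a float already occupying [SP, SP+4) so the aggregate starts at offset 8, as in the example above.

```cpp
#include <algorithm>
#include <cstdint>

// Round StackOffset up to Align, reserve Size bytes, return the slot offset.
// Simplified stand-in for CCState::AllocateStack().
static uint32_t allocateStack(uint32_t &StackOffset, uint32_t Size,
                              uint32_t Align) {
  StackOffset = (StackOffset + Align - 1) & ~(Align - 1);
  uint32_t Offset = StackOffset;
  StackOffset += Size;
  return Offset;
}

// Lay out two aggregate members of the given Size/Align starting at Start and
// return the offset of the second member. With ReduceOnly the patched rule
// Align = std::min(Align, Size) is applied after the first member; otherwise
// the old rule Align = Size.
static uint32_t layoutSecondMember(uint32_t Start, uint32_t Size,
                                   uint32_t Align, bool ReduceOnly) {
  uint32_t StackOffset = Start;
  uint32_t Second = 0;
  for (int I = 0; I < 2; ++I) {
    uint32_t Off = allocateStack(StackOffset, Size, Align);
    if (I == 1)
      Second = Off;
    // After the first item, pack the rest as tightly as possible.
    Align = ReduceOnly ? std::min(Align, Size) : Size;
  }
  return Second;
}
```

For a `[2 x <4 x i32>]` member (Size = 16, natural Align = 8) starting at offset 8, the old rule bumps the alignment to 16 and places the second element at offset 32; the patched rule keeps the alignment at 8 and places it at offset 24, matching the AAPCS layout in the comments of the example.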
Repository:
rL LLVM
https://reviews.llvm.org/D49720
Files:
lib/Target/ARM/ARMCallingConv.h
test/CodeGen/ARM/aggregate-padding.ll
Index: test/CodeGen/ARM/aggregate-padding.ll
===================================================================
--- test/CodeGen/ARM/aggregate-padding.ll
+++ test/CodeGen/ARM/aggregate-padding.ll
@@ -99,3 +99,19 @@
%sum = add i16 %val0, %val2
ret i16 %sum
}
+
+; [2 x <4 x i32>] should be aligned only on a 64-bit boundary and contiguous.
+; None of the two <4 x i32> elements should introduce any padding to 128 bits.
+define i32 @test_4xi32_64bit_aligned_and_contiguous([8 x double], float, [2 x <4 x i32>] %arg) nounwind {
+; CHECK-LABEL: test_4xi32_64bit_aligned_and_contiguous:
+; CHECK-DAG: ldr [[VAL0_0:r[0-9]+]], [sp, #8]
+; CHECK-DAG: ldr [[VAL1_0:r[0-9]+]], [sp, #24]
+; CHECK: add r0, [[VAL0_0]], [[VAL1_0]]
+
+ %val0 = extractvalue [2 x <4 x i32>] %arg, 0
+ %val0_0 = extractelement <4 x i32> %val0, i32 0
+ %val1 = extractvalue [2 x <4 x i32>] %arg, 1
+ %val1_0 = extractelement <4 x i32> %val1, i32 0
+ %sum = add i32 %val0_0, %val1_0
+ ret i32 %sum
+}
Index: lib/Target/ARM/ARMCallingConv.h
===================================================================
--- lib/Target/ARM/ARMCallingConv.h
+++ lib/Target/ARM/ARMCallingConv.h
@@ -276,7 +276,7 @@
// After the first item has been allocated, the rest are packed as tightly
// as possible. (E.g. an incoming i64 would have starting Align of 8, but
// we'll be allocating a bunch of i32 slots).
- Align = Size;
+ Align = std::min(Align, Size);
}
// All pending members have now been allocated