[llvm-bugs] [Bug 41245] New: Some NEON load instructions don't use preferred vector type alignment
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Mar 26 13:02:01 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=41245
Bug ID: 41245
Summary: Some NEON load instructions don't use preferred vector type alignment
Product: libraries
Version: 8.0
Hardware: PC
OS: MacOS X
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: ARM
Assignee: unassignedbugs at nondot.org
Reporter: andrey.vihrov at gmail.com
CC: llvm-bugs at lists.llvm.org, peter.smith at linaro.org,
Ties.Stuij at arm.com
Created attachment 21676
--> https://bugs.llvm.org/attachment.cgi?id=21676&action=edit
Sample IR code
Consider the following IR code:
define i32 @bar(<4 x i32>*) {
  %2 = load <4 x i32>, <4 x i32>* %0
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %4 = extractelement <4 x i32> %3, i32 0
  ret i32 %4
}
When compiled to 32-bit ARM assembly code, this generates
_bar:
        vld2.32 {d16, d17, d18, d19}, [r0:256]
        vmov.32 r0, d18[0]
        bx      lr
According to https://llvm.org/doxygen/DataLayout_8cpp_source.html#l00534
(DataLayout::getAlignmentInfo()), the preferred alignment for a vector of 4
32-bit integers should be 16 bytes. Here, however, the :256 qualifier on the
vld2.32 address operand asserts 256-bit (32-byte) alignment for the access.
The problem disappears if explicit alignment is added to the load IR
instruction or if the shufflevector IR instruction is removed.
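For illustration, the explicit-alignment variant might look like the sketch
below. The function name @bar_aligned is hypothetical and only serves to
distinguish it from @bar; "align 16" assumes the 16-byte preferred alignment
for <4 x i32> cited above. The assembly this variant produces is not
reproduced here.

```llvm
; Sketch only: same body as @bar, but the load states its alignment explicitly.
; @bar_aligned is a hypothetical name, not part of the attached sample.
define i32 @bar_aligned(<4 x i32>*) {
  ; align 16 = assumed preferred alignment of <4 x i32> (16 bytes)
  %2 = load <4 x i32>, <4 x i32>* %0, align 16
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %4 = extractelement <4 x i32> %3, i32 0
  ret i32 %4
}
```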
A more detailed sample and output assembly are attached. We can see that:
* The alloca in foo resulted in a 16-byte (128-bit) aligned stack allocation
  (ok);
* The constant vector (from foo) has 16-byte-aligned storage (ok);
* The vld1.64 instruction (which loads the constant vector) assumes 16-byte
  alignment (ok);
* The vst1.64 instruction in foo (the store to the alloca) assumes 16-byte
  alignment as well (ok);
* The vld2.32 instruction in bar (the combined load + shufflevector IR
  instructions) assumes 32-byte alignment (bug?)
The problem gets "fixed" by running the InstCombine optimization pass on the IR
before compilation. However, the pass only adds an explicit alignment to the
load instruction, equal to the preferred alignment for <4 x i32>.