[llvm-bugs] [Bug 41245] New: Some NEON load instructions don't use preferred vector type alignment

Tue Mar 26 13:02:01 PDT 2019

https://bugs.llvm.org/show_bug.cgi?id=41245

            Bug ID: 41245
           Summary: Some NEON load instructions don't use preferred vector
                    type alignment
           Product: libraries
           Version: 8.0
          Hardware: PC
                OS: MacOS X
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: ARM
          Assignee: unassignedbugs at nondot.org
          Reporter: andrey.vihrov at gmail.com
                CC: llvm-bugs at lists.llvm.org, peter.smith at linaro.org,
                    Ties.Stuij at arm.com

Created attachment 21676
  --> https://bugs.llvm.org/attachment.cgi?id=21676&action=edit
Sample IR code

Consider the following IR code:

  define i32 @bar(<4 x i32>*) {
    %2 = load <4 x i32>, <4 x i32>* %0
    %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %4 = extractelement <4 x i32> %3, i32 0
    ret i32 %4
  }

When compiled to 32-bit ARM assembly code, this generates

  _bar:
      vld2.32 {d16, d17, d18, d19}, [r0:256]
      vmov.32 r0, d18[0]
      bx      lr

According to DataLayout::getAlignmentInfo()
(https://llvm.org/doxygen/DataLayout_8cpp_source.html#l00534), the preferred
alignment for a vector of four 32-bit integers should be 16 bytes, whereas here
the vector is accessed with 32-byte alignment (the :256 qualifier in [r0:256]
is given in bits).

The problem disappears if explicit alignment is added to the load IR
instruction or if the shufflevector IR instruction is removed.
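
As a minimal sketch of the first workaround, assuming the explicit alignment
used is the 16-byte preferred alignment mentioned above, the load can be
annotated as follows. The function name @bar_aligned is only illustrative and
the resulting assembly is not reproduced here:

  define i32 @bar_aligned(<4 x i32>*) {
    ; Assumption: align 16 matches the preferred alignment for <4 x i32>.
    %2 = load <4 x i32>, <4 x i32>* %0, align 16
    %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %4 = extractelement <4 x i32> %3, i32 0
    ret i32 %4
  }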

A more detailed sample and output assembly are attached. We can see that:

* The alloca in foo resulted in a 16-byte (128-bit) aligned stack allocation
(ok);
* The constant vector (from foo) is given 16-byte-aligned storage (ok);
* The vld1.64 instruction (that loads the constant vector) assumes 16-byte
alignment (ok);
* The vst1.64 instruction in foo (store to the alloca) assumes 16-byte
alignment as well (ok);
* The vld2.32 instruction in bar (combined load + shufflevector IR
instructions) assumes 32-byte alignment (bug?)

The problem gets "fixed" by running the InstCombine optimization pass on the IR
before compilation. The pass, however, just adds an explicit alignment to the
load instruction, equal to the preferred alignment for <4 x i32>.
