[llvm-bugs] [Bug 47677] New: VLD2 shuffle patterns gets broken apart by instcombine
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Sep 29 04:10:35 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=47677
Bug ID: 47677
Summary: VLD2 shuffle patterns gets broken apart by instcombine
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Scalar Optimizations
Assignee: unassignedbugs at nondot.org
Reporter: david.green at arm.com
CC: llvm-bugs at lists.llvm.org
Given code that the vectorizer chooses to create a VLD2 interleaving group for:
https://godbolt.org/z/P6E88r
We generate a loop that does:
; VLD2 group
%wide.vec = load <8 x float>, <8 x float>* %2, align 4, !tbaa !11
%strided.vec = shufflevector <8 x float> %wide.vec, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%strided.vec19 = shufflevector <8 x float> %wide.vec, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
; Operations
%3 = fmul fast <4 x float> %strided.vec, %strided.vec
%4 = fmul fast <4 x float> %strided.vec19, %strided.vec19
%5 = fadd fast <4 x float> %4, %3
; store
store <4 x float> %5, <4 x float>* %7, align 4, !tbaa !11
Currently, instcombine's foldVectorBinop sinks the shuffles past the fmuls
(as both operands of each fmul have the same shuffle mask). This means we end
up with regular non-interleaving loads and potentially expensive shuffles
between the fmul and the fadd.
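As a rough illustration of that description, the loop body would presumably end up in a shape like the sketch below. This is not copied from compiler output: the function wrapper and the value names (%sq, %even, %odd, %sum) are made up for the example, and the wide load is modelled as a function argument to keep the snippet self-contained.

```llvm
; Hypothetical sketch of the loop body after the shuffles are sunk past the
; fmuls. Names are illustrative only; %wide.vec stands in for the wide load.
define <4 x float> @after_sink(<8 x float> %wide.vec) {
  ; Both operands of each original fmul used the same mask, so the two
  ; multiplies collapse into one multiply on the un-shuffled wide vector.
  %sq = fmul fast <8 x float> %wide.vec, %wide.vec
  ; The de-interleaving shuffles now consume the fmul result, not the load.
  %even = shufflevector <8 x float> %sq, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %odd = shufflevector <8 x float> %sq, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %sum = fadd fast <4 x float> %odd, %even
  ret <4 x float> %sum
}
```

In this form the shuffles no longer read directly from the load, which is why the load/shuffle group is no longer recognisable as a VLD2.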
On NEON for aarch64 and arm this will create zip instructions. For MVE, where
the zip/unzip instructions are not present, we fall back to even more
expensive register moves.
Similar things can happen in other cases:
https://godbolt.org/z/WKnPE9