[LLVMbugs] [Bug 6246] New: Semi-automatic vectorization when performing scalar operations on vector elements
bugzilla-daemon at cs.uiuc.edu
Fri Feb 5 11:46:18 PST 2010
http://llvm.org/bugs/show_bug.cgi?id=6246
Summary: Semi-automatic vectorization when performing scalar
operations on vector elements
Product: libraries
Version: 2.6
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Scalar Optimizations
AssignedTo: unassignedbugs at nondot.org
ReportedBy: llvm at henning-thielemann.de
CC: llvmbugs at cs.uiuc.edu
In my automatically generated code, scalar operations are often applied to
vector elements where a single vector operation would have done the same job.
For example, due to modularization issues, I generate code like:
define <4 x float> @_vadd(<4 x float>, <4 x float>) {
  %a0 = extractelement <4 x float> %0, i32 0
  %b0 = extractelement <4 x float> %1, i32 0
  %c0 = fadd float %a0, %b0
  %a1 = extractelement <4 x float> %0, i32 1
  %b1 = extractelement <4 x float> %1, i32 1
  %c1 = fadd float %a1, %b1
  %a2 = extractelement <4 x float> %0, i32 2
  %b2 = extractelement <4 x float> %1, i32 2
  %c2 = fadd float %a2, %b2
  %a3 = extractelement <4 x float> %0, i32 3
  %b3 = extractelement <4 x float> %1, i32 3
  %c3 = fadd float %a3, %b3
  %d0 = insertelement <4 x float> undef, float %c0, i32 0
  %d1 = insertelement <4 x float> %d0, float %c1, i32 1
  %d2 = insertelement <4 x float> %d1, float %c2, i32 2
  %d3 = insertelement <4 x float> %d2, float %c3, i32 3
  ret <4 x float> %d3
}
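I think it would be both correct and more efficient for an optimization pass
to swap the 'fadd's and 'extractelement's, so that the addition is performed
once on the whole vectors and the elements are extracted afterwards. This
would yield: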
define <4 x float> @_vadd(<4 x float>, <4 x float>) nounwind readnone {
  %c = fadd <4 x float> %0, %1
  %c0 = extractelement <4 x float> %c, i32 0
  %c1 = extractelement <4 x float> %c, i32 1
  %c2 = extractelement <4 x float> %c, i32 2
  %c3 = extractelement <4 x float> %c, i32 3
  %d0 = insertelement <4 x float> undef, float %c0, i32 0
  %d1 = insertelement <4 x float> %d0, float %c1, i32 1
  %d2 = insertelement <4 x float> %d1, float %c2, i32 2
  %d3 = insertelement <4 x float> %d2, float %c3, i32 3
  ret <4 x float> %d3
}
Both the optimizer and the (X86) code generator already detect that the
remaining chain of extractelements and insertelements is the identity
transform. The optimizer reduces the last piece of code to something like:
define <4 x float> @_vadd(<4 x float>, <4 x float>) nounwind readnone {
  %c = fadd <4 x float> %0, %1
  ret <4 x float> %c
}
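
As a rough illustration of the rewrite requested above, here is a toy Python
model of the pattern match. It is not LLVM code: the tuple encoding of
instructions, the function name vectorize_lanewise, and the %v0-style result
names are all assumptions made up for this sketch. It only handles
straight-line code, and it only fires when every lane of the two source
vectors is combined by the same operation.

```python
from collections import defaultdict

# Hypothetical tuple encoding (not an LLVM API):
#   ("extractelement", dest, vector, lane)
#   ("fadd", dest, lhs, rhs)
#   ("insertelement", dest, vector, scalar, lane)

def vectorize_lanewise(instrs, width, ops=("fadd",)):
    """Replace per-lane scalar ops by one vector op plus extractelements."""
    # Record which (vector, lane) each extracted scalar comes from.
    extracts = {}
    for ins in instrs:
        if ins[0] == "extractelement":
            _, dest, vec, lane = ins
            extracts[dest] = (vec, lane)

    # Group scalar ops whose two operands are the same lane of two vectors.
    groups = defaultdict(dict)  # (op, vecA, vecB) -> {lane: scalar dest}
    for ins in instrs:
        if ins[0] in ops:
            op, dest, lhs, rhs = ins
            if lhs in extracts and rhs in extracts:
                (va, la), (vb, lb) = extracts[lhs], extracts[rhs]
                if la == lb:
                    groups[(op, va, vb)][la] = dest

    # Only rewrite groups that cover every lane 0..width-1.
    all_lanes = set(range(width))
    full = {k: lanes for k, lanes in groups.items() if set(lanes) == all_lanes}

    vec_name = {}   # group key -> name of the new vector result
    replaced = {}   # scalar dest -> (group key, lane)
    for n, (key, lanes) in enumerate(full.items()):
        vec_name[key] = f"%v{n}"  # assumed not to clash with existing names
        for lane, dest in lanes.items():
            replaced[dest] = (key, lane)

    out, emitted = [], set()
    for ins in instrs:
        if ins[0] in ops and ins[1] in replaced:
            key, lane = replaced[ins[1]]
            if key not in emitted:
                # Emit the vector op once, at the first scalar op it replaces.
                op, va, vb = key
                out.append((op, vec_name[key], va, vb))
                emitted.add(key)
            out.append(("extractelement", ins[1], vec_name[key], lane))
        else:
            out.append(ins)  # now-dead extracts are left for a later DCE pass
    return out

# The @_vadd body from this report in the toy encoding.
example = []
for i in range(4):
    example += [
        ("extractelement", f"%a{i}", "%0", i),
        ("extractelement", f"%b{i}", "%1", i),
        ("fadd", f"%c{i}", f"%a{i}", f"%b{i}"),
    ]
prev = "undef"
for i in range(4):
    example.append(("insertelement", f"%d{i}", prev, f"%c{i}", i))
    prev = f"%d{i}"

rewritten = vectorize_lanewise(example, 4)
```

In this model the four scalar fadds collapse into a single vector fadd followed
by four extractelements, which is the intermediate form shown above. The
existing cleanup of the extractelement/insertelement identity would then do the
rest.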