[LLVMbugs] [Bug 11775] New: [AVX] opportunity for better code by transforming vec w/one element used to scalar
bugzilla-daemon at llvm.org
Mon Jan 16 16:36:10 PST 2012
http://llvm.org/bugs/show_bug.cgi?id=11775
Bug #: 11775
Summary: [AVX] opportunity for better code by transforming vec
w/one element used to scalar
Product: new-bugs
Version: trunk
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: matt at pharr.org
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 7887
--> http://llvm.org/bugs/attachment.cgi?id=7887
examples
The attached test case has two versions of a loop over float values in memory,
where each time through the loop an 8-wide vector of floats is loaded and added
to an accumulated <8 x float> sum, which the function returns.
In the first version, foo(), the %iter_val42342 value is a vector that has
value <0,1,2,3,4,5,6,7> the first time through the loop, <8,9,10,...> the
second time through, and so forth. As it turns out, this value is only used in
an extractelement instruction, the result of which is used to index into the
array of floats.
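To make the pattern concrete, here is a minimal C model of what foo() computes. The attachment is not reproduced here, so this assumes the induction vector starts at <0,...,7>, is stepped by 8 in every lane, and only lane 0 is extracted. The float8 struct (standing in for <8 x float>) and the name sum_vector_index are hypothetical.

```c
/* Hypothetical stand-in for LLVM's <8 x float>. */
typedef struct { float lane[8]; } float8;

/* Sketch of foo(): the induction variable is kept as a full 8-lane integer
 * vector <i, i+1, ..., i+7>, but only lane 0 is ever read (the
 * extractelement), and that value indexes the float array.
 * Assumes n is a multiple of 8. */
float8 sum_vector_index(const float *a, int n) {
    float8 acc = {{0}};
    int iter[8];
    for (int k = 0; k < 8; ++k) iter[k] = k;        /* <0,1,...,7> */
    while (iter[0] < n) {
        int idx = iter[0];                          /* extractelement, lane 0 */
        for (int k = 0; k < 8; ++k)
            acc.lane[k] += a[idx + k];              /* 8-wide load + add */
        for (int k = 0; k < 8; ++k)
            iter[k] += 8;                           /* lanes 1..7 are dead work */
    }
    return acc;
}
```

Lanes 1 through 7 of iter are updated on every trip but never read, which is exactly the work the listing below still performs.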
Here is the generated code for the loop body (top-of-tree llc, -mattr=+avx):
LBB0_1: ## %foreach_full_body
## =>This Inner Loop Header: Depth=1
addl $8, %ecx
vmovd %ecx, %xmm3
vinsertf128 $1, %xmm3, %ymm3, %ymm3
vpermilps $0, %ymm3, %ymm3 ## ymm3 = ymm3[0,0,0,0,4,4,4,4]
vmovd %xmm2, %edx
shll $2, %edx
movslq %edx, %rdx
vmovups (%rdi,%rdx), %ymm2
vaddps %ymm2, %ymm0, %ymm0
vextractf128 $1, %ymm3, %xmm2
vextractf128 $1, %ymm1, %xmm4
vpaddd %xmm4, %xmm2, %xmm2
vpaddd %xmm1, %xmm3, %xmm3
cmpl %eax, %ecx
vinsertf128 $1, %xmm2, %ymm3, %ymm2
jl LBB0_1
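The code does all the work of maintaining every element of the vector, even
though only one is needed (doubly painful with AVX, which has only 4-wide integer
instructions, so the 8-wide add is split into two vpaddd operations with
vextractf128/vinsertf128 shuffling the halves). This also inhibits other
optimizations; for example, the extracted index is rescaled with shll $2 on
every iteration, while in the scalar version the byte offset is simply stepped
by 32.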
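A rough C model of why the integer side is doubly painful: AVX before AVX2 has no 256-bit integer add, so an <8 x i32> add is done as two 4-wide halves, mirroring the vextractf128 / vpaddd / vinsertf128 sequence in the listing. The types int4 and int8v and the functions add4 and add8_split are hypothetical illustrations, not LLVM code.

```c
#include <string.h>

typedef struct { int lane[4]; } int4;   /* one 128-bit xmm register */
typedef struct { int lane[8]; } int8v;  /* one 256-bit ymm register */

/* Stands in for a single 4-wide vpaddd. */
static int4 add4(int4 x, int4 y) {
    int4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = x.lane[k] + y.lane[k];
    return r;
}

/* 8-wide integer add built from two 4-wide adds, as AVX (no AVX2) must do. */
int8v add8_split(int8v a, int8v b) {
    int4 alo, ahi, blo, bhi;
    memcpy(alo.lane, a.lane, sizeof alo.lane);      /* low half: plain xmm */
    memcpy(ahi.lane, a.lane + 4, sizeof ahi.lane);  /* vextractf128 $1 */
    memcpy(blo.lane, b.lane, sizeof blo.lane);
    memcpy(bhi.lane, b.lane + 4, sizeof bhi.lane);  /* vextractf128 $1 */
    int4 lo = add4(alo, blo);                       /* vpaddd */
    int4 hi = add4(ahi, bhi);                       /* vpaddd */
    int8v r;
    memcpy(r.lane, lo.lane, sizeof lo.lane);
    memcpy(r.lane + 4, hi.lane, sizeof hi.lane);    /* vinsertf128 $1 */
    return r;
}
```

When only lane 0 of the result is consumed, all of this collapses to one scalar addl.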
In the bar() function in the attachment, I've manually transformed this vector
into a scalar value. The resulting code is much nicer:
LBB1_1: ## %foreach_full_body
## =>This Inner Loop Header: Depth=1
movslq %ecx, %rcx
vmovups (%rdi,%rcx), %ymm1
vaddps %ymm1, %ymm0, %ymm0
addl $32, %ecx
addl $8, %edx
cmpl %eax, %edx
jl LBB1_1
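For comparison, a sketch of what bar() computes under the same assumptions (hypothetical names; n assumed to be a multiple of 8). The index is a single scalar stepped by 8, which is what lets the backend emit the short loop above, keeping a byte offset (addl $32) alongside the element count (addl $8).

```c
/* Hypothetical stand-in for LLVM's <8 x float>. */
typedef struct { float lane[8]; } float8;

/* Sketch of bar(): same accumulation as foo(), but the loop index is one
 * scalar int instead of an 8-lane vector of which only lane 0 is used. */
float8 sum_scalar_index(const float *a, int n) {
    float8 acc = {{0}};
    for (int i = 0; i < n; i += 8) {         /* addl $8 / cmpl / jl */
        for (int k = 0; k < 8; ++k)
            acc.lane[k] += a[i + k];         /* vmovups + vaddps */
    }
    return acc;
}
```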
This suggests it may be worthwhile to look for vector computations where only
one element is ever used, and to lower them to the corresponding scalar
computation where possible.
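One way the suggested transformation could look, as a toy sketch rather than LLVM's actual implementation: lane k of a lane-wise vector expression equals the same expression applied to lane k of each operand, so extractelement(add(splat(i), <0,...,7>), 0) becomes i + 0. The node kinds and the functions mk, scalarize_extract and eval are hypothetical. The sketch ignores the loop-carried value (presumably a phi) that defines %iter_val42342 in foo(); a real pass would have to scalarize that recurrence too.

```c
#include <stdlib.h>

/* Hypothetical toy IR: scalar kinds, vector kinds, and EXTRACT (one lane). */
typedef enum { K_CONST, K_VAR, K_ADD, K_VCONST, K_SPLAT, K_VADD, K_EXTRACT } Kind;

typedef struct Node {
    Kind kind;
    int value;           /* K_CONST: the constant */
    int vals[8];         /* K_VCONST: one constant per lane */
    int lane;            /* K_EXTRACT: the lane that is read */
    struct Node *a, *b;  /* operands (K_SPLAT and K_EXTRACT use only a) */
} Node;

static Node pool[64];
static int used;

/* Allocate a zeroed node from a fixed pool (enough for a sketch). */
static Node *mk(Kind k, Node *a, Node *b) {
    if (used >= 64) abort();
    Node *n = &pool[used++];
    *n = (Node){0};
    n->kind = k;
    n->a = a;
    n->b = b;
    return n;
}

/* Scalar expression for lane `lane` of vector expression v, or NULL if v
 * contains a kind this sketch does not handle. */
static Node *lane_of(Node *v, int lane) {
    switch (v->kind) {
    case K_VCONST: {
        Node *c = mk(K_CONST, NULL, NULL);
        c->value = v->vals[lane];
        return c;
    }
    case K_SPLAT:
        return v->a;                          /* every lane is the scalar */
    case K_VADD: {
        Node *x = lane_of(v->a, lane);
        Node *y = lane_of(v->b, lane);
        return (x && y) ? mk(K_ADD, x, y) : NULL;  /* vector add -> scalar add */
    }
    default:
        return NULL;
    }
}

/* Rewrite extractelement(v, k) into the equivalent scalar computation,
 * keeping the original node if the vector operand cannot be scalarized. */
Node *scalarize_extract(Node *e) {
    if (e->kind != K_EXTRACT) return e;
    Node *s = lane_of(e->a, e->lane);
    return s ? s : e;
}

static int eval_lane(const Node *v, int lane, int var);

/* Evaluate a scalar node, with `var` as the value of K_VAR. */
int eval(const Node *n, int var) {
    switch (n->kind) {
    case K_CONST:   return n->value;
    case K_VAR:     return var;
    case K_ADD:     return eval(n->a, var) + eval(n->b, var);
    case K_EXTRACT: return eval_lane(n->a, n->lane, var);
    default:        return 0;  /* vector node used as scalar: not meaningful */
    }
}

/* Evaluate one lane of a vector node. */
static int eval_lane(const Node *v, int lane, int var) {
    switch (v->kind) {
    case K_VCONST: return v->vals[lane];
    case K_SPLAT:  return eval(v->a, var);
    case K_VADD:   return eval_lane(v->a, lane, var) + eval_lane(v->b, lane, var);
    default:       return 0;
    }
}
```

The fallback in scalarize_extract matters: when an operand kind is not lane-wise (a shuffle, say), leaving the vector form alone is always safe.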