[LLVMbugs] [Bug 6246] New: Semi-automatic vectorization when performing scalar operations on vector elements

Fri Feb 5 11:46:18 PST 2010

http://llvm.org/bugs/show_bug.cgi?id=6246

           Summary: Semi-automatic vectorization when performing scalar
                    operations on vector elements
           Product: libraries
           Version: 2.6
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Scalar Optimizations
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: llvm at henning-thielemann.de
                CC: llvmbugs at cs.uiuc.edu

In my automatically generated code, scalar operations are often applied to
individual vector elements where a single vector operation would do. For
example (due to modularization issues), I generate code like:

define <4 x float> @_vadd(<4 x float>, <4 x float>) {
  %a0 = extractelement <4 x float> %0, i32 0
  %b0 = extractelement <4 x float> %1, i32 0
  %c0 = fadd float %a0, %b0
  %a1 = extractelement <4 x float> %0, i32 1
  %b1 = extractelement <4 x float> %1, i32 1
  %c1 = fadd float %a1, %b1
  %a2 = extractelement <4 x float> %0, i32 2
  %b2 = extractelement <4 x float> %1, i32 2
  %c2 = fadd float %a2, %b2
  %a3 = extractelement <4 x float> %0, i32 3
  %b3 = extractelement <4 x float> %1, i32 3
  %c3 = fadd float %a3, %b3
  %d0 = insertelement <4 x float> undef, float %c0, i32 0
  %d1 = insertelement <4 x float> %d0, float %c1, i32 1
  %d2 = insertelement <4 x float> %d1, float %c2, i32 2
  %d3 = insertelement <4 x float> %d2, float %c3, i32 3
  ret <4 x float> %d3
}

I think it would be both correct and more efficient for an optimization pass
to swap the 'fadd's and the 'extractelement's, which would yield:

define <4 x float> @_vadd(<4 x float>, <4 x float>) nounwind readnone {
  %c = fadd <4 x float> %0, %1
  %c0 = extractelement <4 x float> %c, i32 0
  %c1 = extractelement <4 x float> %c, i32 1
  %c2 = extractelement <4 x float> %c, i32 2
  %c3 = extractelement <4 x float> %c, i32 3
  %d0 = insertelement <4 x float> undef, float %c0, i32 0
  %d1 = insertelement <4 x float> %d0, float %c1, i32 1
  %d2 = insertelement <4 x float> %d1, float %c2, i32 2
  %d3 = insertelement <4 x float> %d2, float %c3, i32 3
  ret <4 x float> %d3
}
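
As a rough illustration of the proposed rewrite, here is a toy Python model of
such a pass. It is a sketch under stated assumptions, not LLVM code: the `Inst`
record, the helper names `vectorize_lanes` and `bug_example`, and the `%vec0`
value name are all hypothetical. It assumes a single basic block, constant lane
indices, and a fixed vector width of 4.

```python
from collections import namedtuple

# Toy instruction record: result name, opcode, operands (hypothetical model,
# not an LLVM data structure).
Inst = namedtuple("Inst", "name op args")

def vectorize_lanes(insts, width=4):
    """Replace fadd(extract(A, i), extract(B, i)) over all lanes i with
    one vector fadd of A and B plus one extractelement per lane."""
    defs = {inst.name: inst for inst in insts}
    groups = {}  # (A, B) -> {lane index: scalar fadd}
    for inst in insts:
        if inst.op != "fadd":
            continue
        lhs, rhs = (defs.get(arg) for arg in inst.args)
        if (lhs and rhs and lhs.op == rhs.op == "extractelement"
                and lhs.args[1] == rhs.args[1]):
            key = (lhs.args[0], rhs.args[0])
            groups.setdefault(key, {})[lhs.args[1]] = inst

    out = list(insts)
    for n, ((a, b), lanes) in enumerate(groups.items()):
        if set(lanes) != set(range(width)):
            continue  # rewrite only when every lane is covered
        vec = "%vec{}".format(n)
        # A and B are defined before the first scalar fadd, so the vector
        # fadd can be placed at that position.
        first = min(out.index(inst) for inst in lanes.values())
        for lane, inst in lanes.items():
            out[out.index(inst)] = Inst(inst.name, "extractelement", (vec, lane))
        out.insert(first, Inst(vec, "fadd", (a, b)))

    # Drop extractelements whose results are no longer used.
    used = {arg for inst in out for arg in inst.args if isinstance(arg, str)}
    return [i for i in out if i.op != "extractelement" or i.name in used]

def bug_example():
    """Build the scalarized @_vadd body from the report in the toy model."""
    insts = []
    for i in range(4):
        insts += [Inst(f"%a{i}", "extractelement", ("%0", i)),
                  Inst(f"%b{i}", "extractelement", ("%1", i)),
                  Inst(f"%c{i}", "fadd", (f"%a{i}", f"%b{i}"))]
    prev = "undef"
    for i in range(4):
        insts.append(Inst(f"%d{i}", "insertelement", (prev, f"%c{i}", i)))
        prev = f"%d{i}"
    insts.append(Inst(None, "ret", (prev,)))
    return insts

if __name__ == "__main__":
    for inst in vectorize_lanes(bug_example()):
        print(inst)
```

Running it on the scalarized function yields one vector fadd, four
extractelements from its result, and the unchanged insertelement chain, which
is the shape of the listing above. A real pass would also have to handle other
opcodes, partial lane coverage, and multiple basic blocks.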

Both the optimizer and the (X86) code generator already detect that the
remaining extractelements and insertelements form the identity transform. The
optimizer transforms the vectorized @_vadd into something like:

define <4 x float> @_vadd(<4 x float>, <4 x float>) nounwind readnone {
  %c = fadd <4 x float> %0, %1
  ret <4 x float> %c
}
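
To make the identity claim concrete, the following toy Python model treats a
vector as a tuple and mimics extractelement and insertelement. The function
names mirror the IR opcodes for readability but are assumptions of this sketch,
not an LLVM API, and `UNDEF` is just a placeholder for an undef lane.

```python
# Toy model of extractelement/insertelement on Python tuples (hypothetical,
# for illustration only).
UNDEF = None  # stands in for an undef lane

def extractelement(vec, idx):
    """Return lane idx of vec."""
    return vec[idx]

def insertelement(vec, value, idx):
    """Return a copy of vec with lane idx replaced by value."""
    return vec[:idx] + (value,) + vec[idx + 1:]

def rebuild(vec):
    """Extract every lane of vec and insert it back at the same index,
    starting from an all-undef vector."""
    out = (UNDEF,) * len(vec)
    for i in range(len(vec)):
        out = insertelement(out, extractelement(vec, i), i)
    return out

if __name__ == "__main__":
    c = (1.0, 2.0, 3.0, 4.0)
    print(rebuild(c) == c)
```

Because every lane is overwritten with the value taken from the same index,
no undef lane survives and the rebuilt vector equals the original, which is
why the extract/insert chain can be dropped.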
