[LLVMdev] movaps being generated despite alignment 1 being specified

Thu Oct 18 13:52:03 PDT 2007

Hello LLVMers,

High order bit:  

The presence of a called function is causing a store to an unrelated
vector to generate an aligned store rather than an unaligned one, even
though an unaligned store is indicated in the associated StoreInst.

Details:

I pulled down the latest source, so this is something I'm finding with
the current LLVM.  I'm hoping you'll have an idea what's going on or at
least know if it's a new issue I should log.  It's related to the stack
alignment issue that I know is being worked on, but seems sufficiently
different to ask about it here.   I checked the bug database for "align"
and "movaps" and didn't see this issue raised.

Ok, the first bit of code here seems to generate correct assembly for
me.  Basically, it copies the float4 stored at globalV into the address
pointed to by dependentV.  Along the way, it creates a
<4 x float> and copies globalV into a temporary.  I'm working on
bridging the gap between the outside of our system and the LLVM
generated code, so there is a little extra copying from and to
parameters at the boundaries of this function.  Since this is just a
repro-example, there is very little besides the boundaries here. :-)  I
fully admit the constructions below may not be optimal. 

   ; ModuleID = 'hydra'
   target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"

   define void @evaluateDependents(float* %dependentV, float* %globalV) {
   Entry_evaluateDependents:
        %Promoted_dependentV_Ptr = alloca <4 x float>, align 16         ; <<4 x float>*> [#uses=2]
        %Promoted_globalV_Ptr = alloca <4 x float>, align 16            ; <<4 x float>*> [#uses=2]
        %externalVectorPtrCast = bitcast float* %globalV to <4 x float>*                ; <<4 x float>*> [#uses=1]
        %externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1               ; <<4 x float>> [#uses=1]
        store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
        %globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1            ; <<4 x float>> [#uses=1]
        br label %Body_evaluateDependents

   Body_evaluateDependents:             ; preds = %Entry_evaluateDependents
        store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
        br label %Exit_evaluateDependents

   Exit_evaluateDependents:             ; preds = %Body_evaluateDependents
        %vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1           ; <<4 x float>> [#uses=1]
        %externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>*            ; <<4 x float>*> [#uses=1]
        store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
        ret void
   }

This produces the following instructions, which obey all the align 1
directives on the LoadInsts and StoreInsts:

...
15D10010  sub         esp,2Ch
15D10013  mov         eax,dword ptr [esp+34h]
15D10017  movups      xmm0,xmmword ptr [eax]
15D1001A  movups      xmmword ptr [esp],xmm0
15D1001E  mov         eax,dword ptr [esp+30h]
15D10022  movups      xmmword ptr [esp+10h],xmm0
15D10027  movups      xmm0,xmmword ptr [esp+10h]
15D1002C  movups      xmmword ptr [eax],xmm0
15D1002F  add         esp,2Ch
15D10032  ret
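
For readers who find C easier to follow than IR, the first evaluateDependents above behaves roughly like the sketch below. This is only an illustration: the float4 struct and the name evaluate_dependents are hypothetical, and memcpy stands in for the align 1 vector loads and stores because it makes no alignment assumption about either pointer.

```c
#include <string.h>

/* Hypothetical stand-in for <4 x float>. */
typedef struct { float lanes[4]; } float4;

/* Rough C rendering of the first evaluateDependents (name is illustrative). */
void evaluate_dependents(float *dependentV, const float *globalV) {
    float4 promoted_global;     /* plays the role of %Promoted_globalV_Ptr */
    float4 promoted_dependent;  /* plays the role of %Promoted_dependentV_Ptr */

    /* Unaligned load from the caller's buffer into the first temporary. */
    memcpy(&promoted_global, globalV, sizeof promoted_global);

    /* Copy between the two stack temporaries. */
    promoted_dependent = promoted_global;

    /* Unaligned store back out to the caller's buffer. */
    memcpy(dependentV, &promoted_dependent, sizeof promoted_dependent);
}
```

Passing pointers that are only 4-byte aligned (for example, one element into a float array) is the case the align 1 annotations are meant to cover.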

Here's where it gets weird and confusing to me.  Let's make our
evaluateDependents function do something else.  In addition to copying
globalV into dependentV, it's also going to set a singleton float
pointed to by dependentF.  We'll call a function foo to get the value.
(I tried setting dependentF directly and that did NOT cause the problem
with the generated code).  Here's the LLVM code:

   ; ModuleID = 'hydra'
   target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"

   define float @foo(float %Y) {
   Entry_foo:
        %_ReturnValuePtr = alloca float         ; <float*> [#uses=2]
        br label %Body_foo

   Body_foo:            ; preds = %Entry_foo
        store float %Y, float* %_ReturnValuePtr, align 1
        br label %Exit_foo

   Exit_foo:            ; preds = %Body_foo
        %finalValue = load float* %_ReturnValuePtr, align 1             ; <float> [#uses=1]
        ret float %finalValue
   }

   define void @evaluateDependents(float* %dependentF, float* %dependentV, float* %globalV) {
   Entry_evaluateDependents:
        %Promoted_dependentV_Ptr = alloca <4 x float>, align 16         ; <<4 x float>*> [#uses=2]
        %Promoted_globalV_Ptr = alloca <4 x float>, align 16            ; <<4 x float>*> [#uses=2]
        %externalVectorPtrCast = bitcast float* %globalV to <4 x float>*                ; <<4 x float>*> [#uses=1]
        %externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1               ; <<4 x float>> [#uses=1]
        store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
        %globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1            ; <<4 x float>> [#uses=1]
        br label %Body_evaluateDependents

   Body_evaluateDependents:             ; preds = %Entry_evaluateDependents
        %fooResult = call float @foo( float 2.000000e+000 )             ; <float> [#uses=1]
        store float %fooResult, float* %dependentF, align 1
        store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
        br label %Exit_evaluateDependents

   Exit_evaluateDependents:             ; preds = %Body_evaluateDependents
        %vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1           ; <<4 x float>> [#uses=1]
        %externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>*            ; <<4 x float>*> [#uses=1]
        store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
        ret void
   }

Here are the instructions for evaluateDependents.  The JITter hasn't
compiled foo yet.  What confuses me is why my movups suddenly became a
movaps.  All the stores and loads have align 1 on them.

...
15D10012  sub         esp,4Ch
15D10015  mov         eax,dword ptr [esp+60h]
15D10019  movups      xmm0,xmmword ptr [eax]
15D1001C  movaps      xmmword ptr [esp+8],xmm0    <-- why did this become a movaps?
15D10021  movups      xmmword ptr [esp+28h],xmm0
15D10026  mov         esi,dword ptr [esp+58h]
15D1002A  mov         edi,dword ptr [esp+5Ch]
15D1002E  mov         dword ptr [esp],40000000h
15D10035  call        X86CompilationCallback (1335030h)
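
To make concrete why the movaps matters: movaps faults unless its memory operand is 16-byte aligned, so the store to [esp+8] above is only safe if esp is 8 modulo 16 once the prologue's sub has run. The listing does not show the value of esp, so the sketch below is just the arithmetic in C, with hypothetical helper names and hypothetical addresses supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of 16, which movaps requires of its memory operand. */
bool is_aligned16(uint32_t addr) {
    return (addr & 0xFu) == 0;
}

/* Address of a stack slot, given esp after the prologue and a displacement. */
uint32_t slot_address(uint32_t esp, uint32_t disp) {
    return esp + disp;
}

/* The movaps in the listing stores to [esp+8]; safe only if that address is aligned. */
bool movaps_store_is_safe(uint32_t esp_after_sub) {
    return is_aligned16(slot_address(esp_after_sub, 0x08u));
}

/* The two vector slots in the listing, [esp+8] and [esp+28h], are 0x20 apart,
   so they are either both 16-byte aligned or both misaligned. */
void check_listing_offsets(void) {
    assert(((0x28u - 0x08u) & 0xFu) == 0);
}
```

With a made-up esp of 0x0012FF08 the movaps target is aligned; with 0x0012FF00 it is not, and the instruction would fault where a movups would not.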

Thanks for the help!

Chuck.
