[LLVMdev] movaps being generated despite alignment 1 being specified
Chuck Rose III
cfr at adobe.com
Thu Oct 18 13:52:03 PDT 2007
Hello LLVMers,
High order bit:
The presence of a called function is causing a store to an unrelated
vector to be emitted as an aligned store rather than an unaligned one,
even though the associated StoreInst specifies align 1.
Details:
I pulled down the latest source, so this is something I'm finding with
the current LLVM. I'm hoping you'll have an idea what's going on or at
least know if it's a new issue I should log. It's related to the stack
alignment issue that I know is being worked on, but seems sufficiently
different to ask about it here. I checked the bug database for "align"
and "movaps" and didn't see this issue raised.
Ok, the first bit of code here seems to generate correct assembly for
me. Basically, it loads the float4 stored at globalV and copies it
to the address pointed to by dependentV. Along the way, it allocates
<4 x float> temporaries and copies globalV through them. I'm working on
bridging the gap between the outside of our system and the LLVM
generated code, so there is a little extra copying from and to
parameters at the boundaries of this function. Since this is just a
repro-example, there is very little besides the boundaries here. :-) I
fully admit the constructions below may not be optimal.
; ModuleID = 'hydra'
target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"
define void @evaluateDependents(float* %dependentV, float* %globalV) {
Entry_evaluateDependents:
  %Promoted_dependentV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
  %Promoted_globalV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
  %externalVectorPtrCast = bitcast float* %globalV to <4 x float>* ; <<4 x float>*> [#uses=1]
  %externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1 ; <<4 x float>> [#uses=1]
  store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
  %globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1 ; <<4 x float>> [#uses=1]
  br label %Body_evaluateDependents
Body_evaluateDependents: ; preds = %Entry_evaluateDependents
  store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
  br label %Exit_evaluateDependents
Exit_evaluateDependents: ; preds = %Body_evaluateDependents
  %vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1 ; <<4 x float>> [#uses=1]
  %externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>* ; <<4 x float>*> [#uses=1]
  store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
  ret void
}
This produces the following instructions, which obey all the align 1
directives on the LoadInsts and StoreInsts:
...
15D10010 sub esp,2Ch
15D10013 mov eax,dword ptr [esp+34h]
15D10017 movups xmm0,xmmword ptr [eax]
15D1001A movups xmmword ptr [esp],xmm0
15D1001E mov eax,dword ptr [esp+30h]
15D10022 movups xmmword ptr [esp+10h],xmm0
15D10027 movups xmm0,xmmword ptr [esp+10h]
15D1002C movups xmmword ptr [eax],xmm0
15D1002F add esp,2Ch
15D10032 ret
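For readers who would rather not trace the IR, the function above amounts to roughly the following C. This is only a sketch under stated assumptions: the float4 struct and the snake_case names are hypothetical stand-ins, and memcpy is used because it makes no alignment assumption about the caller's pointers, which is the intent of align 1 in the IR.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the <4 x float> value used in the IR. */
typedef struct {
    float lane[4];
} float4;

/* Copy four floats from global_v to dependent_v through two local
 * temporaries, mirroring Promoted_globalV_Ptr and
 * Promoted_dependentV_Ptr. Only the local temporaries are under the
 * compiler's control; the external pointers may have any alignment. */
void evaluate_dependents(float *dependent_v, const float *global_v)
{
    float4 promoted_global;
    float4 promoted_dependent;

    assert(dependent_v != NULL && global_v != NULL);

    /* Unaligned load from the external pointer into a temporary. */
    memcpy(&promoted_global, global_v, sizeof promoted_global);

    /* Temporary-to-temporary copy (the store in the Body block). */
    promoted_dependent = promoted_global;

    /* Unaligned store back out through the external pointer. */
    memcpy(dependent_v, &promoted_dependent, sizeof promoted_dependent);
}
```

Calling it with a source pointer that is offset by one float from the start of an array exercises the case where the external data is not 16-byte aligned.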
Here's where it gets weird and confusing to me. Let's make our
evaluateDependents function do something else. In addition to copying
globalV into dependentV, it's also going to set a singleton float
pointed to by dependentF. We'll call a function foo to get the value.
(I tried setting dependentF directly and that did NOT cause the problem
with the generated code). Here's the LLVM code:
; ModuleID = 'hydra'
target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"
define float @foo(float %Y) {
Entry_foo:
  %_ReturnValuePtr = alloca float ; <float*> [#uses=2]
  br label %Body_foo
Body_foo: ; preds = %Entry_foo
  store float %Y, float* %_ReturnValuePtr, align 1
  br label %Exit_foo
Exit_foo: ; preds = %Body_foo
  %finalValue = load float* %_ReturnValuePtr, align 1 ; <float> [#uses=1]
  ret float %finalValue
}
define void @evaluateDependents(float* %dependentF, float* %dependentV, float* %globalV) {
Entry_evaluateDependents:
  %Promoted_dependentV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
  %Promoted_globalV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
  %externalVectorPtrCast = bitcast float* %globalV to <4 x float>* ; <<4 x float>*> [#uses=1]
  %externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1 ; <<4 x float>> [#uses=1]
  store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
  %globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1 ; <<4 x float>> [#uses=1]
  br label %Body_evaluateDependents
Body_evaluateDependents: ; preds = %Entry_evaluateDependents
  %fooResult = call float @foo( float 2.000000e+000 ) ; <float> [#uses=1]
  store float %fooResult, float* %dependentF, align 1
  store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
  br label %Exit_evaluateDependents
Exit_evaluateDependents: ; preds = %Body_evaluateDependents
  %vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1 ; <<4 x float>> [#uses=1]
  %externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>* ; <<4 x float>*> [#uses=1]
  store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
  ret void
}
Here are the instructions for evaluateDependents. The JITter hasn't
compiled foo yet. What confuses me is why one of my movups instructions
suddenly became a movaps, when all the stores and loads have align 1 on
them.
...
15D10012 sub esp,4Ch
15D10015 mov eax,dword ptr [esp+60h]
15D10019 movups xmm0,xmmword ptr [eax]
15D1001C movaps xmmword ptr [esp+8],xmm0 <-- why did this become a movaps?
15D10021 movups xmmword ptr [esp+28h],xmm0
15D10026 mov esi,dword ptr [esp+58h]
15D1002A mov edi,dword ptr [esp+5Ch]
15D1002E mov dword ptr [esp],40000000h
15D10035 call X86CompilationCallback (1335030h)
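The distinction matters because movaps faults unless its memory operand is 16-byte aligned, while movups accepts any address. The C sketch below checks whether a stack slot such as [esp+8] would be safe for movaps. The helper names and any esp values passed to them are hypothetical, chosen only for illustration; they are not taken from the JIT or from the run above.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if addr is a multiple of alignment, else 0.
 * alignment must be a nonzero power of two. */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Return 1 if a movaps to [esp + offset] would see a 16-byte aligned
 * address. esp is a hypothetical stack pointer value supplied by the
 * caller, not one observed in the disassembly. */
int movaps_is_safe(uintptr_t esp, uintptr_t offset)
{
    return is_aligned(esp + offset, 16);
}
```

For a slot at offset 8, the check only passes when esp is 8 modulo 16 at that instruction. With a hypothetical esp of 0x1000, the slot lands at 0x1008, which is not a multiple of 16, so a movaps there would fault where a movups would not.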
Thanks for the help!
Chuck.