[LLVMdev] Seg faulting on vector ops
Evan Cheng
evan.cheng at apple.com
Thu Jul 26 00:15:23 PDT 2007
I am fairly certain this is right. Chuck, can you do a quick
experiment for me? Go back to your original code, but make sure the
alloca instruction specifies 16-byte alignment. The code should work.
If not, please file a bug.
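Evan's suggestion amounts to giving the vector's stack slot 16-byte alignment (in the IR of this era, an alloca with an `align 16` clause, e.g. `alloca <4 x float>, align 16`). A rough C++ analogue, offered only as an illustration, is a local whose type is declared `alignas(16)`: the compiler is then responsible for placing it on a 16-byte boundary, which is what `movaps` needs. The names `Float4Slot`, `isAligned` and `stackSlotIsMovapsSafe` are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// C++ counterpart of a 16-byte-aligned <4 x float> stack slot
// (hypothetical type, for illustration only).
struct alignas(16) Float4Slot {
  float lanes[4];
};

// True if p is a multiple of 'align' (align must be a power of two).
inline bool isAligned(const void* p, std::size_t align) {
  return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Places a slot on the stack and checks the alignment movaps requires.
inline bool stackSlotIsMovapsSafe() {
  Float4Slot slot{};  // the compiler must honor alignas(16) for this local
  return isAligned(&slot, 16);
}
```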
Thanks,
Evan
On Jul 24, 2007, at 1:58 PM, Evan Cheng wrote:
> Hrm. This problem shouldn't be target specific. I am pretty sure the
> prologue / epilogue inserter aligns the stack correctly if there are
> stack objects with a greater than default stack alignment requirement.
> It seems the initial alloca instruction should specify 16-byte
> alignment?
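The padding Evan describes can be illustrated with a simplified, hypothetical layout routine (it is not LLVM's actual prologue/epilogue inserter): each stack object is placed at the next offset that satisfies its alignment, and the frame size is rounded up to the largest alignment requested. This only produces aligned addresses if the stack pointer is already suitably aligned on entry, which is the assumption Chris questions in the message quoted below. `alignTo`, `StackObject`, `FrameLayout` and `layoutFrame` are names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Round 'value' up to the next multiple of 'align' (a power of two).
constexpr std::uint32_t alignTo(std::uint32_t value, std::uint32_t align) {
  return (value + align - 1) & ~(align - 1);
}

struct StackObject {
  std::uint32_t size;
  std::uint32_t align;
};

struct FrameLayout {
  std::vector<std::uint32_t> offsets;  // offset of each object from the frame base
  std::uint32_t frameSize;             // total size, padded to maxAlign
  std::uint32_t maxAlign;              // largest alignment any object needs
};

// Simplified layout: offsets are only correct relative to a frame base
// that is itself maxAlign-aligned.
FrameLayout layoutFrame(const std::vector<StackObject>& objects,
                        std::uint32_t defaultStackAlign) {
  FrameLayout layout{{}, 0, defaultStackAlign};
  std::uint32_t offset = 0;
  for (const StackObject& obj : objects) {
    offset = alignTo(offset, obj.align);
    layout.offsets.push_back(offset);
    offset += obj.size;
    layout.maxAlign = std::max(layout.maxAlign, obj.align);
  }
  layout.frameSize = alignTo(offset, layout.maxAlign);
  return layout;
}
```

For example, objects of (size, align) = (4, 4), (16, 16), (4, 4) with a default alignment of 4 land at offsets 0, 16 and 32, and the frame is padded from 36 to 48 bytes.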
>
> Evan
>
> On Jul 21, 2007, at 2:51 PM, Chris Lattner wrote:
>
>> On Fri, 20 Jul 2007, Chuck Rose III wrote:
>>> I'm looking to make use of the vectorization primitives in the Intel
>>> chip with the code we generate from LLVM, and so I've started
>>> experimenting with it. What is the state of the machine code generated
>>> for vectors? In my tinkering, I seem to be getting some wonky machine
>>> instructions, but I'm most likely just doing something wrong and I'm
>>> hoping you can set me on the right course.
>>
>> Hi Chuck,
>>
>> Evan's solution is the right one. However, your code is valid, so it
>> shouldn't crash. I think it dies because Linux does not guarantee that
>> the stack is 16-byte aligned, and the vector operations expect this.
>> The code generator should compensate and dynamically align the stack
>> on entry to the function. This should be a relatively straightforward
>> extension to the x86 backend if you're interested.
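A minimal model of the dynamic realignment Chris suggests, assuming a downward-growing stack and a power-of-two alignment: the prologue saves the incoming stack pointer so the epilogue can restore it, rounds the stack pointer down (the effect of an `and esp, -16` style instruction), and then reserves the frame. The names are hypothetical, and a real prologue also has to keep the incoming arguments addressable, typically through a frame pointer, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// Round a stack pointer down to a multiple of 'align' (a power of two).
constexpr std::uintptr_t alignDown(std::uintptr_t sp, std::uintptr_t align) {
  return sp & ~(align - 1);
}

struct RealignedFrame {
  std::uintptr_t savedSp;  // incoming stack pointer, restored by the epilogue
  std::uintptr_t sp;       // stack pointer after realignment and allocation
};

// Hypothetical realigning prologue. frameSize is assumed to be a multiple
// of align, so the final stack pointer stays aligned.
RealignedFrame realigningPrologue(std::uintptr_t incomingSp,
                                  std::uintptr_t frameSize,
                                  std::uintptr_t align) {
  std::uintptr_t sp = alignDown(incomingSp, align) - frameSize;
  return RealignedFrame{incomingSp, sp};
}

// Epilogue counterpart: reload the saved stack pointer.
std::uintptr_t realigningEpilogue(const RealignedFrame& frame) {
  return frame.savedSp;
}
```

Whatever the incoming misalignment (0, 4, 8 or 12 mod 16), the stack pointer after the prologue is a multiple of 16, at the cost of up to 12 bytes of padding.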
>>
>> -Chris
>>
>>> My minimal function creates a float4 vector with a specified scalar in
>>> all the elements. It then extracts element 3 and returns it.
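As a plain C++ reference for what the IR below is meant to compute, here is a scalar model of `vSelect3`. The name `vSelect3Model` and the zero-initialized starting vector are choices made for this sketch; the IR itself starts from the uninitialized contents of a stack slot, which does not matter because every lane is overwritten.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Scalar model of @vSelect3: put x into every lane of a 4-float vector,
// then return lane 3.
float vSelect3Model(float x) {
  Float4 v{};                                 // stands in for the load of %pv
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = x;                                 // the four insertelement ops
  return v[3];                                // extractelement ..., i32 3
}
```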
>>>
>>> We are currently using the JIT, and I'm synced to about a week after
>>> the 2.0 branch, so I'm admittedly stale by about a month.
>>>
>>> In LLVM IR:
>>>
>>> ; ModuleID = 'test vectors'
>>>
>>> define float @vSelect3(float %x) {
>>> body:
>>>     %pv = alloca <4 x float>                              ; <<4 x float>*> [#uses=1]
>>>     %v = load <4 x float>* %pv                            ; <<4 x float>> [#uses=1]
>>>     %v1 = insertelement <4 x float> %v, float %x, i32 0   ; <<4 x float>> [#uses=1]
>>>     %v2 = insertelement <4 x float> %v1, float %x, i32 1  ; <<4 x float>> [#uses=1]
>>>     %v3 = insertelement <4 x float> %v2, float %x, i32 2  ; <<4 x float>> [#uses=1]
>>>     %v4 = insertelement <4 x float> %v3, float %x, i32 3  ; <<4 x float>> [#uses=1]
>>>     %s = extractelement <4 x float> %v4, i32 3            ; <float> [#uses=1]
>>>     ret float %s
>>> }
>>>
>>> In Intel assembly, I get the following:
>>>
>>> 00000000`01b80010 83ec20          sub    esp,20h
>>> 00000000`01b80013 f30f10442424    movss  xmm0,dword ptr [esp+24h]
>>>     <-- this loads x into the low float of xmm0
>>> 00000000`01b80019 0f284c2404      movaps xmm1,xmmword ptr [esp+4]
>>>     <-- this seg faults because esp+4 isn't 16-byte aligned
>>>
>>> What is that line trying to achieve? x is at [esp+24h]. There weren't
>>> any other parameters.
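Reading the listing, the faulting `movaps` appears to be the `%v = load <4 x float>* %pv` from the IR: after `sub esp,20h` the 16-byte slot for `%pv` sits at [esp+4], while x (the only stack argument, just above the 4-byte return address) is at [esp+24h]. The small calculation below uses only the offsets from the listing and shows that [esp+4] is 16-byte aligned only when esp is 12 mod 16 on entry, i.e. when the caller's esp was 16-byte aligned just before the `call` pushed the return address. That suggests the generated code assumed 16-byte stack alignment at call sites, which the caller here did not provide; `movaps` with a misaligned memory operand raises a general-protection fault, which surfaces as the seg fault. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offsets read off the listing: "sub esp,20h" reserves the frame and the
// vector slot for %pv is addressed as [esp+4].
constexpr std::uint32_t kFrameSize = 0x20;
constexpr std::uint32_t kSlotOffset = 4;

// Address that movaps reads, given the value of esp on function entry.
constexpr std::uint32_t slotAddress(std::uint32_t entryEsp) {
  return entryEsp - kFrameSize + kSlotOffset;
}

// movaps requires a 16-byte-aligned memory operand; otherwise it faults.
constexpr bool movapsFaults(std::uint32_t entryEsp) {
  return slotAddress(entryEsp) % 16 != 0;
}

// esp on entry if the caller's esp was 16-byte aligned just before "call"
// (the call pushes a 4-byte return address).
constexpr std::uint32_t entryEspAfterAlignedCall(std::uint32_t alignedSp) {
  return alignedSp - 4;
}
```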
>>>
>>> 00000000`01b8001e f30f10c8        movss  xmm1,xmm0
>>> 00000000`01b80022 8b442424        mov    eax,dword ptr [esp+24h]
>>> 00000000`01b80026 660fc4c802      pinsrw xmm1,eax,2
>>> 00000000`01b8002b 89c1            mov    ecx,eax
>>> 00000000`01b8002d c1e910          shr    ecx,10h
>>> 00000000`01b80030 660fc4c903      pinsrw xmm1,ecx,3
>>> 00000000`01b80035 660fc4c804      pinsrw xmm1,eax,4
>>> 00000000`01b8003a 660fc4c905      pinsrw xmm1,ecx,5
>>> 00000000`01b8003f 660fc4c806      pinsrw xmm1,eax,6
>>> 00000000`01b80044 660fc4c907      pinsrw xmm1,ecx,7
>>> 00000000`01b80049 0fc6c903        shufps xmm1,xmm1,3
>>> 00000000`01b8004d f30f110c24      movss  dword ptr [esp],xmm1
>>> 00000000`01b80052 d90424          fld    dword ptr [esp]
>>> 00000000`01b80055 83c420          add    esp,20h
>>> 00000000`01b80058 c3              ret
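The `pinsrw` run that looks wonky is the splat of x: `pinsrw` writes only 16 bits at a time, so each remaining 32-bit lane is filled with the low half of x's bits (eax) and then the high half (ecx = eax >> 16). Lane 0 comes from the earlier `movss xmm1,xmm0`, and `shufps xmm1,xmm1,3` then copies lane 3 into lane 0 so that `movss`/`fld` can return it on the x87 stack. The sketch below models xmm1 as eight 16-bit words (low word first, as on x86) and replays that sequence; the helper names are invented for the illustration, and for simplicity lane 0 is also written with `pinsrw` rather than `movss`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A 128-bit xmm register viewed as eight 16-bit words (word 0 lowest).
using Xmm = std::array<std::uint16_t, 8>;

// pinsrw xmm, r32, imm: copy the low 16 bits of r32 into word imm.
void pinsrw(Xmm& reg, std::uint32_t r32, unsigned imm) {
  reg[imm & 7] = static_cast<std::uint16_t>(r32 & 0xFFFF);
}

// Reassemble 32-bit float lane 'index' from its two 16-bit words.
float floatLane(const Xmm& reg, unsigned index) {
  std::uint32_t bits = reg[2 * index] |
                       (static_cast<std::uint32_t>(reg[2 * index + 1]) << 16);
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Replays the splat in the listing: x's bits are inserted two words at a
// time into every lane.
Xmm splatViaPinsrw(float x) {
  std::uint32_t eax;
  std::memcpy(&eax, &x, sizeof eax);  // mov eax, dword ptr [esp+24h]
  std::uint32_t ecx = eax >> 16;      // mov ecx, eax ; shr ecx, 10h
  Xmm xmm1{};                         // prior contents are all overwritten
  for (unsigned w = 0; w < 8; w += 2) {
    pinsrw(xmm1, eax, w);             // low half of x
    pinsrw(xmm1, ecx, w + 1);         // high half of x
  }
  return xmm1;
}
```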
>>>
>>> The code used to generate and run the program was:
>>>
>>> #include "llvm/Module.h"
>>> #include "llvm/DerivedTypes.h"
>>> #include "llvm/Constants.h"
>>> #include "llvm/Instructions.h"
>>> #include "llvm/ModuleProvider.h"
>>> #include "llvm/Analysis/Verifier.h"
>>> #include "llvm/System/DynamicLibrary.h"
>>> #include "llvm/ExecutionEngine/JIT.h"
>>> #include "llvm/ExecutionEngine/Interpreter.h"
>>> #include "llvm/ExecutionEngine/GenericValue.h"
>>> #include "llvm/Support/ManagedStatic.h"
>>> #include <iostream>
>>> #include <vector>
>>> using namespace llvm;
>>>
>>> // Builds a <dim x float> vector with s in every lane, starting from the
>>> // (uninitialized) contents of a stack slot.
>>> Value* makeVector(Value* s, unsigned int dim, BasicBlock* basicBlock)
>>> {
>>>     AllocaInst* pV =
>>>         new AllocaInst(VectorType::get(Type::FloatTy,dim),"pv",basicBlock);
>>>     Value* v = new LoadInst(pV,"v",basicBlock);
>>>
>>>     for (unsigned int i = 0 ; i < dim ; ++i)
>>>         v = new InsertElementInst(v,s,i,"v",basicBlock);
>>>
>>>     return v;
>>> }
>>>
>>> // Emits vSelect3(float x): splat x into a float4 and return element 3.
>>> Function* generateVectorAndSelect(Module* pModule)
>>> {
>>>     std::vector<Type const*> params;
>>>     params.push_back(Type::FloatTy);
>>>
>>>     FunctionType* funcType =
>>>         FunctionType::get(Type::FloatTy,params,/*isVarArg=*/false);
>>>     Function* func =
>>>         cast<Function>(pModule->getOrInsertFunction("vSelect3",funcType));
>>>
>>>     BasicBlock* basicBlock = new BasicBlock("body",func);
>>>
>>>     Function::arg_iterator args = func->arg_begin();
>>>     Argument* x = args;
>>>     x->setName("x");
>>>
>>>     Value* v1 = makeVector(x,4,basicBlock);
>>>     Value* s = new ExtractElementInst(v1,3,"s",basicBlock);
>>>     new ReturnInst(s,basicBlock);
>>>
>>>     return func;
>>> }
>>>
>>> // modified from the fibonacci example
>>> int main(int argc, char **argv)
>>> {
>>>     Module* pVectorModule = new Module("test vectors");
>>>     Function* pMain = generateVectorAndSelect(pVectorModule);
>>>     pVectorModule->print(std::cout);
>>>
>>>     GenericValue gv1, gv2, gvR;
>>>     gv1.FloatVal = 2.0f;
>>>
>>>     ExistingModuleProvider *pMP = new ExistingModuleProvider(pVectorModule);
>>>     pMP->getModule()->setDataLayout(
>>>         "e-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32");
>>>     ExecutionEngine *pEE = ExecutionEngine::create(pMP, false);
>>>
>>>     // Call vSelect3(2.0f) through the execution engine.
>>>     std::vector<GenericValue> args;
>>>     args.push_back(gv1);
>>>     GenericValue result = pEE->runFunction(pMain, args);
>>>
>>>     return 0;
>>> }
>>>
>>> Any help would be appreciated.
>>>
>>> Thanks,
>>>
>>> Chuck.
>>>
>>>
>>
>> --
>> http://nondot.org/sabre/
>> http://llvm.org/
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>