[LLVMdev] Seg faulting on vector ops

Evan Cheng evan.cheng at apple.com
Fri Jul 20 14:10:49 PDT 2007


Hi Chuck!

On Jul 20, 2007, at 11:36 AM, Chuck Rose III wrote:

> Hola LLVMers,
>
>
>
> I’m looking to make use of the vectorization primitives in the  
> Intel chip with the code we generate from LLVM and so I’ve started  
> experimenting with it.  What is the state of the machine code  
> generated for vectors?  In my tinkering, I seem to be getting some  
> wonky machine instructions, but I’m most likely just doing  
> something wrong and I’m hoping you can set me on the correct course.

Are you using SSE? The X86 backend usually does a pretty good job with it.
>
>
> My minimal function creates a float4 vector with a specified scalar  
> in all the elements.  It then extracts the third element and  
> returns it.
>
>
>
> We are currently using the JIT and I’m currently synced to about a  
> week after the 2.0 branch, so I’m admittedly stale by about a month.
>
>
>
> In LLVM IR:
>
>
>
> ; ModuleID = 'test vectors'
>
>
>
> define float @vSelect3(float %x) {
>
> body:
>
>         %pv = alloca <4 x float>                ; <<4 x float>*> [#uses=1]
>
>         %v = load <4 x float>* %pv              ; <<4 x float>> [#uses=1]
>         %v1 = insertelement <4 x float> %v, float %x, i32 0             ; <<4 x

You are allocating a chunk of memory on the stack and then loading an  
undefined value back from it. I suppose this should be legal, so perhaps  
there is a codegen bug. With top of tree (tot), I see sub $28, %esp, so  
maybe that's already fixed.

But still, this is not what you want. You want to do this:

         %v1 = insertelement <4 x float> undef, float %x, i32 0            ; <<4 x float>> [#uses=1]
         %v2 = insertelement <4 x float> %v1, float %x, i32 1            ; <<4 x float>> [#uses=1]
         %v3 = insertelement <4 x float> %v2, float %x, i32 2            ; <<4 x float>> [#uses=1]
         %v4 = insertelement <4 x float> %v3, float %x, i32 3

That is, start from undef and insert the elements to form the vector.

Hope that helps.

Evan


> float>> [#uses=1]
>
>         %v2 = insertelement <4 x float> %v1, float %x, i32 1            ; <<4 x float>> [#uses=1]
>
>         %v3 = insertelement <4 x float> %v2, float %x, i32 2            ; <<4 x float>> [#uses=1]
>
>         %v4 = insertelement <4 x float> %v3, float %x, i32 3            ; <<4 x float>> [#uses=1]
>
>         %s = extractelement <4 x float> %v4, i32 3              ; <float> [#uses=1]
>
>         ret float %s
>
> }
>
>
>
> In Intel assembly, I get the following:
>
>
>
> 00000000`01b80010 83ec20          sub     esp,20h
>
> 00000000`01b80013 f30f10442424    movss   xmm0,dword ptr [esp+24h]   <- this loads x into the low float of xmm0
>
> 00000000`01b80019 0f284c2404      movaps  xmm1,xmmword ptr [esp+4]   <- this seg faults because esp+4 isn’t 16-byte aligned
>
> What is that line trying to achieve?  x is at [esp+24h].  There  
> weren’t any other parameters.
>
>
>
> 00000000`01b8001e f30f10c8        movss   xmm1,xmm0
>
> 00000000`01b80022 8b442424        mov     eax,dword ptr [esp+24h]
>
> 00000000`01b80026 660fc4c802      pinsrw  xmm1,eax,2
>
> 00000000`01b8002b 89c1            mov     ecx,eax
>
> 00000000`01b8002d c1e910          shr     ecx,10h
>
> 00000000`01b80030 660fc4c903      pinsrw  xmm1,ecx,3
>
> 00000000`01b80035 660fc4c804      pinsrw  xmm1,eax,4
>
> 00000000`01b8003a 660fc4c905      pinsrw  xmm1,ecx,5
>
> 00000000`01b8003f 660fc4c806      pinsrw  xmm1,eax,6
>
> 00000000`01b80044 660fc4c907      pinsrw  xmm1,ecx,7
>
> 00000000`01b80049 0fc6c903        shufps  xmm1,xmm1,3
>
> 00000000`01b8004d f30f110c24      movss   dword ptr [esp],xmm1
>
> 00000000`01b80052 d90424          fld     dword ptr [esp]
>
> 00000000`01b80055 83c420          add     esp,20h
>
> 00000000`01b80058 c3              ret
>
>
>
> The code used to generate and run the program was:
>
>
>
> #include "llvm/Module.h"
>
> #include "llvm/DerivedTypes.h"
>
> #include "llvm/Constants.h"
>
> #include "llvm/Instructions.h"
>
> #include "llvm/ModuleProvider.h"
>
> #include "llvm/Analysis/Verifier.h"
>
> #include "llvm/System/DynamicLibrary.h"
>
> #include "llvm/ExecutionEngine/JIT.h"
>
> #include "llvm/ExecutionEngine/Interpreter.h"
>
> #include "llvm/ExecutionEngine/GenericValue.h"
>
> #include "llvm/Support/ManagedStatic.h"
>
> #include <iostream>
>
> using namespace llvm;
>
>
>
> Value* makeVector(Value* s, unsigned int dim, BasicBlock* basicBlock)
>
> {
>
>     AllocaInst* pV = new AllocaInst(VectorType::get(Type::FloatTy,dim),"pv",basicBlock);
>
>     Value* v = new LoadInst(pV,"v",basicBlock);
>
>
>
>     for (unsigned int i = 0 ; i < dim ; ++i)
>
>         v = new InsertElementInst(v,s,i,"v",basicBlock);
>
>
>
>     return v;
>
> }
>
>
>
> Function* generateVectorAndSelect(Module* pModule)
>
> {
>
>     std::vector<Type const*> params;
>
>
>
>     params.push_back(Type::FloatTy);
>
>
>
>     FunctionType* funcType = FunctionType::get(Type::FloatTy,params,NULL);
>
>     Function* func = cast<Function>(pModule->getOrInsertFunction("vSelect3",funcType));
>
>
>
>     BasicBlock* basicBlock = new BasicBlock("body",func);
>
>
>
>     Function::arg_iterator args = func->arg_begin();
>
>     Argument* x = args;
>
>     x->setName("x");
>
>
>
>     Value* v1 = makeVector(x,4,basicBlock);
>
>
>
>     Value* s = new ExtractElementInst(v1,3,"s",basicBlock);
>
>
>
>     new ReturnInst(s,basicBlock);
>
>
>
>     return func;
>
> }
>
>
>
> // modified from the fibonacci example
>
> int main(int argc, char **argv)
>
> {
>
>     Module* pVectorModule = new Module("test vectors");
>
>
>
>     Function* pMain = generateVectorAndSelect(pVectorModule);
>
>
>
>     pVectorModule->print(std::cout);
>
>
>
>     GenericValue gv1, gv2, gvR;
>
>
>
>     gv1.FloatVal = 2.0f;
>
>
>
>     ExistingModuleProvider *pMP = new ExistingModuleProvider(pVectorModule);
>
>     pMP->getModule()->setDataLayout("e-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32");
>
>     ExecutionEngine *pEE = ExecutionEngine::create(pMP, false);
>
>
>
>     std::vector<GenericValue> args;
>
>
>
>     args.push_back(gv1);
>
>
>
>     GenericValue result = pEE->runFunction(pMain, args);
>
>
>
>     return 0;
>
> }
>
>
>
>
>
> Any help would be appreciated.
>
> Thanks,
>
> Chuck.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
