[LLVMdev] Vectorized LLVM IR

Stéphane Letz letz at grame.fr
Sat May 29 01:23:41 PDT 2010


> 
> <32 x float> takes up 8 SSE registers; you're likely running into
> issues with register pressure.  Does it work better if you use
> something smaller like <4 x float>?
> 
> Besides that, I don't see any obvious issues.
> 
> -Eli


You are right, yes. The code runs faster with <4 x float> types, but it is still a bit slower than the scalar version.
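
For reference, the register arithmetic behind Eli's remark can be sketched as follows. This is only an illustration: the helper name sse_registers_needed is hypothetical and not part of LLVM, and it assumes 128-bit SSE (XMM) registers and 32-bit floats.

```c
#include <stddef.h>

/* Width of one SSE (XMM) register in bits. */
#define SSE_REGISTER_BITS 128

/* Hypothetical helper (not an LLVM API): number of XMM registers needed
   to hold a vector of `lanes` elements, each `element_bits` wide.
   The division rounds up so a partially filled register still counts. */
size_t sse_registers_needed(size_t lanes, size_t element_bits)
{
    size_t total_bits = lanes * element_bits;
    return (total_bits + SSE_REGISTER_BITS - 1) / SSE_REGISTER_BITS;
}
```

With 32-bit floats, sse_registers_needed(32, 32) gives 8 for <32 x float> and sse_registers_needed(4, 32) gives 1 for <4 x float>. Since x86-64 exposes 16 XMM registers (and 32-bit x86 only 8), a single <32 x float> value already occupies a large share of the register file.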

Stéphane Letz


