[LLVMdev] Vectorized LLVM IR
Stéphane Letz
letz at grame.fr
Sat May 29 01:23:41 PDT 2010
>
> <32 x float> takes up 8 SSE registers; you're likely running into
> issues with register pressure. Does it work better if you use
> something smaller like <4 x float>?
>
> Besides that, I don't see any obvious issues.
>
> -Eli
You are right, yes. The code runs faster with <4 x float> types, but it is still a bit slower than the scalar version.
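The register arithmetic behind Eli's remark can be sketched as follows. This is only an illustration, assuming 128-bit SSE (XMM) registers and 32-bit floats; the helper name sse_registers_needed is hypothetical and not part of LLVM.

```python
# Illustrative sketch only: estimates how many 128-bit SSE registers
# one value of an LLVM vector type occupies once it is legalized.
# The names below are hypothetical helpers, not LLVM APIs.

SSE_REGISTER_BITS = 128  # width of one XMM register
FLOAT_BITS = 32          # width of an LLVM 'float'


def sse_registers_needed(num_elements, element_bits=FLOAT_BITS):
    """Return the number of 128-bit registers needed for one vector value."""
    total_bits = num_elements * element_bits
    # Ceiling division: a partially filled register still counts as one.
    return -(-total_bits // SSE_REGISTER_BITS)


if __name__ == "__main__":
    for n in (4, 8, 32):
        print(f"<{n} x float> -> {sse_registers_needed(n)} SSE register(s)")
```

With only 16 XMM registers available on x86-64 (8 on 32-bit x86), a couple of live <32 x float> values are enough to exhaust the register file, whereas a <4 x float> value fits in a single register.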
Stéphane Letz