[LLVMdev] Vectorized LLVM IR

Sat May 29 01:40:28 PDT 2010

On Sat, May 29, 2010 at 1:23 AM, Stéphane Letz <letz at grame.fr> wrote:
>>
>> <32 x float> takes up 8 SSE registers; you're likely running into
>> issues with register pressure.  Does it work better if you use
>> something smaller like <4 x float>?
>>
>> Besides that, I don't see any obvious issues.
>>
>> -Eli
>
>
> You are right, yes. The code works faster with <4 x float> types, but it is still a bit slower than the scalar version.
>
> Stéphane Letz

Huh, that's strange... umm, possibly a stupid question, but are both
versions doing the same amount of work?  (It doesn't look like the
vector version adjusts the loop count in the given code).
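
To make the loop-count point concrete, here is a minimal C sketch. It is
not the code from the original post: the function names, the scaling
operation, and the LANES constant are all made up for illustration. The
idea is only that a loop handling 4 floats per iteration should run about
n / 4 times; if it still runs n times, it does roughly 4x the work of the
scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width, mirroring <4 x float> */

/* Scalar reference: one element per iteration, so n iterations. */
void scale_scalar(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* Vector-style version: LANES elements per iteration, so the main loop
   runs n / LANES times. Returns the main-loop trip count so the
   adjustment can be checked. The inner lane loop stands in for a single
   <4 x float> multiply. */
size_t scale_by_lanes(float *dst, const float *src, float k, size_t n) {
    size_t trips = 0;
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane)
            dst[i + lane] = src[i + lane] * k;
        ++trips;
    }
    /* The main loop stops at the last full group of LANES elements. */
    assert(i == n - n % LANES);
    /* Scalar tail for the leftover n % LANES elements. */
    for (; i < n; ++i)
        dst[i] = src[i] * k;
    return trips;
}
```

If the vector loop in the real code keeps the scalar trip count, timing
the two versions compares unequal amounts of work.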

Besides that, I can't think of any reason why the vector version would
be slower except possibly memory bandwidth issues.

-Eli