[LLVMdev] Vectorized LLVM IR

Sat May 29 04:02:09 PDT 2010

On 29 May 2010, at 10:40, Eli Friedman wrote:

> On Sat, May 29, 2010 at 1:23 AM, Stéphane Letz <letz at grame.fr> wrote:
>>> 
>>> <32 x float> takes up 8 SSE registers; you're likely running into
>>> issues with register pressure.  Does it work better if you use
>>> something smaller like <4 x float>?
>>> 
>>> Besides that, I don't see any obvious issues.
>>> 
>>> -Eli
>> 
>> 
>> Yes, you are right. The code runs faster with <4 x float> types, but it still runs a bit slower than the scalar version.
>> 
>> Stéphane Letz
> 
> Huh, that's strange... umm, possibly a stupid question, but are both
> versions doing the same amount of work?  (It doesn't look like the
> vector version adjusts the loop count in the given code).

Yes, you're right. That was incorrect; it is fixed now.
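A minimal sketch of the loop-count issue Eli pointed out. The actual generated code is not shown in this thread, so the kernel here (`gain_scalar` / `gain_vector`, multiplying a buffer by a constant gain) is hypothetical. It uses GCC/Clang vector extensions, where a 16-byte `vector_size` type of floats plays the role of LLVM's `<4 x float>` (one SSE register). The point is that a loop processing 4 floats per iteration must run `n / 4` times, not `n` times, with a scalar tail for the leftover `n % 4` elements.

```c
#include <stddef.h>
#include <string.h>

/* 16-byte vector of 4 floats, analogous to LLVM IR <4 x float> (GCC/Clang extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical scalar kernel: out[i] = in[i] * gain for i in [0, n). */
void gain_scalar(const float *in, float *out, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * gain;
}

/* Vector version: each iteration handles 4 floats, so the trip count is
   n / 4 rather than n; the remaining n % 4 elements go through a scalar tail. */
void gain_vector(const float *in, float *out, size_t n, float gain)
{
    v4sf g = {gain, gain, gain, gain};
    size_t nvec = n / 4;                     /* adjusted loop count */
    for (size_t k = 0; k < nvec; k++) {
        v4sf x;
        memcpy(&x, in + 4 * k, sizeof x);    /* load 4 floats, no alignment assumption */
        x *= g;                              /* element-wise multiply */
        memcpy(out + 4 * k, &x, sizeof x);   /* store 4 floats */
    }
    for (size_t i = 4 * nvec; i < n; i++)    /* scalar remainder */
        out[i] = in[i] * gain;
}
```

Without the `n / 4` adjustment, the vector loop would do four times the work of the scalar one (and read past the end of the buffer), which would make any timing comparison meaningless.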

> 
> Besides that, I can't think of any reason why the vector version would
> be slower except possibly memory bandwidth issues.
> 
> -Eli

Now the performance is becoming comparable.

Thanks

Stéphane Letz