[LLVMdev] Vectorized LLVM IR
Stéphane Letz
letz at grame.fr
Sat May 29 04:02:09 PDT 2010
On 29 May 2010, at 10:40, Eli Friedman wrote:
> On Sat, May 29, 2010 at 1:23 AM, Stéphane Letz <letz at grame.fr> wrote:
>>>
>>> <32 x float> takes up 8 SSE registers; you're likely running into
>>> issues with register pressure. Does it work better if you use
>>> something smaller like <4 x float>?
>>>
>>> Besides that, I don't see any obvious issues.
>>>
>>> -Eli
>>
>>
>> You are right, yes. The code runs faster with <4 x float> types, but it is still a bit slower than the scalar version.
>>
>> Stéphane Letz
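The register-pressure point can be checked with simple arithmetic. A <32 x float> value is 32 × 4 = 128 bytes, and an SSE (XMM) register holds 16 bytes, so one such value occupies 8 registers. The Python sketch below assumes an SSE-only x86-64 target (no AVX) with 16 XMM registers. It computes how many registers each vector width needs and an upper bound on how many values of that width can be live at once before spilling. The bound ignores registers needed for temporaries, so real limits are lower. On 32-bit x86 there are only 8 XMM registers, so a single <32 x float> value already fills the register file.

```python
# Assumptions (not from the thread): 4-byte single-precision floats,
# 128-bit (16-byte) SSE XMM registers, and 16 XMM registers on x86-64.
FLOAT_BYTES = 4
SSE_REG_BYTES = 16
X86_64_XMM_REGS = 16


def sse_regs_for(lanes: int) -> int:
    """XMM registers needed to hold one <lanes x float> value (rounded up)."""
    return (lanes * FLOAT_BYTES + SSE_REG_BYTES - 1) // SSE_REG_BYTES


def max_live_values(lanes: int) -> int:
    """Upper bound on simultaneously live <lanes x float> values before
    spilling; ignores registers needed for temporaries."""
    return X86_64_XMM_REGS // sse_regs_for(lanes)


if __name__ == "__main__":
    for lanes in (4, 8, 16, 32):
        print(f"<{lanes} x float>: {sse_regs_for(lanes)} XMM register(s), "
              f"at most {max_live_values(lanes)} live value(s) on x86-64")
```

With <4 x float>, each value fits in one register. This is consistent with the speedup reported when switching to the narrower type.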
>
> Huh, that's strange... umm, possibly a stupid question, but are both
> versions doing the same amount of work? (It doesn't look like the
> vector version adjusts the loop count in the given code).
Yes, you're right. That was incorrect; it's fixed now.
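Eli's question about the loop count points to a common pitfall in hand-vectorized code. If each iteration handles 4 floats, the loop must run about n/4 times, not n times, plus a scalar tail for any leftover elements. The original loop is not shown in the thread, so this Python sketch uses a hypothetical gain kernel. `scale_scalar` and `scale_vectorized` are illustrative names, not from the post, and a 4-element slice stands in for one <4 x float> load/multiply/store.

```python
VEC_WIDTH = 4  # lanes per iteration, e.g. one <4 x float> (assumption)


def scale_scalar(buf: list[float], gain: float) -> None:
    """Hypothetical scalar kernel: one element per iteration."""
    for i in range(len(buf)):
        buf[i] *= gain


def scale_vectorized(buf: list[float], gain: float, width: int = VEC_WIDTH) -> int:
    """Hypothetical vector kernel: `width` elements per iteration, so the
    main loop runs len(buf) // width times; a scalar tail handles the rest.
    Returns the main-loop trip count."""
    n = len(buf)
    main_end = n - n % width
    iterations = 0
    for base in range(0, main_end, width):
        # Stands in for one <width x float> load, multiply and store.
        buf[base:base + width] = [x * gain for x in buf[base:base + width]]
        iterations += 1
    # Scalar tail for the n % width leftover elements.
    for i in range(main_end, n):
        buf[i] *= gain
    return iterations
```

For a 10-element buffer, the vector loop runs 2 times and the tail handles the last 2 elements. Running it 10 times instead would do four times the work and, in native code, read and write past the end of the buffer.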
>
> Besides that, I can't think of any reason why the vector version would
> be slower except possibly memory bandwidth issues.
>
> -Eli
Now the vector version's performance is starting to be comparable to the scalar one.
Thanks
Stéphane Letz