[LLVMdev] LLVM ARM VMLA instruction

suyog sarda sardask01 at gmail.com
Thu Dec 19 00:00:07 PST 2013


Hi all,


Thanks for the info. A few observations from my side:


LLVM :


cortex-a8 vfpv3 : no vmla or vfma instruction emitted

cortex-a8 vfpv4 : no vmla or vfma instruction emitted (this combination is
invalid, though, since cortex-a8 does not have vfpv4)

cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted (this
seems like a bug to me: if cortex-a8 doesn't come with vfpv4, then the
generated vfma instructions will be invalid)


cortex-a15 vfpv4 : vmla instruction emitted (which is a NEON instruction)

cortex-a15 vfpv4 with ffp-contract=fast : vfma instruction emitted


GCC :


cortex-a8 vfpv3 : vmla instruction emitted

cortex-a15 vfpv4 : vfma instruction emitted
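
For concreteness, here is a minimal sketch of the kind of source these
observations are about. The function names are hypothetical and this is not
the exact test file behind the results above. Both functions compute
accum + lhs * rhs. The first keeps the multiply and add in one expression,
which is the form the C standard allows a compiler to contract when
FP_CONTRACT is on. The second splits them across two statements, which is
Tim's example quoted below and which should only be fused under a setting
such as ffp-contract=fast.

```c
/* Hypothetical names; a sketch, not the test file from this thread. */

/* Multiply and add inside a single expression. The C standard permits
   contracting this into one fused operation when FP_CONTRACT is on. */
float mac_one_expression(float accum, float lhs, float rhs) {
    return accum + lhs * rhs;
}

/* The same arithmetic split across two statements. The multiply and the
   add are no longer part of one expression, so fusing them goes beyond
   what FP_CONTRACT=on allows. */
float mac_two_statements(float accum, float lhs, float rhs) {
    float product = lhs * rhs;
    return accum + product;
}
```

For inputs whose product and sum are exactly representable the two forms
agree whether or not the compiler fuses them; they can only differ when the
intermediate product needs rounding.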


I agree with the point that NEON and VFP instructions shouldn't be used
interchangeably.


However, if gcc emits a vmla (NEON) instruction for cortex-a8, shouldn't
LLVM also emit one? Can someone please clarify this point? The performance
gain with the vmla instruction is huge. I read somewhere that LLVM prefers
precision over performance. Is this true, and is that why LLVM is not
emitting vmla instructions for cortex-a8?



On Thu, Dec 19, 2013 at 6:41 AM, Kay Tiong Khoo <kkhoo at perfwizard.com> wrote:

> Just to clarify: gcc 4.8.1 generates that fma at -O2; no FP relaxation or
> other flags specified.
>
>
> On Wed, Dec 18, 2013 at 6:02 PM, Kay Tiong Khoo <kkhoo at perfwizard.com> wrote:
>
>> Thanks for the explanation, Tim!
>>
>> gcc 4.8.1 *does* generate an fma for your code example for an x86 target
>> that supports fma. I'd bet that the HW vendors' compilers do the same, but
>> I don't have any of those installed at the moment to test that theory. So
>> this is a bug in those compilers? Do you know how they justify it?
>>
>> I see section 6.5 "Expressions" in the C standard, and I can see that
>> paragraph 8 of 6.5 would seem to agree with you, assuming that a
>> "floating expression" is a subset of "expression"... is there any other
>> part of the standard that you know of that I can reference?
>>
>> This is made a little weirder by the fact that gcc and clang have a
>> 'fast' setting for fp-contract, but the C standard that I'm looking at
>> states that it is just an "on-off-switch".
>>
>>
>> On Wed, Dec 18, 2013 at 11:17 AM, Tim Northover <t.p.northover at gmail.com> wrote:
>>
>>> > http://llvm.org/bugs/show_bug.cgi?id=17188
>>> > http://llvm.org/bugs/show_bug.cgi?id=17211
>>>
>>> Ah, thanks. That makes a lot more sense now.
>>>
>>> > Correct - clang is different than gcc, icc, msvc, xlc, etc. on this.
>>> > Still haven't seen any explanation for how this is better though...
>>>
>>> That would be because it follows what C tells us a compiler has to do
>>> by default but provides overrides in either direction if you know what
>>> you're doing.
>>>
>>> The key point is that LLVM (currently) has no notion of statement
>>> boundaries, so it would fuse the operations in this function:
>>>
>>> float foo(float accum, float lhs, float rhs) {
>>>   float product = lhs * rhs;
>>>   return accum + product;
>>> }
>>>
>>> This isn't allowed even under FP_CONTRACT=on (the multiply and add do
>>> not occur within a single expression), so LLVM can't in good
>>> conscience enable these optimisations by default.
>>>
>>> Cheers.
>>>
>>> Tim.
>>>
>>
>>
>


-- 
With regards,
Suyog Sarda