[llvm-commits] LLVM patch to support ARM fused multiply add/subtract instructions

Tue Jan 24 02:39:12 PST 2012

Hi James,

> VFPv4 is a superset of VFPv3+fp16, same with NEONv2.
>
> "VFPv4 and VFPv4U add both the Half-precision Extension and the fused multiply-add instructions to the features of VFPv3."
>
> For the register set: "VFPv4 can be implemented with either thirty-two or sixteen doubleword registers"
> "Where necessary, these implementation options are distinguished using the terms: VFPv4-D32 or VFPv4-D16"
>
> "where the term VFPv4 is used it covers both options".
>
> So, VFP4 should imply VFPv4-D16, i.e. the smaller register file variant. There should be a way to optionally enable the 32 register variant.

OK, so we have the following set of features:

1. VFPv2, VFPv3, VFPv4. Each is a superset of the former.
2. NEON, NEONv2. Each is a superset of the former.
3. Additionally, we have the fp16 feature, which is available for
VFPv3/NEON. VFPv4 implies fp16.
4. VFPv3/VFPv4 (and the corresponding NEONs) may be either D16 or D32.
Although the spec seems to make D16 the default, that differs from the
current defaults, so I'd suggest keeping D32 as the default.
5. MUL+ADD variants: separate instructions, VMLA, VFMA. VFMA should be
enabled for VFPv4/NEONv2, and only (?) when excess precision /
fast-math is requested.
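A short illustration of why item 5 ties the fused variant to fast-math:
a fused multiply-add rounds a*b + c once, while a separate multiply and
add (and, as I understand it, VFP VMLA, which rounds the product before
accumulating) round twice, so the results can differ. This is a Python
sketch, not ARM or LLVM code; the helper names are made up, and the
fused result is emulated with exact rational arithmetic from the
standard `fractions` module.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    product = a * b      # first rounding to the nearest double
    return product + c   # second rounding


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c computed exactly, rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


# (1 + 2**-27)**2 = 1 + 2**-26 + 2**-54; the 2**-54 term is lost when the
# product alone is rounded to double precision.
a = b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

print(mul_then_add(a, b, c))   # 0.0
print(fused_mul_add(a, b, c))  # equals 2.0**-54
```

Because the fused form can return a different (more precise) value than
the separate sequence, silently contracting MUL+ADD into VFMA changes
observable results, which is why gating it on fast-math or an
excess-precision request is a reasonable default.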

The only problem I see is supporting "vfpv4 + neon implies neonv2",
because it depends on a combination of features, while neonv2 alone
does not imply vfpv4.
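One way to picture the problem: single-feature implications (items 1-3
above) form a simple closure, but "vfpv4 + neon implies neonv2" needs a
rule that fires only when several features are present together. The
Python sketch below models both kinds of rule and expands a requested
set to a fixpoint. The feature names and the `IMPLIES`/`COMBINED` tables
are hypothetical illustrations; this is not how LLVM's TableGen
subtarget features are actually defined.

```python
from typing import FrozenSet, Set

# Hypothetical feature names, for illustration only.
# Single-feature implications: enabling the key also enables every value.
IMPLIES = {
    "vfp3": {"vfp2"},
    "vfp4": {"vfp3", "fp16"},  # VFPv4 is a superset of VFPv3 + fp16
    "neon2": {"neon"},         # NEONv2 is a superset of NEON
}

# Combination rules: if ALL features on the left are enabled, enable the
# right-hand one. A per-feature implication table cannot express this.
COMBINED = [
    ({"vfp4", "neon"}, "neon2"),
]


def resolve(requested: Set[str]) -> FrozenSet[str]:
    """Expand a requested feature set until no rule adds anything new."""
    enabled = set(requested)
    changed = True
    while changed:
        changed = False
        # Apply single-feature implications.
        for feature in list(enabled):
            for implied in IMPLIES.get(feature, ()):
                if implied not in enabled:
                    enabled.add(implied)
                    changed = True
        # Apply combination rules once their preconditions all hold.
        for needed, implied in COMBINED:
            if needed <= enabled and implied not in enabled:
                enabled.add(implied)
                changed = True
    return frozenset(enabled)


print(sorted(resolve({"vfp4", "neon"})))
# ['fp16', 'neon', 'neon2', 'vfp2', 'vfp3', 'vfp4']
print("vfp4" in resolve({"neon2"}))  # False: neonv2 alone does not pull in vfpv4
```

Running the rules to a fixpoint keeps the order of rule application
irrelevant: a combination rule can fire after an implication supplies
one of its inputs, and vice versa.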

Is there anything I missed?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University