[PATCH] D38315: [ARM] Armv8.2-A FP16 code generation (part 1/2)

Wed Jan 24 07:06:05 PST 2018

SjoerdMeijer updated this revision to Diff 131252.
SjoerdMeijer retitled this revision from "WIP: [ARM] Add f16 type support and code generation (part 1/2)" to "[ARM] Armv8.2-A FP16 code generation (part 1/2)".
SjoerdMeijer added a comment.

This is a rewrite, implementing the new approach:

1. Clang now passes and returns _Float16 values as floats, together with the required
bitcasts and truncs etc. to implement correct AAPCS behaviour, see also
https://reviews.llvm.org/D42318.
We will implement half-precision argument passing/returning lowering
in the ARM backend soon, but for now this means that this:

  _Float16 sub(_Float16 a, _Float16 b) {
     return a - b;
  }

gets lowered to this:

  define float @sub(float %a.coerce, float %b.coerce)  {
  entry:
    %0 = bitcast float %a.coerce to i32
    %tmp.0.extract.trunc = trunc i32 %0 to i16
    %1 = bitcast i16 %tmp.0.extract.trunc to half
    <SNIP>
    %sub = fsub half %1, %3
    <SNIP>
  }
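
To make the coercion in 1. concrete, here is a minimal C sketch of the
bit-level round trip that the bitcast/trunc sequence in the IR performs: the
half value travels in the low 16 bits of a 32-bit float container. The helper
names are hypothetical, and zero-filling the upper 16 bits of the container is
an assumption of this sketch, not something this patch specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: put raw half-precision bits into the low 16 bits of
   a float-sized container (upper bits zero-filled, by assumption). */
static float half_bits_to_coerced_float(uint16_t half_bits) {
    uint32_t word = half_bits;                    /* zext i16 -> i32 */
    float container;
    assert(sizeof word == sizeof container);
    memcpy(&container, &word, sizeof container);  /* bitcast i32 -> float */
    return container;
}

/* Hypothetical helper: mirror of the IR above on the callee side. */
static uint16_t coerced_float_to_half_bits(float container) {
    uint32_t word;
    assert(sizeof word == sizeof container);
    memcpy(&word, &container, sizeof word);       /* bitcast float -> i32 */
    return (uint16_t)word;                        /* trunc i32 -> i16 */
}
```

The float container is never interpreted as a float value; only its bit
pattern matters, which is why the lowering uses bitcasts rather than
fpext/fptrunc.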

2. When FullFP16 is *not* supported, we don't make f16 a
legal type, and we get legalization for "free", i.e. nothing changes
and everything works as before. f16 argument passing/returning
is also handled (by the Clang patch, see 1. above).

3.1. When FullFP16 is supported, we do make f16 a legal type,
and have 2 places that we need to patch up: f16 argument passing and
returning, which involves minor tweaks to avoid unnecessary code generation 
for some bitcasts.

3.2. As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing that
we can codegen this instruction from IR, but more importantly, also to some
conversion instructions. These conversions were causing issues before in the FP16
and FullFP16 cases.

3.3. I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
actually compile the entire half-precision sub code example above. This showed that
these loads and stores had the wrong addressing mode specified: AddrMode5 instead
of AddrMode5FP16, which turned out not to be implemented at all, so that has also been added.
Splitting this out into a separate patch doesn't make sense I think, because if it is not
used, as was the case, we're also not testing it.
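
To illustrate why the addressing mode matters for VLDRH/VSTRH, here is a
minimal C sketch. It assumes, from the Arm encoding of VLDR/VSTR, that the
8-bit immediate is scaled by 4 for the single/double-precision forms
(AddrMode5) and by 2 for the half-precision forms (AddrMode5FP16). The
function names and the 9-bit packing are hypothetical and not taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative scales: the 8-bit immediate counts words for AddrMode5 and
   halfwords for AddrMode5FP16 (assumption stated in the text above). */
enum { AM5_SCALE = 4, AM5FP16_SCALE = 2, AM5_MAX_IMM8 = 255 };

/* True if the byte offset is a multiple of the scale and fits in 8 bits
   once scaled. */
static bool offset_fits(int byte_offset, int scale) {
    int magnitude = abs(byte_offset);
    return magnitude % scale == 0 && magnitude / scale <= AM5_MAX_IMM8;
}

/* Pack into 9 bits: bit 8 = subtract flag, bits 7..0 = scaled offset. */
static unsigned encode_offset(int byte_offset, int scale) {
    assert(offset_fits(byte_offset, scale));
    unsigned is_sub = byte_offset < 0;
    return (is_sub << 8) | (unsigned)(abs(byte_offset) / scale);
}
```

Under these assumptions a halfword at byte offset 2 is only encodable with the
FP16 scale, while byte offset 1020 is only reachable with the scale of 4, so
using the wrong mode would either reject valid offsets or scale them
incorrectly.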

Therefore, I think this is the minimal patch that shows all the different moving parts.
Once we are happy with this patch, I would like to commit it first, just to make sure
we are happy with this groundwork. And then in part 2/2, I will add the remaining FP16
instruction descriptions.

https://reviews.llvm.org/D38315

Files:
  lib/Target/ARM/ARMBaseInstrInfo.cpp
  lib/Target/ARM/ARMCallingConv.td
  lib/Target/ARM/ARMISelDAGToDAG.cpp
  lib/Target/ARM/ARMISelLowering.cpp
  lib/Target/ARM/ARMInstrFormats.td
  lib/Target/ARM/ARMInstrVFP.td
  lib/Target/ARM/ARMRegisterInfo.td
  lib/Target/ARM/Disassembler/ARMDisassembler.cpp
  lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h
  test/CodeGen/ARM/GlobalISel/arm-unsupported.ll
  test/CodeGen/ARM/fp16-instructions.ll
