[PATCH] D42580: [ARM] Armv8.2-A FP16 code generation (part 2/3)

Mon Jan 29 03:05:49 PST 2018

olista01 added a comment.

Have you tried adding TableGen patterns for bitconvert nodes between i16 and f16? That's how it currently works for f32<->i32; see the VMOVRS and VMOVSR instructions in ARMInstrVFP.td.

That should give us a better lowering of bitcasts (rather than the default store/load lowering), but we might need some additional optimisations to remove the integer truncations where they are not needed.

I'm also concerned that this code might not be correct if it triggers on code other than that generated by the calling convention lowering. I'm thinking of the case where the source contains bitcasts i32->f16->i32. Would this change optimise away the truncation, changing the behaviour of that code?
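To make the concern concrete, here is a minimal C++ sketch of that round trip. It is only a model, not LLVM code: HalfBits, i32_to_f16_bits, f16_bits_to_i32 and round_trip are hypothetical names, and I am assuming the i32->f16 step truncates to i16 before the bitcast and that the f16->i32 step zero-extends afterwards.

```cpp
#include <cstdint>

// Hypothetical model: an f16 value represented only by its 16-bit pattern.
struct HalfBits { std::uint16_t bits; };

// i32 -> f16 (assumed): truncate to i16, then reinterpret those 16 bits as f16.
constexpr HalfBits i32_to_f16_bits(std::uint32_t x) {
    return HalfBits{static_cast<std::uint16_t>(x & 0xFFFFu)};
}

// f16 -> i32 (assumed): reinterpret as i16, then zero-extend to 32 bits.
constexpr std::uint32_t f16_bits_to_i32(HalfBits h) {
    return static_cast<std::uint32_t>(h.bits);
}

// The full i32 -> f16 -> i32 round trip as written in the source program.
constexpr std::uint32_t round_trip(std::uint32_t x) {
    return f16_bits_to_i32(i32_to_f16_bits(x));
}

// The upper 16 bits do not survive the round trip, so folding the whole
// sequence away to a plain copy of x would change the observable result.
static_assert(round_trip(0x12343C00u) == 0x00003C00u,
              "upper half must be cleared by the truncation");
static_assert(round_trip(0x12343C00u) != 0x12343C00u,
              "round trip is not the identity when the upper bits are set");
```

Under those assumptions the truncation is only removable when the upper 16 bits are known to be zero (or are never read), which is what I would expect to hold for the calling convention case but not for arbitrary source code.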

================
Comment at: lib/Target/ARM/ARMInstrVFP.td:754
+let Predicates = [HasFullFP16] in {
+  def : Pat<(f16_to_fp GPR:$a),
+            (f32 (COPY_TO_REGCLASS GPR:$a, HPR))>;
----------------
This doesn't look right - f16_to_fp is a conversion from f16 to f32, but COPY_TO_REGCLASS only copies the value into another register class; it doesn't convert it.
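As an illustration of the difference, the following C++ sketch compares a numeric half-to-single conversion with a plain copy of the same 16 bits. It is a model written for this comment, not code from the patch: half_bits_to_float and low_bits_as_float are hypothetical helpers, and the second one assumes the copied bits end up in the low half of a 32-bit value that is then read as an f32.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Numeric conversion of an IEEE 754 binary16 bit pattern to float,
// i.e. what a half-to-single conversion is expected to compute.
float half_bits_to_float(std::uint16_t h) {
    const std::uint32_t sign = (h >> 15) & 0x1u;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;
    const std::uint32_t frac = h & 0x3FFu;

    float magnitude;
    if (exp == 0) {
        // Zero or subnormal: frac * 2^-24.
        magnitude = std::ldexp(static_cast<float>(frac), -24);
    } else if (exp == 31) {
        // Infinity or NaN.
        magnitude = frac ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + frac) * 2^(exp - 25) == (1 + frac/1024) * 2^(exp - 15).
        magnitude = std::ldexp(static_cast<float>(frac | 0x400u),
                               static_cast<int>(exp) - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Assumed model of a bare register copy: the 16 bits are moved unchanged
// into the low half of a 32-bit word, which is then reinterpreted as float.
float low_bits_as_float(std::uint16_t h) {
    const std::uint32_t word = h;
    float f;
    std::memcpy(&f, &word, sizeof f);
    return f;
}
```

For the half-precision encoding of 1.0 (0x3C00), half_bits_to_float returns 1.0f, while low_bits_as_float returns a tiny single-precision subnormal, so a pattern that only copies the register cannot stand in for the conversion.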

https://reviews.llvm.org/D42580