[PATCH] D155601: [AArch64][GISel] Additional FPExt vector lowering

Sun Jul 23 08:54:05 PDT 2023

dmgreen added a comment.

In D155601#4510788 <https://reviews.llvm.org/D155601#4510788>, @arsenm wrote:

> If you also test bfloat you'll find it miscompiles

Yeah. My understanding is that either the s16 type needs to distinguish between different float representations, or the G_FPEXT operation would need to, either with a different G_ opcode or some sort of flag.

================
Comment at: llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp:545
+      .clampNumElements(0, v4s32, v4s32)
+      .clampNumElements(0, v2s64, v2s64)
+      .scalarize(0);
----------------
tschuett wrote:
> .clampMinNumElements(0, s32, 2)
> .clampMaxNumElements(0, s32, 4)
> .clampMinNumElements(0, s64, 1)
> .clampMaxNumElements(0, s64, 2)
> 
> ?
The result type of an fpext needs to be a v4s32 or v2s64, to match a fcvtl. There are no v2f16->v2f32 instructions (unless you count the lower 2 lanes of a v4f16->v4f32 fcvtl, and it would seem better to legalize in the legalize step so we only have to deal with legal operation later, and reuse all the tablegen patterns).

================
Comment at: llvm/test/CodeGen/AArch64/fpext.ll:204
+entry:
+  %c = fpext <3 x half> %a to <3 x float>
+  ret <3 x float> %c
----------------
tschuett wrote:
> arsenm wrote:
> > tschuett wrote:
> > > What is the trick behind the difference?
> > the dag widened the vector and gisel scalarized it
> I missed the *3*.
Yeah the 3 is awkward. This comes from the expansion, adding `undef` lanes that are not yet cleaned up properly. It should be possible to improve it, and we can get back to the same codegen as SDAG

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155601/new/

https://reviews.llvm.org/D155601