[PATCH] D127158: [AArch64] Add intrinsic support for gpr<->fpr flavors of fixed-point converts

Wed Jun 8 14:56:19 PDT 2022

rmcclure added a comment.

> we need to make sure we still have an intrinsic that produces "scvtf h0, h0, #16" etc.

That particular example is easy, since there's only one instruction form that does the `int16_t` -> `float16_t` conversion.
For the cases where there are multiple choices, are you saying there should be a way to force a particular form, even if it is suboptimal? e.g. something to force generation of `scvtf s0, w0, #10` instead of `scvtf s0, s0, #10`, even if the source is already in an FPR?

> For something like "llvm.aarch64.neon.vcvtfp2fxs.i32.f32", I guess there are two possible instructions with effectively equivalent semantics. Ideally, the compiler is just clever enough to figure out the best one from the context. Given the way SelectionDAG isel works, you probably have to do some sort of post-isel fixup like AArch64AdvSIMDScalarPass .

Yes, this was my thought process, too. If `AArch64AdvSIMDScalarPass` is a reasonable place to add that "cleverness", I'm happy to add it there.

> In terms of what clang generates, we have to follow the NEON spec; we can't use an instruction that produces a different result just because it's more useful. So if the spec says the conversion produces a 16-bit integer, we have to produce a 16-bit integer.

Agreed! My claim is that the spec is currently inconsistent here (e.g. `vcvth_n_s64_f16` should return an `int64_t`, but the instruction used in the spec returns an `int16_t`). This change causes LLVM to match the source/return type specified in the spec, rather than the instruction specified in the spec (as mentioned, this causes us to match LLVM's behavior for the integer <-> fp converts, where the spec is similarly inconsistent).

> This patch also seems like it's changing a few too many things at once... can you split the patch into pieces that are more easy to review?

Hm... the patch effectively has 3 components:

1. adding patterns to map the instrinsics to the instructions
2. relocates the `fp_to_si(fmul(x, C))` (and similar) patterns
3. adds tests for the new patterns

All 3 components seem mostly inseparable (the existence of the patterns in #2 are protected by tests, so I'm loathe to remove them).
Having said that, I //could //separate the "fixed-point -> fp" and "fp -> fixed-point" changes, but given how similar the two directions are to each other, I thought it made more sense to keep them together than to separate them.

> My main reason for asking was whether the fp_to_si(fmul(x, C)) form would be acceptable to your use-case.

Unfortunately, that form isn't sufficient for my use-case. As you mentioned, there are ranges of fixed-point placement (2^16, and even higher, since `scvtf h0, x0, #n` supports an immediate up to 64).
Also, my experience is that some optimization pass tends to change the sequence into a form that the backend no longer recognizes (this probably applied more to the `fdiv(si_to_fp(x), C)` patterns).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127158/new/

https://reviews.llvm.org/D127158