[llvm] [AArch64] Create set.fpmr intrinsic and assembly lowering (PR #114248)

Fri Nov 1 06:17:42 PDT 2024

CarolineConcatto wrote:

> Can you explain how these are intended to be used? I am maybe a bit out of touch but I was under the impression that the FP8 intrinsics had fpmr operands to make data-flow analysis possible. Does a stand-alone intrinsic not go against that?

Do you mean that the compiler would have a pass to optimize the writes to FPMR? Is that what you mean?
At the moment no such pass or data-flow analysis exists, and the compiler would generate a write to FPMR for every FP8 intrinsic.
For instance, the example I showed:
svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
will lower to:
llvm.set.fpmr(i64 fpm)
{<vscale x 8 x half>,<vscale x 8 x half>}llvm.aarch64.scvt2.nxv8i16(<vscale x 8 x i8> %zn)
One of the reasons to have an LLVM IR intrinsic for FPMR, instead of lowering to a machine instruction late in the pipeline, is that the compiler could hoist llvm.set.fpmr out of a (vectorized) loop if the value is constant or does not change. AFAIU, hoisting a machine instruction out of a loop is more complicated than hoisting an LLVM IR intrinsic.
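
To make the hoisting argument concrete, here is a toy Python model, not LLVM code. It treats an unrolled loop as a straight-line trace and drops FPMR writes whose value is already in the register, which is the effect hoisting would have when fpm is loop-invariant. The names "set_fpmr" and "fp8_op" and all three helper functions are made up for this illustration.

```python
# Toy model only: "set_fpmr" and "fp8_op" are illustrative stand-ins,
# not real LLVM intrinsic or API names.

def lower_naive(fpm_values):
    """Emit one FPMR write before every FP8 operation (current behaviour)."""
    ops = []
    for fpm in fpm_values:
        ops.append(("set_fpmr", fpm))
        ops.append(("fp8_op", fpm))
    return ops


def drop_redundant_writes(ops):
    """Remove a set_fpmr whose value is already known to be in the register."""
    out = []
    current = None  # register contents are unknown on entry
    for name, value in ops:
        if name == "set_fpmr":
            if value == current:
                continue  # same value already written: the write is redundant
            current = value
        out.append((name, value))
    return out


def count_writes(ops):
    """Count the FPMR writes left in a trace."""
    return sum(1 for name, _ in ops if name == "set_fpmr")


# Four iterations of a loop body that all use the same (invariant) fpm value.
naive = lower_naive([7, 7, 7, 7])
optimized = drop_redundant_writes(naive)
print(count_writes(naive), count_writes(optimized))
```

With an invariant fpm, only the first write survives; if the value changes between iterations, a write is kept at each change.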

https://github.com/llvm/llvm-project/pull/114248