[llvm-commits] Fix sitofp and fpextend codegen for x86/AVX[PR9473]
Bruno Cardoso Lopes
bruno.cardoso at gmail.com
Tue May 31 10:23:06 PDT 2011
Hi Syoyo,
On Tue, May 31, 2011 at 1:27 PM, Syoyo Fujita <syoyofujita at gmail.com> wrote:
> Attached are series of patch to fix sitofp and fpextend instruction
> isel for x86/AVX backend(mattr=+avx).
>
> The problem is reported here
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-March/038913.html
>
> and in bugzilla PR9473.
>
>
> Let me briefly explain what this patches do.
>
> sitofp instruction should be mapped to vcvtsitoss(or vcvtsitosd)
> assembler, and fpextend should be vcvtss2sd.
> These AVX instruction takes 2 input and 1 output in asm form, but for
> codegen(isel) form it should be 1 input and 1 output.
> It seems impossible to define DUMMY register in .td.
>
> So my solution is to separate .td definition into asm parser
> case(isAsmParserOnly=1) and codegen case(isCodeGenOnly =1)
>From the intel manual:
VCVTSS2SD- Convert one single-precision floating-point value in
xmm3/m32 to one double-precision floating- point value and merge with
high bits of xmm2.
And, according to your patch:
+let isAsmParserOnly = 1 in {
+ def VCVTSS2SDrm : I<0x5A, MRMSrcMem, (outs FR64:$dst),
+ (ins FR32:$src1, f32mem:$src2),
+ "vcvtss2sd\t{$src2, $src1, $dst|$dst, $src1, $src2}",
+ []>, XS, VEX_4V, Requires<[HasAVX, OptForSize]>;
+}
+
+def VCVTSS2SDrm_alt : I<0x5A, MRMSrcMem, (outs FR64:$dst),
+ (ins f32mem:$src),
+ "vcvtss2sd\t{$src, $src, $dst|$dst, $src, $src}",
+ []>, XS, VEX, Requires<[HasAVX, OptForSize]>;
The "alt" version is using a different encoding, this isn't correct,
since there's only one encoding for the "rm" version, which is the
"VEX_4V" one. There is no need for the "alt" version actually, but to
follow the manual "merge with high bits of xmm2":
Instead of doing:
+def : Pat<(extloadf32 addr:$src),
+ (VCVTSS2SDrm_alt addr:$src)>,
You can do:
def : Pat<(extloadf32 addr:$src2),
(VCVTSS2SDrm 0, addr:$src2)>,
or something like that...
A better solution, since this instruction is dealing with F32 reg
classes and the high bits won't be touched, is to declare VCVTSS2SDrm
as having Constraints = "$src1 = $dst", but keep printing its operands
as usual. Also, you can do the pattern matching inline in the
instruction definition, no need to do it as a Pat here. I believe you
can do something similar to VCVTSI2SD_alt.
--
Bruno Cardoso Lopes
http://www.brunocardoso.cc
More information about the llvm-commits
mailing list