[llvm-commits] Fix sitofp and fpextend codegen for x86/AVX[PR9473]

Sun Jun 12 08:15:05 PDT 2011

Hello Bruno,

Attached is a modified patch for fpextend codegen.

+let Predicates = [HasAVX] in
+ def : Pat<(fextend (loadf32 addr:$src)),
+           (VCVTSS2SDrm (f32 (EXTRACT_SUBREG (AVX_SET0PS), sub_ss)), addr:$src)>;
+

I've removed VCVTSS2SDrm_alt and now use (f32 (EXTRACT_SUBREG
(AVX_SET0PS), sub_ss)) to provide a zero-filled register.
I know this is redundant, but it works, and I have not found a better
way so far (using (f32 0) causes a TblGen error, by the way).

>> A better solution, since this instruction is dealing with F32 reg
>> classes and the high bits won't be touched, is to declare VCVTSS2SDrm
>> as having Constraints = "$src1 = $dst", but keep printing its operands
>> as usual. Also, you can do the pattern matching inline in the
>> instruction definition, no need to do it as a Pat here. I believe you
>> can do something similar to VCVTSI2SD_alt.

Unfortunately, Constraints = "$src1 = $dst" for VCVTSS2SD causes a TblGen error.
It seems that with the VEX_4V attribute, you must specify different
(virtual) registers.

Next, I'll send a modified patch for sitofp codegen on x86/AVX.

--
Syoyo

On Fri, Jun 3, 2011 at 1:12 AM, Syoyo Fujita <syoyofujita at gmail.com> wrote:
> Hello Bruno,
>
> Thanks for the advice,
>
>> From the intel manual:
>>
>> VCVTSS2SD- Convert one single-precision floating-point value in
>> xmm3/m32 to one double-precision floating-point value and merge with
>> high bits of xmm2.
>>
>> And, according to your patch:
>>
>> +let isAsmParserOnly = 1 in {
>> +  def VCVTSS2SDrm : I<0x5A, MRMSrcMem, (outs FR64:$dst),
>> +                      (ins FR32:$src1, f32mem:$src2),
>> +                      "vcvtss2sd\t{$src2, $src1, $dst|$dst, $src1, $src2}",
>> +                      []>, XS, VEX_4V, Requires<[HasAVX, OptForSize]>;
>> +}
>> +
>> +def VCVTSS2SDrm_alt : I<0x5A, MRMSrcMem, (outs FR64:$dst),
>> +                    (ins f32mem:$src),
>> +                    "vcvtss2sd\t{$src, $src, $dst|$dst, $src, $src}",
>> +                    []>, XS, VEX, Requires<[HasAVX, OptForSize]>;
>>
>> The "alt" version is using a different encoding, this isn't correct,
>> since there's only one encoding for the "rm" version, which is the
>> "VEX_4V" one. There is no need for the "alt" version actually, but to
>> follow the manual "merge with high bits of xmm2":
>>
>> Instead of doing:
>>
>> +def : Pat<(extloadf32 addr:$src),
>> +          (VCVTSS2SDrm_alt addr:$src)>,
>>
>> You can do:
>>
>> def : Pat<(extloadf32 addr:$src2),
>>          (VCVTSS2SDrm 0, addr:$src2)>,
>>
>> or something like that...
>
>
> Ah, that's new to me.
> Since I had no idea how to use a 'dummy' register in .td, I just tried to
> solve the problem as in my patch.
> I'll look into rewriting my patch with the above expression ('0' as the
> input register).
>
>
>> A better solution, since this instruction is dealing with F32 reg
>> classes and the high bits won't be touched, is to declare VCVTSS2SDrm
>> as having Constraints = "$src1 = $dst", but keep printing its operands
>> as usual. Also, you can do the pattern matching inline in the
>> instruction definition, no need to do it as a Pat here. I believe you
>> can do something similar to VCVTSI2SD_alt.
>>
>
> Okay, I'll try it and resend my (modified) patch.
>
> --
> Syoyo
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-isel-fail-when-codegen-ing-fpext-instruction-for.patch
Type: application/octet-stream
Size: 2250 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20110613/b478517c/attachment.obj>