[PATCH] D17414: Fix incorrect selection of AVX512 sqrt when OptForSize is on
Dimitry Andric via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 19 11:19:13 PST 2016
For this specific case, I would like to get a minimal fix in now, so it can hopefully be merged to the 3.8 branch before it freezes.
But indeed, a more structural fix should probably be considered for trunk. Is there actually any formal (or semi-formal) definition of the .td 'language' somewhere? :)
-Dimitry
> On 19 Feb 2016, at 19:21, Jim Grosbach <grosbach at apple.com> wrote:
>
> That makes sense, but is really, really fragile in that it’s depending on the order of evaluation of the Requires<> vs. the enclosing “let Predicates = “. I don’t believe that’s required to be one order or another.
>
> Also, yes, I believe the Predicates should be additive, not a replacement, by default. There should be a way to override and replace, but that’s the uncommon case.
>
>> On Feb 19, 2016, at 10:19 AM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>
>> So from what I observed (I may be wrong), the other pattern are inlined like that:
>>
>> let Predicates = [HasAVX512] in {
>> def : Pat<(f32 (X86frsqrt (load addr:$src))),
>> (COPY_TO_REGCLASS (VRSQRT14SSrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>,
>> Requires<[OptForSize]>;
>>
>> The inner Requires<[OptForSize] is *ignored*. This is why I consider it as a misoptimization and not a correctness bug. They need to be fixed, but it should be a separate patch (and test case).
>>
>>
>> The bug that this patch is fixing is different, there was *no* enclosing scope with a predicate when declaring:
>>
>> defm VSQRT : avx512_sqrt_scalar_all<0x51, "vsqrt">, VEX_LIG;
>>
>> It refers to a multiclass "avx512_sqrt_scalar_all" which itself refers to "avx512_sqrt_scalar", which contains this pattern:
>>
>> def : Pat<(_.EltVT (OpNode (load addr:$src))),
>> (!cast<Instruction>(NAME#SUFF#Zm)
>> (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>
>>
>> Here the OptForSize predicate is taken into account, but there is no other restriction.
>> It is only an issue for this pattern, the instruction themselves in this multiclass are ultimately defined by the "class AVX512" which has "Requires<[HasAVX512]>".
>>
>> Now it seems to me that a pattern that tries to generate an instruction should "inherit" the restriction of this instruction, that would prevent such bugs.
>>
>> Hope it makes sense.
>>
>> --
>> Mehdi
>>
>>
>>
>>
>>
>>> On Feb 19, 2016, at 10:06 AM, Jim Grosbach <grosbach at apple.com <mailto:grosbach at apple.com>> wrote:
>>>
>>> If they’re overwriting the Predicates like this one was, they could have the same problem.
>>>
>>>> On Feb 19, 2016, at 9:09 AM, Roman Divacky <rdivacky at vlakno.cz <mailto:rdivacky at vlakno.cz>> wrote:
>>>>
>>>> What about the other cases of Requires[OptForSize] in the same file?
>>>>
>>>> On Thu, Feb 18, 2016 at 10:11:59PM +0000, Dimitry Andric via llvm-commits wrote:
>>>>> dim created this revision.
>>>>> dim added reviewers: joker.eph, grosbach, resistor.
>>>>> dim added a subscriber: llvm-commits.
>>>>>
>>>>> When optimizing for size, sqrt calls can be incorrectly selected as
>>>>> AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
>>>>> `Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
>>>>> definition. Even if the target does not support AVX512, the class can
>>>>> apparently still be chosen, leading to an incorrect selection of
>>>>> `vsqrtss`.
>>>>>
>>>>> In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
>>>>> X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
>>>>> requires an XMM register, which is not available on i686 CPUs.
>>>>>
>>>>> http://reviews.llvm.org/D17414 <http://reviews.llvm.org/D17414>
>>>>>
>>>>> Files:
>>>>> lib/Target/X86/X86InstrAVX512.td
>>>>> test/CodeGen/X86/pr26625.ll
>>>>>
>>>>> Index: test/CodeGen/X86/pr26625.ll
>>>>> ===================================================================
>>>>> --- /dev/null
>>>>> +++ test/CodeGen/X86/pr26625.ll
>>>>> @@ -0,0 +1,20 @@
>>>>> +; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
>>>>> +; PR26625
>>>>> +
>>>>> +target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
>>>>> +target triple = "i386"
>>>>> +
>>>>> +define float @x0(float %f) #0 {
>>>>> +entry:
>>>>> + %call = tail call float @sqrtf(float %f) #1
>>>>> + ret float %call
>>>>> +; CHECK-LABEL: x0:
>>>>> +; CHECK: flds
>>>>> +; CHECK-NEXT: fsqrt
>>>>> +; CHECK-NOT: vsqrtss
>>>>> +}
>>>>> +
>>>>> +declare float @sqrtf(float) #0
>>>>> +
>>>>> +attributes #0 = { nounwind optsize readnone }
>>>>> +attributes #1 = { nounwind optsize readnone }
>>>>> Index: lib/Target/X86/X86InstrAVX512.td
>>>>> ===================================================================
>>>>> --- lib/Target/X86/X86InstrAVX512.td
>>>>> +++ lib/Target/X86/X86InstrAVX512.td
>>>>> @@ -6005,7 +6005,7 @@
>>>>>
>>>>> def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>>>> (!cast<Instruction>(NAME#SUFF#Zm)
>>>>> - (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>>> + (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>;
>>>>> }
>>>>>
>>>>> multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr> {
>>>>>
>>>>>
>>>>
>>>>> Index: test/CodeGen/X86/pr26625.ll
>>>>> ===================================================================
>>>>> --- /dev/null
>>>>> +++ test/CodeGen/X86/pr26625.ll
>>>>> @@ -0,0 +1,20 @@
>>>>> +; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
>>>>> +; PR26625
>>>>> +
>>>>> +target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
>>>>> +target triple = "i386"
>>>>> +
>>>>> +define float @x0(float %f) #0 {
>>>>> +entry:
>>>>> + %call = tail call float @sqrtf(float %f) #1
>>>>> + ret float %call
>>>>> +; CHECK-LABEL: x0:
>>>>> +; CHECK: flds
>>>>> +; CHECK-NEXT: fsqrt
>>>>> +; CHECK-NOT: vsqrtss
>>>>> +}
>>>>> +
>>>>> +declare float @sqrtf(float) #0
>>>>> +
>>>>> +attributes #0 = { nounwind optsize readnone }
>>>>> +attributes #1 = { nounwind optsize readnone }
>>>>> Index: lib/Target/X86/X86InstrAVX512.td
>>>>> ===================================================================
>>>>> --- lib/Target/X86/X86InstrAVX512.td
>>>>> +++ lib/Target/X86/X86InstrAVX512.td
>>>>> @@ -6005,7 +6005,7 @@
>>>>>
>>>>> def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>>>> (!cast<Instruction>(NAME#SUFF#Zm)
>>>>> - (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>>> + (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>;
>>>>> }
>>>>>
>>>>> multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr> {
>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160219/ada95819/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 194 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160219/ada95819/attachment.sig>
More information about the llvm-commits
mailing list