[PATCH] D17414: Fix incorrect selection of AVX512 sqrt when OptForSize is on
Mehdi Amini via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 19 11:25:04 PST 2016
> On Feb 19, 2016, at 11:19 AM, Dimitry Andric <dimitry at andric.com> wrote:
>
> For this specific case, I would like to get a minimal fix in now, so it can hopefully be merged to the 3.8 branch before it freezes.
The fix makes sense separately.
>
> But indeed, a more structural fix should probably be considered for trunk. Is there actually any formal (or semi-formal) definition of the .td 'language' somewhere? :)
Pretty rough but still:
http://llvm.org/docs/TableGen/index.html
http://llvm.org/docs/TableGen/LangIntro.html
http://llvm.org/docs/TableGen/LangRef.html
--
Mehdi
>
> -Dimitry
>
>> On 19 Feb 2016, at 19:21, Jim Grosbach <grosbach at apple.com <mailto:grosbach at apple.com>> wrote:
>>
>> That makes sense, but is really, really fragile in that it’s depending on the order of evaluation of the Requires<> vs. the enclosing “let Predicates = “. I don’t believe that’s required to be one order or another.
>>
>> Also, yes, I believe the Predicates should be additive, not a replacement, by default. There should be a way to override and replace, but that’s the uncommon case.
>>
>>> On Feb 19, 2016, at 10:19 AM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>>
>>> So from what I observed (I may be wrong), the other pattern are inlined like that:
>>>
>>> let Predicates = [HasAVX512] in {
>>> def : Pat<(f32 (X86frsqrt (load addr:$src))),
>>> (COPY_TO_REGCLASS (VRSQRT14SSrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>,
>>> Requires<[OptForSize]>;
>>>
>>> The inner Requires<[OptForSize] is *ignored*. This is why I consider it as a misoptimization and not a correctness bug. They need to be fixed, but it should be a separate patch (and test case).
>>>
>>>
>>> The bug that this patch is fixing is different, there was *no* enclosing scope with a predicate when declaring:
>>>
>>> defm VSQRT : avx512_sqrt_scalar_all<0x51, "vsqrt">, VEX_LIG;
>>>
>>> It refers to a multiclass "avx512_sqrt_scalar_all" which itself refers to "avx512_sqrt_scalar", which contains this pattern:
>>>
>>> def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>> (!cast<Instruction>(NAME#SUFF#Zm)
>>> (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>
>>>
>>> Here the OptForSize predicate is taken into account, but there is no other restriction.
>>> It is only an issue for this pattern, the instruction themselves in this multiclass are ultimately defined by the "class AVX512" which has "Requires<[HasAVX512]>".
>>>
>>> Now it seems to me that a pattern that tries to generate an instruction should "inherit" the restriction of this instruction, that would prevent such bugs.
>>>
>>> Hope it makes sense.
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>
>>>
>>>> On Feb 19, 2016, at 10:06 AM, Jim Grosbach <grosbach at apple.com <mailto:grosbach at apple.com>> wrote:
>>>>
>>>> If they’re overwriting the Predicates like this one was, they could have the same problem.
>>>>
>>>>> On Feb 19, 2016, at 9:09 AM, Roman Divacky <rdivacky at vlakno.cz <mailto:rdivacky at vlakno.cz>> wrote:
>>>>>
>>>>> What about the other cases of Requires[OptForSize] in the same file?
>>>>>
>>>>> On Thu, Feb 18, 2016 at 10:11:59PM +0000, Dimitry Andric via llvm-commits wrote:
>>>>>> dim created this revision.
>>>>>> dim added reviewers: joker.eph, grosbach, resistor.
>>>>>> dim added a subscriber: llvm-commits.
>>>>>>
>>>>>> When optimizing for size, sqrt calls can be incorrectly selected as
>>>>>> AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
>>>>>> `Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
>>>>>> definition. Even if the target does not support AVX512, the class can
>>>>>> apparently still be chosen, leading to an incorrect selection of
>>>>>> `vsqrtss`.
>>>>>>
>>>>>> In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
>>>>>> X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
>>>>>> requires an XMM register, which is not available on i686 CPUs.
>>>>>>
>>>>>> http://reviews.llvm.org/D17414 <http://reviews.llvm.org/D17414>
>>>>>>
>>>>>> Files:
>>>>>> lib/Target/X86/X86InstrAVX512.td
>>>>>> test/CodeGen/X86/pr26625.ll
>>>>>>
>>>>>> Index: test/CodeGen/X86/pr26625.ll
>>>>>> ===================================================================
>>>>>> --- /dev/null
>>>>>> +++ test/CodeGen/X86/pr26625.ll
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
>>>>>> +; PR26625
>>>>>> +
>>>>>> +target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
>>>>>> +target triple = "i386"
>>>>>> +
>>>>>> +define float @x0(float %f) #0 {
>>>>>> +entry:
>>>>>> + %call = tail call float @sqrtf(float %f) #1
>>>>>> + ret float %call
>>>>>> +; CHECK-LABEL: x0:
>>>>>> +; CHECK: flds
>>>>>> +; CHECK-NEXT: fsqrt
>>>>>> +; CHECK-NOT: vsqrtss
>>>>>> +}
>>>>>> +
>>>>>> +declare float @sqrtf(float) #0
>>>>>> +
>>>>>> +attributes #0 = { nounwind optsize readnone }
>>>>>> +attributes #1 = { nounwind optsize readnone }
>>>>>> Index: lib/Target/X86/X86InstrAVX512.td
>>>>>> ===================================================================
>>>>>> --- lib/Target/X86/X86InstrAVX512.td
>>>>>> +++ lib/Target/X86/X86InstrAVX512.td
>>>>>> @@ -6005,7 +6005,7 @@
>>>>>>
>>>>>> def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>>>>> (!cast<Instruction>(NAME#SUFF#Zm)
>>>>>> - (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>>>> + (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>;
>>>>>> }
>>>>>>
>>>>>> multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr> {
>>>>>>
>>>>>>
>>>>>
>>>>>> Index: test/CodeGen/X86/pr26625.ll
>>>>>> ===================================================================
>>>>>> --- /dev/null
>>>>>> +++ test/CodeGen/X86/pr26625.ll
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
>>>>>> +; PR26625
>>>>>> +
>>>>>> +target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
>>>>>> +target triple = "i386"
>>>>>> +
>>>>>> +define float @x0(float %f) #0 {
>>>>>> +entry:
>>>>>> + %call = tail call float @sqrtf(float %f) #1
>>>>>> + ret float %call
>>>>>> +; CHECK-LABEL: x0:
>>>>>> +; CHECK: flds
>>>>>> +; CHECK-NEXT: fsqrt
>>>>>> +; CHECK-NOT: vsqrtss
>>>>>> +}
>>>>>> +
>>>>>> +declare float @sqrtf(float) #0
>>>>>> +
>>>>>> +attributes #0 = { nounwind optsize readnone }
>>>>>> +attributes #1 = { nounwind optsize readnone }
>>>>>> Index: lib/Target/X86/X86InstrAVX512.td
>>>>>> ===================================================================
>>>>>> --- lib/Target/X86/X86InstrAVX512.td
>>>>>> +++ lib/Target/X86/X86InstrAVX512.td
>>>>>> @@ -6005,7 +6005,7 @@
>>>>>>
>>>>>> def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>>>>> (!cast<Instruction>(NAME#SUFF#Zm)
>>>>>> - (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>>>> + (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>;
>>>>>> }
>>>>>>
>>>>>> multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr> {
>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160219/bcd73d8b/attachment.html>
More information about the llvm-commits
mailing list