[PATCH] D17414: Fix incorrect selection of AVX512 sqrt when OptForSize is on

Dimitry Andric via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 19 11:19:13 PST 2016


For this specific case, I would like to get a minimal fix in now, so it can hopefully be merged to the 3.8 branch before it freezes.

But indeed, a more structural fix should probably be considered for trunk.  Is there actually any formal (or semi-formal) definition of the .td 'language' somewhere? :)

-Dimitry

> On 19 Feb 2016, at 19:21, Jim Grosbach <grosbach at apple.com> wrote:
> 
> That makes sense, but is really, really fragile in that it’s depending on the order of evaluation of the Requires<> vs. the enclosing “let Predicates = “. I don’t believe that’s required to be one order or another.
> 
> Also, yes, I believe the Predicates should be additive, not a replacement, by default. There should be a way to override and replace, but that’s the uncommon case.
> 
>> On Feb 19, 2016, at 10:19 AM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>> 
>> So from what I observed (I may be wrong), the other pattern are inlined like that:
>> 
>> let Predicates = [HasAVX512] in {
>>   def : Pat<(f32 (X86frsqrt (load addr:$src))),
>>             (COPY_TO_REGCLASS (VRSQRT14SSrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>,
>>             Requires<[OptForSize]>;
>> 
>> The inner Requires<[OptForSize] is *ignored*. This is why I consider it as a misoptimization and not a correctness bug. They need to be fixed, but it should be a separate patch (and test case).
>> 
>> 
>> The bug that this patch is fixing is different, there was *no* enclosing scope with a predicate when declaring:
>> 
>> defm VSQRT   : avx512_sqrt_scalar_all<0x51, "vsqrt">, VEX_LIG;
>> 
>> It refers to a multiclass "avx512_sqrt_scalar_all" which itself refers to "avx512_sqrt_scalar", which contains this pattern:
>> 
>>   def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>             (!cast<Instruction>(NAME#SUFF#Zm)
>>                 (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>> 
>> 
>> Here the OptForSize predicate is taken into account, but there is no other restriction.
>> It is only an issue for this pattern, the instruction themselves in this multiclass are ultimately defined by the "class AVX512" which has "Requires<[HasAVX512]>".
>> 
>> Now it seems to me that a pattern that tries to generate an instruction should "inherit" the restriction of this instruction, that would prevent such bugs.
>> 
>> Hope it makes sense.
>> 
>> --
>> Mehdi
>> 
>> 
>> 
>> 
>> 
>>> On Feb 19, 2016, at 10:06 AM, Jim Grosbach <grosbach at apple.com <mailto:grosbach at apple.com>> wrote:
>>> 
>>> If they’re overwriting the Predicates like this one was, they could have the same problem.
>>> 
>>>> On Feb 19, 2016, at 9:09 AM, Roman Divacky <rdivacky at vlakno.cz <mailto:rdivacky at vlakno.cz>> wrote:
>>>> 
>>>> What about the other cases of Requires[OptForSize] in the same file?
>>>> 
>>>> On Thu, Feb 18, 2016 at 10:11:59PM +0000, Dimitry Andric via llvm-commits wrote:
>>>>> dim created this revision.
>>>>> dim added reviewers: joker.eph, grosbach, resistor.
>>>>> dim added a subscriber: llvm-commits.
>>>>> 
>>>>> When optimizing for size, sqrt calls can be incorrectly selected as
>>>>> AVX512 VSQRT instructions.  This is because X86InstrAVX512.td has a
>>>>> `Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
>>>>> definition.  Even if the target does not support AVX512, the class can
>>>>> apparently still be chosen, leading to an incorrect selection of
>>>>> `vsqrtss`.
>>>>> 
>>>>> In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
>>>>> X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
>>>>> requires an XMM register, which is not available on i686 CPUs.
>>>>> 
>>>>> http://reviews.llvm.org/D17414 <http://reviews.llvm.org/D17414>
>>>>> 
>>>>> Files:
>>>>>  lib/Target/X86/X86InstrAVX512.td
>>>>>  test/CodeGen/X86/pr26625.ll
>>>>> 
>>>>> Index: test/CodeGen/X86/pr26625.ll
>>>>> ===================================================================
>>>>> --- /dev/null
>>>>> +++ test/CodeGen/X86/pr26625.ll
>>>>> @@ -0,0 +1,20 @@
>>>>> +; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
>>>>> +; PR26625
>>>>> +
>>>>> +target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
>>>>> +target triple = "i386"
>>>>> +
>>>>> +define float @x0(float %f) #0 {
>>>>> +entry:
>>>>> +  %call = tail call float @sqrtf(float %f) #1
>>>>> +  ret float %call
>>>>> +; CHECK-LABEL: x0:
>>>>> +; CHECK: flds
>>>>> +; CHECK-NEXT: fsqrt
>>>>> +; CHECK-NOT: vsqrtss
>>>>> +}
>>>>> +
>>>>> +declare float @sqrtf(float) #0
>>>>> +
>>>>> +attributes #0 = { nounwind optsize readnone }
>>>>> +attributes #1 = { nounwind optsize readnone }
>>>>> Index: lib/Target/X86/X86InstrAVX512.td
>>>>> ===================================================================
>>>>> --- lib/Target/X86/X86InstrAVX512.td
>>>>> +++ lib/Target/X86/X86InstrAVX512.td
>>>>> @@ -6005,7 +6005,7 @@
>>>>> 
>>>>>   def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>>>>             (!cast<Instruction>(NAME#SUFF#Zm)
>>>>> -                (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>>> +                (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>;
>>>>> }
>>>>> 
>>>>> multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr> {
>>>>> 
>>>>> 
>>>> 
>>>>> Index: test/CodeGen/X86/pr26625.ll
>>>>> ===================================================================
>>>>> --- /dev/null
>>>>> +++ test/CodeGen/X86/pr26625.ll
>>>>> @@ -0,0 +1,20 @@
>>>>> +; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
>>>>> +; PR26625
>>>>> +
>>>>> +target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
>>>>> +target triple = "i386"
>>>>> +
>>>>> +define float @x0(float %f) #0 {
>>>>> +entry:
>>>>> +  %call = tail call float @sqrtf(float %f) #1
>>>>> +  ret float %call
>>>>> +; CHECK-LABEL: x0:
>>>>> +; CHECK: flds
>>>>> +; CHECK-NEXT: fsqrt
>>>>> +; CHECK-NOT: vsqrtss
>>>>> +}
>>>>> +
>>>>> +declare float @sqrtf(float) #0
>>>>> +
>>>>> +attributes #0 = { nounwind optsize readnone }
>>>>> +attributes #1 = { nounwind optsize readnone }
>>>>> Index: lib/Target/X86/X86InstrAVX512.td
>>>>> ===================================================================
>>>>> --- lib/Target/X86/X86InstrAVX512.td
>>>>> +++ lib/Target/X86/X86InstrAVX512.td
>>>>> @@ -6005,7 +6005,7 @@
>>>>> 
>>>>>   def : Pat<(_.EltVT (OpNode (load addr:$src))),
>>>>>             (!cast<Instruction>(NAME#SUFF#Zm)
>>>>> -                (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>;
>>>>> +                (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>;
>>>>> }
>>>>> 
>>>>> multiclass avx512_sqrt_scalar_all<bits<8> opc, string OpcodeStr> {
>>>> 
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160219/ada95819/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 194 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160219/ada95819/attachment.sig>


More information about the llvm-commits mailing list